Sean K Maden



Teaching Materials

Open-access teaching materials for AI/ML, reproducible research, bioinformatics, and more.


New Additions

New additions to the Teaching Materials section.


Applying Copilot in RStudio

Tutorial on using AI for a wide range of data science, statistics, and bioinformatics tasks by applying Posit's GitHub Copilot plugin for RStudio. This is one of the earliest AI technology integrations for RStudio, released in early 2024. It introduces new settings and co-opts the IDE auto-complete tool to suggest code completions.

View slides

See the Zenodo reference and cheatsheet

Consensus Machine Learning for Gene Target Selection in Pediatric AML Risk

Manuscript investigating gene markers of acute myeloid leukemia (AML) risk in pediatric subjects from the TARGET consortium using a novel consensus framework. The framework uses a "consensus" approach, a type of ensemble-based learning, to identify the most likely markers across four distinct and complementary types of machine learning models (neural networks, LASSO, SVM, and random forest). The work was conducted as part of the NCBI Hackathons series.

Read the bioRxiv preprint

Consult the GitHub repo

Conda/mamba overview

Introductory tutorial/workshop covering the fundamental concepts and use of two technologies for reproducible research and workflows: conda and mamba. It motivates dependency management, walks through setup, and shows how to run conda and mamba.

View the Google Slides

New blog post "The case for containers in computation."

In this blog, we explore the following key questions:

  • 1. What are containers and why are they important for users, developers, and researchers?

  • 2. In programming and computational research, what are the benefits of container use and risks of not using them?

  • 3. How does familiarity with containers benefit your career, and what projects and fields already make extensive use of this important technology?

Read the blog post.

Retained introns/intronomer project

Manuscript and software supporting novel retained intron detection in long-read RNA-seq data, along with reproducible analysis using 8 short-read RNA-seq retained intron detection tools developed for varying compute and OS environments.

View the Genome Biology manuscript

Consult the GitHub repo and conda scripts to reproduce analyses

Use the new intronomer Python package for retained intron detection from long-read RNA-seq data

1. Artificial Intelligence (AI) and Machine Learning (ML) for computation and data science

Learn about technologies and techniques for using AI and ML in data visualization, analysis, computation, and much more.

AI/ML Teaching Materials

  • Applying Copilot in RStudio

    Tutorial on using AI for a wide range of data science, statistics, and bioinformatics tasks by applying Posit's GitHub Copilot plugin for RStudio. This is one of the earliest AI technology integrations for RStudio, released in early 2024. It introduces new settings and co-opts the IDE auto-complete tool to suggest code completions.

    View Google Slides. Read the Zenodo reference and cheatsheet.
  • Consensus Machine Learning for Gene Target Selection in Pediatric AML Risk

    Manuscript investigating gene markers of acute myeloid leukemia (AML) risk in pediatric subjects from the TARGET consortium using a novel consensus framework. The framework uses a "consensus" approach, a type of ensemble-based learning, to identify the most likely markers across four distinct and complementary types of machine learning models (neural networks, LASSO, SVM, and random forest). The work was conducted as part of the NCBI Hackathons series.

    Read the bioRxiv preprint

    Consult the GitHub repo
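
As a rough illustration of the consensus idea described in the AML entry, here is a minimal sketch of vote-based feature selection across several models. It is not the manuscript's actual method: the function name `consensus_features`, the `min_votes` threshold, and the placeholder gene names are all assumptions made for this example.

```python
from collections import Counter


def consensus_features(selections, min_votes):
    """Return (feature, votes) pairs chosen by at least `min_votes` models.

    `selections` maps a model name to the features that model selected.
    Results are sorted by descending vote count, then by feature name.
    """
    votes = Counter()
    for features in selections.values():
        # Each model contributes at most one vote per feature.
        votes.update(set(features))
    kept = [(feature, n) for feature, n in votes.items() if n >= min_votes]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical selections from four model types; gene names are placeholders.
selections = {
    "neural_network": ["GENE_A", "GENE_B", "GENE_C"],
    "lasso": ["GENE_A", "GENE_C", "GENE_D"],
    "svm": ["GENE_A", "GENE_B", "GENE_D"],
    "random_forest": ["GENE_A", "GENE_C", "GENE_E"],
}

consensus = consensus_features(selections, min_votes=3)
print(consensus)
```

Raising `min_votes` makes the consensus stricter, and setting it to the number of models keeps only features that every model selected.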

2. Reproducible and replicable research practices

Explore reproducible and replicable research practices in depth, including published examples and more.

Reproducible Research Teaching Materials

  • Introduction to conda and mamba for dependency management

    Introductory tutorial/workshop covering the fundamental concepts and use of two technologies for reproducible research and workflows: conda and mamba. It motivates dependency management, walks through setup, and shows how to run conda and mamba.

    View Google Slides. Install conda and mamba.
  • Blog post "The case for containers in computation."

    In this blog, we explore the following key questions: 1. What are containers and why are they important for users, developers, and researchers? 2. In programming and computational research, what are the benefits of container use and risks of not using them? 3. How does familiarity with containers benefit your career, and what projects and fields already make extensive use of this important technology?

    Read the blog post

  • Retained introns/intronomer project

    Reproducible analysis of novel retained intron detection in long- and short-read RNA-seq data.

    View the Genome Biology manuscript. Consult the GitHub repo and conda scripts to reproduce analyses. Use the new intronomer Python package for retained intron detection from long-read RNA-seq data.
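
For readers new to the conda and mamba materials listed above, the following is a minimal sketch of how an environment-creation command might be assembled. It only builds the command and does not run it. The helper name `env_create_command`, the environment name `rnaseq-demo`, and the package list are assumptions for illustration and do not come from the tutorial or the project repos.

```python
import shutil


def env_create_command(env_name, packages, channel="conda-forge", tool=None):
    """Build (but do not run) a command that creates a named environment."""
    if tool is None:
        # Prefer mamba when it is on PATH; otherwise fall back to conda.
        tool = "mamba" if shutil.which("mamba") else "conda"
    return [tool, "create", "--name", env_name,
            "--channel", channel, "--yes", *packages]


# Hypothetical environment name and packages, for illustration only.
cmd = env_create_command("rnaseq-demo", ["python=3.11", "pandas"], tool="conda")
print(" ".join(cmd))
```

Pinning versions in the package list (here `python=3.11`) is what makes the resulting environment easier to reproduce on another machine.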

3. Bioinformatics pipelines and workflows

Learn the basics of the Nextflow and Snakemake workflow languages, and read about example bioinformatics ETL pipelines.

Bioinformatics Teaching Materials

  • Intro to Nextflow, 2-part workshop for LIBD Rstats Club

    Two-part workshop exploring the fundamentals of workflow syntax with Nextflow. Details different types of workflows, demonstrates cluster use in HPC environments, and shows how to access community workflow projects.

    View part 1 Google Slides and recording. View part 2 Google Slides and recording. Consult the workshop GitHub repo.

  • Using Nextflow and R

    Tutorial showing how to use Nextflow to run R code and scripts. Details concepts/background, setup, and use cases for running scripts and benchmarks.

    View Google Slides
  • Using Nextflow and R

    Blog post about using Nextflow to run and manage R scripts, with a specific use case implementing multiple bulk transcriptomics deconvolution algorithms. The post is accompanied by several GitHub repos that aggregate candidate algorithms for benchmarking and provide a usable benchmark workflow.

    Read the blog post

4. Meta-analyses and DNA methylation

Learn about the landscape of public DNA methylation datasets published to the Gene Expression Omnibus (GEO), how these epigenetic data are used in research, and ways to conduct meta-analysis of DNA methylation microarrays from independent sources.

4a. Publications

Peer-reviewed publications about cross-study analysis, validation of prior findings, and novel analyses of DNA methylation variation in human subjects.

4b. Repositories

Repositories for peer-reviewed publications, meta-analyses, and R/Bioconductor packages for analysis of DNA methylation datasets.

4c. Workshops and vignettes

Workshops and instructive vignettes demonstrating access and use of DNA methylation datasets for research.