Sean K Maden

Teaching Materials

Open-access teaching materials for AI/ML, reproducible research, bioinformatics, and more.

New Additions

New additions to the Teaching Materials section.

1 of 5:

Applying Copilot in RStudio

Tutorial on using AI for a wide range of data science, statistics, and bioinformatics tasks by applying Posit's GitHub Copilot integration for RStudio. This is one of the earliest AI integrations for RStudio, released in early 2024. It adds new settings and co-opts the IDE's autocomplete tool to suggest code completions.

View slides

See the Zenodo reference and cheatsheet

2 of 5:

Consensus Machine Learning for Gene Target Selection in Pediatric AML Risk

Manuscript investigating gene markers of acute myeloid leukemia (AML) risk in pediatric subjects from the TARGET consortium using a novel consensus framework. The framework uses a consensus, ensemble-based learning approach to identify the most likely markers across four distinct and complementary types of machine learning models (neural networks, LASSO, SVM, and random forest). The work was conducted as part of the NCBI Hackathons series.

Read the bioRxiv preprint

Consult the GitHub repo

3 of 5:

Conda/mamba overview

Introductory tutorial/workshop covering the fundamental concepts and use of two tools for reproducible research and workflows: conda and mamba. It motivates dependency management, walks through setup, and shows how to run conda and mamba.

View the Google Slides

4 of 5:

New blog post "The case for containers in computation."

In this blog, we explore the following key questions:

1. What are containers and why are they important for users, developers, and researchers?

2. In programming and computational research, what are the benefits of container use and risks of not using them?

3. How does familiarity with containers benefit your career, and which projects and fields already make extensive use of this important technology?

Read the blog post.

5 of 5:

Retained introns/intronomer project

Manuscript and software for novel retained intron detection in long-read RNA-seq data, with reproducible analyses using 8 short-read RNA-seq retained intron detection tools developed for varying compute and OS environments.

View the Genome Biology manuscript

Consult the GitHub repo and conda scripts to reproduce analyses

Use the new intronomer Python package for retained intron detection from long-read RNA-seq data

1. Artificial Intelligence (AI) and Machine Learning (ML) for computation and data science

Learn about technologies and techniques for using AI and ML in data visualization, analysis, computation, and much more.
AI/ML Teaching Materials

Applying Copilot in RStudio

Tutorial on using AI for a wide range of data science, statistics, and bioinformatics tasks by applying Posit's GitHub Copilot integration for RStudio. This is one of the earliest AI integrations for RStudio, released in early 2024. It adds new settings and co-opts the IDE's autocomplete tool to suggest code completions.
View Google Slides. Read the Zenodo reference and cheatsheet.

Consensus Machine Learning for Gene Target Selection in Pediatric AML Risk

Manuscript investigating gene markers of acute myeloid leukemia (AML) risk in pediatric subjects from the TARGET consortium using a novel consensus framework. The framework uses a consensus, ensemble-based learning approach to identify the most likely markers across four distinct and complementary types of machine learning models (neural networks, LASSO, SVM, and random forest). The work was conducted as part of the NCBI Hackathons series.

Read the bioRxiv preprint

Consult the GitHub repo
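The consensus idea can be illustrated with a small vote-counting sketch. It assumes each of the four models has already produced a ranked list of genes (for example, by LASSO coefficient magnitude or random forest importance) and keeps the genes that land in the top `top_k` of at least `min_models` rankings. The function name `consensus_features`, the voting rule, and the gene names are hypothetical; the preprint's actual consensus procedure may differ.

```python
from collections import Counter
from typing import Dict, List


def consensus_features(rankings: Dict[str, List[str]], top_k: int, min_models: int) -> List[str]:
    """Return features ranked in the top_k of at least min_models models.

    rankings maps a model name to its features ordered from most to least important.
    """
    votes: Counter = Counter()
    for ranked in rankings.values():
        # set() so a feature listed twice by one model still earns only one vote
        votes.update(set(ranked[:top_k]))
    return sorted(feature for feature, n in votes.items() if n >= min_models)


# Hypothetical importance rankings from the four model types.
rankings = {
    "lasso": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    "random_forest": ["GENE_B", "GENE_A", "GENE_E", "GENE_C"],
    "svm": ["GENE_A", "GENE_E", "GENE_B", "GENE_F"],
    "neural_network": ["GENE_C", "GENE_A", "GENE_B", "GENE_G"],
}
print(consensus_features(rankings, top_k=3, min_models=4))  # prints ['GENE_A', 'GENE_B']
```

Requiring agreement from all four models is the strictest setting; lowering `min_models` trades specificity for a longer candidate list.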

2. Reproducible and replicable research practices

Dive into reproducible and replicable research practices in depth, including published examples and more.
Reproducible Research Teaching Materials

Introduction to conda and mamba for dependency management

Introductory tutorial/workshop covering the fundamental concepts and use of two tools for reproducible research and workflows: conda and mamba. It motivates dependency management, walks through setup, and shows how to run conda and mamba.
View Google Slides. Install conda and mamba.
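Because mamba is designed as a faster drop-in replacement for conda, scripted setups often try mamba first and fall back to conda. A minimal Python sketch of that pattern follows. The helper names `pick_solver_cli` and `env_create_cmd`, the file `environment.yml`, and the environment name are hypothetical, and it assumes a mamba version that accepts conda's `env create -f ... -n ...` syntax.

```python
import shutil
from typing import Callable, List, Optional


def pick_solver_cli(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return "mamba" if it is on PATH, otherwise "conda".

    `which` is injectable so the PATH lookup can be faked in tests.
    """
    for exe in ("mamba", "conda"):
        if which(exe):
            return exe
    raise FileNotFoundError("neither mamba nor conda was found on PATH")


def env_create_cmd(exe: str, env_file: str, name: str) -> List[str]:
    """Build the argument list that creates a named environment from a YAML spec."""
    return [exe, "env", "create", "-f", env_file, "-n", name]


# Hypothetical usage: build (but do not run) the command.
# cmd = env_create_cmd(pick_solver_cli(), "environment.yml", "my-analysis")
```

To actually create the environment, pass the resulting list to `subprocess.run(cmd, check=True)`. Keeping the dependency list in a version-controlled YAML file is what makes the environment reproducible; the choice of conda or mamba only affects how fast it is solved.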

Blog post "The case for containers in computation."

In this blog, we explore the following key questions:

1. What are containers and why are they important for users, developers, and researchers?

2. In programming and computational research, what are the benefits of container use and risks of not using them?

3. How does familiarity with containers benefit your career, and which projects and fields already make extensive use of this important technology?

Read the blog post

Retained introns/intronomer project

Reproducible analysis of novel retained intron detection in long- and short-read RNA-seq data.

View the Genome Biology manuscript. Consult the GitHub repo and conda scripts to reproduce analyses. Use the new intronomer Python package for retained intron detection from long-read RNA-seq data.
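At its simplest, spotting a retained intron in a long read means checking whether one contiguous aligned block covers the annotated intron, rather than jumping over it with a splice. The sketch below assumes reads are already parsed into lists of aligned blocks given as half-open coordinates on one chromosome and strand. All function names and coordinates are hypothetical, and this is a generic illustration, not the intronomer package's API or the metric defined in the manuscript.

```python
from typing import List, Tuple

# Half-open genomic interval [start, end) on one chromosome and strand (assumed).
Interval = Tuple[int, int]


def spans_intron(blocks: List[Interval], intron: Interval) -> bool:
    """True if the read's overall alignment extends across the whole intron."""
    start, end = intron
    return min(b[0] for b in blocks) <= start and max(b[1] for b in blocks) >= end


def retains_intron(blocks: List[Interval], intron: Interval) -> bool:
    """True if a single contiguous aligned block covers the whole intron."""
    start, end = intron
    return any(b_start <= start and b_end >= end for b_start, b_end in blocks)


def retention_fraction(reads: List[List[Interval]], intron: Interval) -> float:
    """Share of intron-spanning reads that retain the intron (0.0 if none span it)."""
    spanning = [blocks for blocks in reads if spans_intron(blocks, intron)]
    if not spanning:
        return 0.0
    return sum(retains_intron(blocks, intron) for blocks in spanning) / len(spanning)


# Toy reads around a hypothetical intron at [200, 300).
reads = [
    [(100, 400)],              # unspliced: retains the intron
    [(100, 200), (300, 400)],  # spliced: intron removed
    [(250, 400)],              # starts inside the intron: not counted
    [(150, 350)],              # unspliced: retains the intron
]
print(retention_fraction(reads, (200, 300)))
```

Long reads make this check more informative than short reads because a single read can show whether every intron in a transcript is spliced or retained.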

3. Bioinformatics pipelines and workflows

Learn the basics of the Nextflow and Snakemake workflow languages, and read about example bioinformatics ETL pipelines.
Bioinformatics Teaching Materials

Intro to Nextflow: 2-part workshop for the LIBD Rstats Club

Two-part workshop exploring the fundamentals of workflow syntax with Nextflow. It details different types of workflows, demonstrates cluster use in HPC environments, and shows how to access community workflow projects.

View part 1 Google Slides and recording. View part 2 Google Slides and recording. Consult the workshop GitHub repo.

Using Nextflow and R

Tutorial showing how to use Nextflow to run R code and scripts. Covers background concepts, setup, and use cases for running scripts and benchmarks.
View Google Slides

Using Nextflow and R (blog post)

Blog post about using Nextflow to run and manage R scripts, with a specific use case implementing multiple bulk transcriptomics deconvolution algorithms. The post is accompanied by several GitHub repos that aggregate candidate algorithms for benchmarking and provide a usable benchmark workflow.
Read the blog post
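Many reference-based bulk deconvolution methods model a bulk expression vector b as a signature matrix S (genes by cell types) times a vector p of non-negative cell-type proportions, and estimate p by non-negative least squares (NNLS). The numpy sketch below solves that problem with projected gradient descent and rescales the result to sum to 1. It is a generic baseline chosen for illustration, not one of the specific algorithms benchmarked in the post, and the toy signature matrix is made up.

```python
import numpy as np


def deconvolve_nnls(signature, bulk, n_iter: int = 5000) -> np.ndarray:
    """Estimate proportions p >= 0 minimizing ||signature @ p - bulk||^2.

    signature: genes x cell_types reference matrix; bulk: expression vector over the same genes.
    Uses projected gradient descent, then rescales p so it sums to 1.
    """
    S = np.asarray(signature, dtype=float)
    b = np.asarray(bulk, dtype=float)
    # Step size 1/L, where L = (largest singular value of S)^2 bounds the gradient's Lipschitz constant.
    step = 1.0 / np.linalg.norm(S, ord=2) ** 2
    p = np.zeros(S.shape[1])
    for _ in range(n_iter):
        grad = S.T @ (S @ p - b)              # gradient of 0.5 * ||S p - b||^2
        p = np.maximum(p - step * grad, 0.0)  # project back onto p >= 0
    total = p.sum()
    return p / total if total > 0 else p


# Toy example: 4 genes x 3 cell types, with a known mixture.
signature = np.array([[10, 0, 1], [0, 8, 2], [1, 1, 9], [5, 5, 0]])
true_props = np.array([0.5, 0.3, 0.2])
bulk = signature @ true_props
print(np.round(deconvolve_nnls(signature, bulk), 3))
```

In practice, `scipy.optimize.nnls` solves the same problem directly with an active-set method. Wrapping each such method as a separate workflow process is what makes side-by-side benchmarking of deconvolution algorithms manageable.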

4. Meta-analyses and DNA methylation

Learn about the landscape of public DNA methylation datasets published to the Gene Expression Omnibus (GEO), how these epigenetic data are used in research, and ways to conduct meta-analysis of DNA methylation microarrays from independent sources.
4a. Publications
Peer-reviewed publications about cross-study analysis, validation of prior findings, and novel analyses of DNA methylation variation in human subjects.
4b. Repositories
Repositories for peer-reviewed publications, meta-analyses, and R/Bioconductor packages for analysis of DNA methylation datasets.
4c. Workshops and vignettes
Workshops and instructive vignettes demonstrating how to access and use DNA methylation datasets for research.
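One common way to combine a per-CpG effect estimate (for example, a case/control difference in methylation) across independent studies is an inverse-variance weighted fixed-effect meta-analysis. Each study's estimate gets weight w_i = 1 / SE_i^2, the pooled estimate is sum(w_i * estimate_i) / sum(w_i), and its standard error is sqrt(1 / sum(w_i)). This is a generic sketch, not necessarily the method used in the publications above, and the per-study numbers are hypothetical.

```python
import math
from typing import List, Tuple


def fixed_effect_meta(estimates: List[float], std_errors: List[float]) -> Tuple[float, float, float]:
    """Inverse-variance weighted fixed-effect meta-analysis for one CpG.

    Returns (pooled_estimate, pooled_standard_error, z_score).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need non-empty inputs of equal length")
    weights = [1.0 / se ** 2 for se in std_errors]  # more precise studies count more
    total_w = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    return pooled, pooled_se, pooled / pooled_se


# Hypothetical effect estimates and standard errors from three independent studies.
pooled, se, z = fixed_effect_meta([0.10, 0.20, 0.05], [0.04, 0.05, 0.08])
print(f"pooled={pooled:.4f}, se={se:.4f}, z={z:.2f}")
```

A fixed-effect model assumes every study shares one true effect. When studies differ in array platform, tissue, or cohort, a random-effects model that adds a between-study variance term is often preferred.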

Email GitHub LinkedIn X/Twitter BlueSky ORCID

Generated by Copilot AI, version 2025, Microsoft.

© 2025 Sean Maden. All rights reserved.