Open-access teaching materials for AI/ML, reproducible research, bioinformatics, and more.
New Additions
New additions to the Teaching Materials section.
Manuscript and software for novel retained intron detection in long-read RNA-seq data, with support for reproducible analysis using eight short-read RNA-seq retained intron detection tools developed for varying compute and OS environments.
Use the new intronomer Python package for retained intron detection from long-read RNA-seq data.
1. Artificial Intelligence (AI) and Machine Learning (ML) for computation and data science
Learn about technologies and techniques for using AI and ML in data visualization, analysis, computation, and much more.
AI/ML Teaching Materials
Applying Copilot in Rstudio
Tutorial on using AI for a wide range of data science, statistics, and bioinformatics tasks with Posit's GitHub Copilot plugin for RStudio. The plugin, released in early 2024, is one of the earliest AI integrations for RStudio. It introduces new settings and co-opts the IDE's auto-complete tool to suggest code completions.
Consensus Machine Learning for Gene Target Selection in Pediatric AML Risk
Manuscript investigating gene markers of acute myeloid leukemia (AML) risk in pediatric subjects from the TARGET consortium using a novel consensus framework. The framework uses a consensus approach, a type of ensemble-based learning, to identify the most likely markers across four distinct and complementary types of machine learning models (neural networks, LASSO, SVM, and random forest). The work was conducted as part of the NCBI Hackathons series.
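To make the consensus idea concrete, here is a minimal sketch of one possible voting scheme, not the manuscript's actual method. It assumes each model has already produced a ranked feature list, and it keeps the features that land in the top `top_k` of at least `min_votes` models. The function name, the gene labels, and both thresholds are hypothetical placeholders.

```python
from collections import Counter


def consensus_features(rankings, top_k, min_votes):
    """Return features ranked in the top_k by at least min_votes models."""
    votes = Counter()
    for ranked in rankings.values():
        # Each model casts one vote for each of its top_k features.
        votes.update(set(ranked[:top_k]))
    selected = [feature for feature, n in votes.items() if n >= min_votes]
    # Sort by vote count (descending), then by name, for a stable report.
    return sorted(selected, key=lambda feature: (-votes[feature], feature))


# Hypothetical per-model rankings, most important feature first.
rankings = {
    "neural_network": ["gene_A", "gene_B", "gene_C", "gene_D"],
    "lasso": ["gene_B", "gene_A", "gene_E", "gene_C"],
    "svm": ["gene_A", "gene_C", "gene_B", "gene_F"],
    "random_forest": ["gene_B", "gene_D", "gene_A", "gene_C"],
}

consensus = consensus_features(rankings, top_k=3, min_votes=3)
print(consensus)  # ['gene_A', 'gene_B']
```

Raising `min_votes` makes the selection stricter, while lowering it admits features supported by fewer model types.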
Dive deep into reproducible and replicable research practices, including published examples and more.
Reproducible Research Teaching Materials
Introduction to conda and mamba for dependency management
Introductory tutorial/workshop showing the fundamental concepts and use of two technologies for reproducible research and workflows: conda and mamba. Motivates dependency management, shows setup, and demonstrates how to run the conda and mamba software.
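As a rough illustration of dependency management with these tools, the sketch below builds the text of a conda environment file with pinned package versions. It is not taken from the tutorial: the helper name, the environment name, the channel, and the pinned versions are all assumptions chosen for illustration.

```python
def render_environment(name, channels, dependencies):
    """Build the text of a conda environment file from plain Python values."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"


# Hypothetical environment: the name, channel, and pins are placeholders.
env_text = render_environment(
    name="demo-env",
    channels=["conda-forge"],
    dependencies=["python=3.11", "numpy=1.26"],
)
print(env_text)

# Save the text as environment.yml, then create the environment with either tool:
#   conda env create -f environment.yml
#   mamba env create -f environment.yml
```

Pinning versions in a shared file is what lets collaborators rebuild the same environment later, which is the core motivation for dependency management.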
Blog post "The case for containers in computation."
In this blog post, we explore the following key questions: 1. What are containers and why are they important for users, developers, and researchers? 2. In programming and computational research, what are the benefits of using containers and the risks of not using them? 3. How does familiarity with containers benefit your career, and what projects and fields already make extensive use of this important technology?
Learn about the basics of the Nextflow and Snakemake workflow languages, and read about example bioinformatics ETL pipelines.
Bioinformatics Teaching Materials
Intro to NextFlow, 2-part workshop for LIBD Rstats Club
Two-part workshop exploring the fundamentals of workflow syntax with Nextflow. Details different types of workflows, demonstrates cluster use in HPC environments, and shows how to access community workflow projects.
Blog post about using Nextflow to run and manage R scripts, with a specific use case implementing multiple bulk transcriptomics deconvolution algorithms. The post is accompanied by several GitHub repos that aggregate candidate algorithms for benchmarking and provide a usable benchmark workflow.
Learn about the landscape of public DNA methylation datasets published to the Gene Expression Omnibus (GEO), how these epigenetic data are used in research, and ways to conduct meta-analysis of DNA methylation microarrays from independent sources.
4a. Publications
Peer-reviewed publications about cross-study analysis, validation of prior findings, and novel analyses of DNA methylation variation in human subjects.
4b. Repositories
Repositories for peer-reviewed publications, meta-analyses, and R/Bioconductor packages for analysis of DNA methylation datasets.
4c. Workshops and vignettes
Workshops and instructive vignettes demonstrating access and use of DNA methylation datasets for research.