Better benchmark workflows, and why you should use R with NextFlow
R
nextflow
benchmarks
workflows
As workflow technologies continue to be updated and improved, their learning materials become more robust, and their support communities grow, there will be fewer barriers to using them to streamline day-to-day development routines, especially for complex parallel tasks. Yet relatively few learning resources cover the management of standard benchmarking tasks using workflows, and even fewer provide solutions for specific domains like bioinformatics and the R programming language. In this post, I address this gap with several solutions, arrived at after considerable brainstorming, research, trial and error, and conversations with my stellar computational bioscience colleagues. I ultimately found that not only can R be used with NextFlow for benchmarking, but there are many domains where this probably should be the standard approach.
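To make the idea concrete, here is a minimal sketch of what an R-based benchmark can look like as a Nextflow process; the benchmark.R script, its command-line flags, and the method names are hypothetical placeholders, not code from the post.

```groovy
// Minimal sketch: Nextflow runs a hypothetical R benchmarking script once per
// method, parallelizing the runs automatically. benchmark.R and its flags are
// illustrative placeholders, not from the post.
nextflow.enable.dsl = 2

process BENCHMARK {
    input:
    val method

    output:
    path "timings_${method}.csv"

    script:
    """
    Rscript benchmark.R --method ${method} --out timings_${method}.csv
    """
}

workflow {
    Channel.of('method_a', 'method_b', 'method_c') | BENCHMARK
}
```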
Run R package checks with a shell script
R
bash
developers
While checks are crucial to R package development, running them from the command line can quickly become repetitive. I've written a shell script, rpackagecheck.sh, that runs the standard steps for checking an R package. The script uses R CMD ... to install, build, and check packages with any combination of the three major check types. It can help prevent accidents, such as running a check on a source directory rather than a .tar.gz file, and ultimately expedite your development workflow.
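For a sense of what such a script wraps, here is a rough sketch of the underlying command sequence; the exact commands and flags in rpackagecheck.sh may differ, and the package name and version are made up.

```bash
# Rough sketch of the sequence a script like rpackagecheck.sh automates;
# the package name and version are placeholders.
R CMD build mypkg/                        # build a source tarball from the package directory
R CMD check mypkg_0.1.0.tar.gz            # standard check, run on the tarball, not the directory
R CMD check --as-cran mypkg_0.1.0.tar.gz  # stricter CRAN-style check
R CMD INSTALL mypkg_0.1.0.tar.gz          # install the checked tarball
```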
Cracking the Monty Hall problem with brute force simulation
simulation
R
ggplot2
blog
On a game show stage before you wait 3 closed doors, behind which sit 2 goats and 1 prize. You are called on to pick a door, which will eventually be opened to reveal either a goat or the prize. The host, Monty Hall, then reveals a goat behind one of the two remaining unpicked doors, and you are given the option to switch your selection to the final unpicked door before the big reveal. What should you do?
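As a taste of the approach, the brute-force answer fits in a few lines of R. This is a minimal sketch, not the post's actual code:

```r
# Minimal brute-force sketch (not the post's exact code): play many games and
# compare win rates for the "stay" and "switch" strategies.
set.seed(42)
n_games <- 100000
prize <- sample(3, n_games, replace = TRUE)  # door hiding the prize
pick  <- sample(3, n_games, replace = TRUE)  # contestant's first pick

# After Monty reveals a goat, staying wins only when the first pick was right;
# switching wins whenever it was wrong.
c(stay = mean(pick == prize), switch = mean(pick != prize))  # ~1/3 vs. ~2/3
```

Across enough simulated games, switching wins about twice as often as staying.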
My 2018 proposal for the Better Scientific Software Fellowship
science
software
reproducibility
I wanted to share my proposal for the 2018 Better Scientific Software (BSSw) Fellowship. BSSw aims to raise and preserve the integrity and standards of publishing computer code in science, and its fellowship program recognizes and supports advocates of this cause. You may not be aware that we currently lack standard ways of referencing published code in science as independently citable units. Furthermore, vital source code for experiments can be scattered across many places, including supplemental materials behind paywalls, personal websites that may go offline over time, and repositories on GitHub or elsewhere that may lack persistent identifiers. I propose using autocompilation technology to aggregate published scientific computer code and code metadata into a new database, called Pubsrc. This would enable novel assessments of scientific code use, including automatically generating dependency usage networks, tracking the impact of newly discovered software bugs throughout research, and making scientific code independently citable. I hope you enjoy reading my proposal, and please share or tweet this post if you support this cause.