Reproducibility of in-silico analysis pipelines has become one of biology’s most pressing issues. The exponential growth of biological datasets, increasingly complex data analysis methods and the lack of community standards all present major challenges. These obstacles are exacerbated by the need to install, deploy and maintain bioinformatics pipelines across the diverse range of computational platforms and configurations on which they are expected to run (workstations, clusters, HPC, clouds, etc.).
Nextflow, a novel pipeline development tool by the CRG (Nature Biotechnology 35, 316–319 (2017); https://www.nextflow.io/), is emerging as an efficient solution to the reproducibility dilemma in omics analyses. By providing a domain-specific language (DSL), Nextflow simplifies the writing of complex distributed computational workflows in a portable and replicable manner. It allows the seamless parallelization and deployment of any existing application with minimal development and maintenance overhead, irrespective of the original programming language. Reproduction and reuse from any former configuration become straightforward, guaranteeing consistent results over time and across different computing platforms.
The aim of this workshop is to bring together Nextflow developers, workflow experts and bioinformaticians to discuss the current state of Nextflow technology, the latest developments and the open questions in tackling the problem of reproducible omics analyses. Best practices will be introduced through practical examples showing how to handle large-scale genomic applications in production for precision medicine.
A hands-on course and a hackathon, organized on the second day, will give attendees the opportunity to contribute in practice to the development of reference genomic analysis workflows alongside expert Nextflow developers. Participants will learn how to write a workflow application with Nextflow, how to handle dependencies with containers (Docker and Singularity), how to manage software versions with GitHub and how to deploy the computation across different platforms (HPC cluster, AWS cloud).