First Steps

MONSDA acts as a wrapper around Snakemake or Nextflow, driven by a user-defined config.json file. This config.json holds all the information needed to run the jobs; MONSDA parses it and splits it into independent sub-configs that can later be found in the directory SubSnakes or SubFlows, respectively. Command-line execution calls are stored in the directory JOBS, so users can manipulate them or rerun them manually as needed. By default, however, MONSDA will run those jobs automatically, either locally or through Snakemake's or Nextflow's integrated cluster interfaces.

To successfully run an analysis pipeline, a few steps have to be followed:
  • Install MONSDA either via bioconda or pip, following the instructions in Installation

  • Directory structure: The directory structure for the input data is dictated by The Condition-Tree in the config file

  • Config file: This is the central part of a MONSDA run. Based on The config-file, MONSDA determines the processing steps and generates corresponding config and workflow files to run each subworkflow until all processing steps are done.

In general, it is necessary to write a configuration file containing information on paths, files to process, and settings beyond the defaults for mapping tools and other steps. The template on which the analysis is based can be found in the config directory and will be explained in detail later.
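As a purely illustrative sketch of the general shape of such a file (the key names below are assumptions for illustration, not the authoritative template; consult the template in the config directory for the real structure):

```json
{
    "WORKFLOWS": "MAPPING",
    "MAXTHREADS": "4",
    "SETTINGS": {
        "MyExperiment": {
            "ConditionA": {
                "Setting1": {
                    "SAMPLES": ["sample1", "sample2"]
                }
            }
        }
    }
}
```

The nesting of keys such as those under SETTINGS mirrors The Condition-Tree, i.e. the directory structure of the input data.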

To create a working environment for this repository, please install the MONSDA.yaml environment (if MONSDA was not installed via bioconda) found in the envs directory like so:

conda env create -n monsda -f envs/MONSDA.yaml
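After creating the environment, it has to be activated before MONSDA can be called. A minimal sketch, assuming the environment name chosen above and that the package provides a `monsda` executable with a `--version` flag (verify with `monsda --help`):

```shell
# Activate the environment created above
conda activate monsda

# Check that MONSDA is available on the PATH; the --version flag is an assumption
monsda --version
```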

The envs directory holds all the environments needed to run the pipelines in the workflows directory; these will be installed automatically when needed.

For fast resolution of conda packages, we recommend conda-libmamba-solver, a newer solver for the conda package manager that speeds up conda without the need to install mamba and is shipped with MONSDA. However, the user is free to use mamba, which is currently also the standard conda frontend for Snakemake.
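If the libmamba solver is not yet active on your system, it can be enabled with conda's own configuration commands (these are standard conda commands, independent of MONSDA):

```shell
# Install the solver plugin into the base environment (if not already present)
conda install -n base conda-libmamba-solver

# Set libmamba as the default solver for all future conda operations
conda config --set solver libmamba
```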

For distribution of jobs, one can either rely on local hardware, use scheduling software like SLURM or SGE, or follow any other integration available in Snakemake or Nextflow. Be aware that most of these have not been tested for this repository and usually require additional system-dependent setup and configuration.

This manual will only show examples of local and SLURM usage.
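As a hedged sketch of what these two modes can look like: the local call below uses hypothetical MONSDA flags (check `monsda --help` for the real ones), while the SLURM call shows Snakemake's generic cluster interface with illustrative sbatch arguments, i.e. the kind of call MONSDA may delegate to:

```shell
# Local execution; the flag name here is an assumption, not a verified MONSDA option
monsda -c config.json

# SLURM execution via Snakemake's --cluster interface (available up to Snakemake 7);
# the partition name and resource placeholder are illustrative
snakemake --cluster "sbatch -p mypartition -c {threads}" --jobs 20
```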