The Condition-Tree

A key concept behind MONSDA is that config files are split according to conditions defined in the config file and each subworkflow is then run consecutively. This separates workflows in independent subworkflows, each potentially running on different input, with differing options and executables, without danger of interfering with each other.

As described in Planning a project run, you should have an idea of what to analyze and how to split up your conditions if needed. For each ID you work on, you can define no, one or multiple conditions and settings that will be used for the analysis. The condition-tree also defines the directory structure to follow for Input and Output directories. We assume a general tree to look something like

'ID' -> 'CONDITION' -> 'SETTING'

Here ID is the first level and optional Condition the second. Setting is used by MONSDA to enable processing of the same samples under different settings or commandline options for e.g. mapping tools, trimming tools and later also postprocessing tools. MONSDA will also build an output directory based on the combination of tools used, e.g. fastqc-cutadapt-star-umitools, to indicate which combination of tools was used to generate the output and to prevent results from being mixed or overwritten.

As an example, I want to analyze samples retrieved from LabA on 01012020 (yes that happens), with the mapping tools star and hisat, my condition-tree would look like this LabA:01012020 and my FASTQ input directory would resemble that like FASTQ/LabA/01012020. The ‘01012020’ directory would thereby contain all the fastq.gz files I need for analysis as stated in the corresponding config file. As we assume that settings may change but the input files will stay the same, MONSDA will search one level upwards of the deepest level in the condition-tree. This means, if you have a tree like:

'ID1' -> 'CONDITION1' -> 'SETTING1', 'SETTING2', 'SETTING3'

You do not need to copy input from FASTQ/LabA/01012020 to FASTQ/LabA/01012020/SETTING1/2/3, instead MONSDA will find the input in FASTQ/LabA/01012020 and generate output directories which contain the Setting level. This works of course also if you want to analyze samples from different dates and same lab with same settings or different labs and so on.

MONSDA will automatically define a unique tool-key based on currently enabled workflow steps and the combination of tools defined in the config file. From that information it will generate output folders like MAPPING/LabA/01012020/star and MAPPING/LabA/01012020/hisat if no other tools and workflow steps where configured to be used.

Optionally a user can also run one or the other tool with different settings for example to benchmark tools. Define for example map_stringent and map_relaxed and indicate this on the MAPPING level in the config file. FASTQ input will still be found in FASTQ/LabA/01012020 while output files will appear in MAPPING/LabA/01012020/map_stringent/star and MAPPING/LabA/01012020/map_stringent/hisat or MAPPING/LabA/01012020/map_relaxed/star and MAPPING/LabA/01012020/map_relaxed/hisat respectively.