.. _WFoverview:

Available Workflows
===================

| FETCH -> Download from SRA
| BASECALL -> Call bases from FAST5 or POD5
| QC -> Quality control of FASTQ/BAM files
| TRIMMING -> Trim adapter remnants
| MAPPING -> Map reads to reference
| DEDUP -> Deduplicate reads
| COUNTING -> Count or quantify reads
| TRACKS -> Generate trackdb.txt and BIGWIG files
| PEAKS -> Call peaks for ChIP, RIP, CLIP and others
| DE -> Differential Expression Analysis
| DEU -> Differential Exon Usage Analysis
| DAS -> Differential Alternative Splicing Analysis
| DTU -> Differential Transcript Usage Analysis
| CIRCS -> Circular RNA identification

PREPROCESSING
=============

FETCH
#####

Downloads files from SRA.

.. table::
   :widths: 10, 40, 10, 10, 10, 10, 10
   :class: tight-table

   +-----------------------+-----------------------------------------------+-----+--------------+---------+---------------+--------+
   | TOOL                  | DESCRIPTION                                   | ENV | BIN          | LINK    | INPUT         | OUTPUT |
   +=======================+===============================================+=====+==============+=========+===============+========+
   | SRAtools fasterq-dump | The fasterq-dump tool extracts data in FASTQ- | sra | fasterq-dump | `sra `_ | SRA accession | FASTQ  |
   |                       | or FASTA-format from SRA-accessions           |     |              |         |               |        |
   +-----------------------+-----------------------------------------------+-----+--------------+---------+---------------+--------+

BASECALL
########

Creates FASTQ files from ONT FAST5 or POD5 files. Guppy and Dorado need to be installed locally.

.. table::
   :widths: 10, 40, 10, 10, 10, 10, 10
   :class: tight-table

   +--------+----------------------------------------------------+--------+--------------------+------------+-------+--------+
   | TOOL   | DESCRIPTION                                        | ENV    | BIN                | LINK       | INPUT | OUTPUT |
   +========+====================================================+========+====================+============+=======+========+
   | Guppy  | Data processing toolkit that contains Oxford       | guppy  | $PATH_TO_LOCAL_BIN | `guppy `_  | FAST5 | FASTQ  |
   |        | Nanopore’s basecalling algorithms, and several     |        |                    |            |       |        |
   |        | bioinformatic post-processing features, such as    |        |                    |            |       |        |
   |        | barcoding/demultiplexing, adapter trimming, and    |        |                    |            |       |        |
   |        | alignment. Needs to be installed locally as no     |        |                    |            |       |        |
   |        | **conda** version is available.                    |        |                    |            |       |        |
   +--------+----------------------------------------------------+--------+--------------------+------------+-------+--------+
   | Dorado | Dorado is a high-performance, easy-to-use, open    | dorado | $PATH_TO_LOCAL_BIN | `dorado `_ | POD5  | FASTQ  |
   |        | source basecaller for Oxford Nanopore reads. Needs |        |                    |            |       |        |
   |        | to be installed locally as no **conda** version is |        |                    |            |       |        |
   |        | available.                                         |        |                    |            |       |        |
   +--------+----------------------------------------------------+--------+--------------------+------------+-------+--------+

QUALITY CONTROL I
##################

This workflow step can be run as a preprocessing step if none of the processing workflows is defined in the config.json.

.. table::
   :widths: 10, 40, 10, 10, 10, 10, 10
   :class: tight-table

   +---------------------------+--------------------------------------------+--------+--------+------------+-----------+--------------+
   | TOOL                      | DESCRIPTION                                | ENV    | BIN    | LINK       | INPUT     | OUTPUT       |
   +===========================+============================================+========+========+============+===========+==============+
   | FASTQC (includes MULTIQC) | A quality control tool for high throughput | fastqc | fastqc | `fastqc `_ | FASTQ/BAM | ZIP/HTML     |
   |                           | sequence data.                             |        |        |            |           |              |
   +---------------------------+--------------------------------------------+--------+--------+------------+-----------+--------------+
   | RustQC (includes MULTIQC) | High-performance RNA-seq QC suite with     | rustqc | rustqc | `rustqc `_ | BAM       | TEXT/TSV/LOG |
   |                           | MultiQC-compatible outputs.                |        |        |            |           |              |
   +---------------------------+--------------------------------------------+--------+--------+------------+-----------+--------------+

PROCESSING
==========

QUALITY CONTROL II
###################

If any of the processing steps listed below is defined in the config.json, quality control will run for all generated output files, if enabled.

.. table::
   :widths: 10, 40, 10, 10, 10, 10, 10
   :class: tight-table

   +---------------------------+--------------------------------------------+--------+--------+------------+-----------+--------------+
   | TOOL                      | DESCRIPTION                                | ENV    | BIN    | LINK       | INPUT     | OUTPUT       |
   +===========================+============================================+========+========+============+===========+==============+
   | FASTQC (includes MULTIQC) | A quality control tool for high throughput | fastqc | fastqc | `fastqc `_ | FASTQ/BAM | ZIP/HTML     |
   |                           | sequence data.                             |        |        |            |           |              |
   +---------------------------+--------------------------------------------+--------+--------+------------+-----------+--------------+
   | RustQC (includes MULTIQC) | High-performance RNA-seq QC suite with     | rustqc | rustqc | `rustqc `_ | BAM       | TEXT/TSV/LOG |
   |                           | MultiQC-compatible outputs.                |        |        |            |           |              |
   +---------------------------+--------------------------------------------+--------+--------+------------+-----------+--------------+

Trimming
########

Trims adapter remnants from FASTQ files.

.. table::
   :widths: 10, 40, 10, 10, 10, 10, 10
   :class: tight-table

   +-------------+----------------------------------------------------+------------+-------------+----------------+-------+---------------+
   | TOOL        | DESCRIPTION                                        | ENV        | BIN         | LINK           | INPUT | OUTPUT        |
   +=============+====================================================+============+=============+================+=======+===============+
   | trim_galore | A wrapper tool around Cutadapt and FastQC to       | trimgalore | trim_galore | `trimgalore `_ | FASTQ | TRIMMED_FASTQ |
   |             | consistently apply quality and adapter trimming to |            |             |                |       |               |
   |             | FastQ files, with some extra functionality for     |            |             |                |       |               |
   |             | MspI-digested RRBS-type (Reduced Representation    |            |             |                |       |               |
   |             | Bisulfite-Seq) libraries.                          |            |             |                |       |               |
   +-------------+----------------------------------------------------+------------+-------------+----------------+-------+---------------+
   | cutadapt    | Cutadapt finds and removes adapter sequences,      | cutadapt   | cutadapt    | `cutadapt `_   | FASTQ | TRIMMED_FASTQ |
   |             | primers, poly-A tails and other types of unwanted  |            |             |                |       |               |
   |             | sequence from your high-throughput sequencing      |            |             |                |       |               |
   |             | reads.                                             |            |             |                |       |               |
   +-------------+----------------------------------------------------+------------+-------------+----------------+-------+---------------+
   | bbduk       | “Duk” stands for Decontamination Using Kmers.      | bbduk      | bbduk       | `bbduk `_      | FASTQ | TRIMMED_FASTQ |
   |             | BBDuk was developed to combine most common         |            |             |                |       |               |
   |             | data-quality-related trimming, filtering, and      |            |             |                |       |               |
   |             | masking operations into a single high-performance  |            |             |                |       |               |
   |             | tool.                                              |            |             |                |       |               |
   +-------------+----------------------------------------------------+------------+-------------+----------------+-------+---------------+
   | fastp       | A tool designed to provide fast all-in-one         | fastp      | fastp       | `fastp `_      | FASTQ | TRIMMED_FASTQ |
   |             | preprocessing for FastQ files. This tool is        |            |             |                |       |               |
   |             | developed in C++ with multithreading supported to  |            |             |                |       |               |
   |             | afford high performance.                           |            |             |                |       |               |
   +-------------+----------------------------------------------------+------------+-------------+----------------+-------+---------------+

Mapping
#######

Maps sequences to reference genomes or transcriptomes.

.. table::
   :widths: 10, 40, 10, 10, 10, 10, 10
   :class: tight-table

   +-----------------------+----------------------------------------------------+---------------------------------------+------------+--------------+---------------------+------------+
   | TOOL                  | DESCRIPTION                                        | ENV                                   | BIN        | LINK         | INPUT               | OUTPUT     |
   +=======================+====================================================+=======================================+============+==============+=====================+============+
   | HISAT2                | HISAT2 is a fast and sensitive alignment program   | hisat2                                | hisat2     | `hisat2 `_   | FASTQ/TRIMMED_FASTQ | SAM.gz/BAM |
   +-----------------------+----------------------------------------------------+---------------------------------------+------------+--------------+---------------------+------------+
   | STAR                  | Spliced Transcripts Alignment to a Reference       | star                                  | STAR       | `star `_     | FASTQ/TRIMMED_FASTQ | SAM.gz/BAM |
   +-----------------------+----------------------------------------------------+---------------------------------------+------------+--------------+---------------------+------------+
   | STARsolo              | STARsolo: mapping, demultiplexing and              | star                                  | STAR       | `STARsolo `_ | FASTQ/TRIMMED_FASTQ | SAM.gz/BAM |
   |                       | quantification for single cell RNA-seq             |                                       |            |              |                     |            |
   +-----------------------+----------------------------------------------------+---------------------------------------+------------+--------------+---------------------+------------+
   | Segemehl2|3           | Segemehl is a software to map short sequencer      | segemehl2/segemehl3                   | segemehl.x | `segemehl `_ | FASTQ/TRIMMED_FASTQ | SAM.gz/BAM |
   |                       | reads to reference genomes.                        |                                       |            |              |                     |            |
   +-----------------------+----------------------------------------------------+---------------------------------------+------------+--------------+---------------------+------------+
   | Segemehl2|3 bisulfite | Segemehl is a software to map short sequencer      | segemehl2bisulfite/segemehl3bisulfite | segemehl.x | `segemehl `_ | FASTQ/TRIMMED_FASTQ | SAM.gz/BAM |
   |                       | reads to reference genomes. This is the bisulfite  |                                       |            |              |                     |            |
   |                       | mapping mode.                                      |                                       |            |              |                     |            |
   +-----------------------+----------------------------------------------------+---------------------------------------+------------+--------------+---------------------+------------+
   | BWA                   | BWA is a software package for mapping              | bwa                                   | bwa mem    | `bwa `_      | FASTQ/TRIMMED_FASTQ | SAM.gz/BAM |
   |                       | low-divergent sequences against a large reference  |                                       |            |              |                     |            |
   |                       | genome                                             |                                       |            |              |                     |            |
   +-----------------------+----------------------------------------------------+---------------------------------------+------------+--------------+---------------------+------------+
   | BWA2                  | BWA is a software package for mapping              | bwa2                                  | bwa2-mem   | `bwa2 `_     | FASTQ/TRIMMED_FASTQ | SAM.gz/BAM |
   |                       | low-divergent sequences against a large reference  |                                       |            |              |                     |            |
   |                       | genome                                             |                                       |            |              |                     |            |
   +-----------------------+----------------------------------------------------+---------------------------------------+------------+--------------+---------------------+------------+
   | BWA-Meth              | BWA-meth: fast and accurate alignment of BS-Seq    | bwameth                               | bwameth.py | `bwa-meth `_ | FASTQ/TRIMMED_FASTQ | SAM.gz/BAM |
   |                       | reads.                                             |                                       |            |              |                     |            |
   +-----------------------+----------------------------------------------------+---------------------------------------+------------+--------------+---------------------+------------+
   | Minimap2              | Minimap2 is a versatile sequence alignment program | minimap                               | minimap2   | `minimap `_  | FASTQ/TRIMMED_FASTQ | SAM.gz/BAM |
   |                       | that aligns DNA or mRNA sequences against a large  |                                       |            |              |                     |            |
   |                       | reference database.                                |                                       |            |              |                     |            |
   +-----------------------+----------------------------------------------------+---------------------------------------+------------+--------------+---------------------+------------+

DEDUP
#####

Deduplicates reads by UMI or based on mapping position and CIGAR string.

.. table::
   :widths: 10, 40, 10, 10, 10, 10, 10
   :class: tight-table

   +--------------+----------------------------------------------------+----------+-----------+--------------+---------------------+-----------+
   | TOOL         | DESCRIPTION                                        | ENV      | BIN       | LINK         | INPUT               | OUTPUT    |
   +==============+====================================================+==========+===========+==============+=====================+===========+
   | UMI-tools    | UMI-tools contains tools for dealing with Unique   | umitools | umi_tools | `umitools `_ | FASTQ/TRIMMED_FASTQ | FASTQ/BAM |
   |              | Molecular Identifiers (UMIs)/Random Molecular Tags |          |           |              |                     |           |
   |              | (RMTs) and single cell RNA-Seq cell barcodes.      |          |           |              |                     |           |
   +--------------+----------------------------------------------------+----------+-----------+--------------+---------------------+-----------+
   | fgumi        | High-performance tools for UMI-tagged sequencing   | fgumi    | fgumi     | `fgumi `_    | FASTQ/TRIMMED_FASTQ | FASTQ/BAM |
   |              | data including UMI extraction and UMI-aware        |          |           |              |                     |           |
   |              | deduplication.                                     |          |           |              |                     |           |
   +--------------+----------------------------------------------------+----------+-----------+--------------+---------------------+-----------+
   | Picard tools | A better duplication marking algorithm that        | picard   | picard    | `picard `_   | BAM                 | BAM       |
   |              | handles all cases including clipped and gapped     |          |           |              |                     |           |
   |              | alignments.                                        |          |           |              |                     |           |
   +--------------+----------------------------------------------------+----------+-----------+--------------+---------------------+-----------+

POSTPROCESSING
==============

Read-Counting and Quantification
################################

Counts (unique) mapped reads and how often they map to defined features.

.. table::
   :widths: 10, 40, 10, 10, 10, 10, 10
   :class: tight-table

   +---------------+----------------------------------------------------+------------+---------------+-------------------+---------------------+--------+
   | TOOL          | DESCRIPTION                                        | ENV        | BIN           | LINK              | INPUT               | OUTPUT |
   +===============+====================================================+============+===============+===================+=====================+========+
   | FeatureCounts | A software program developed for counting reads to | countreads | featureCounts | `featurecounts `_ | BAM/FASTQ           | TEXT   |
   |               | genomic features such as genes, exons, promoters   |            |               |                   |                     |        |
   |               | and genomic bins                                   |            |               |                   |                     |        |
   +---------------+----------------------------------------------------+------------+---------------+-------------------+---------------------+--------+
   | Salmon        | Salmon is a tool for wicked-fast transcript        | salmon     | salmon        | `salmon `_        | FASTQ/TRIMMED_FASTQ | TEXT   |
   |               | quantification from RNA-seq data.                  |            |               |                   |                     |        |
   +---------------+----------------------------------------------------+------------+---------------+-------------------+---------------------+--------+
   | Kallisto      | Kallisto is a program for quantifying abundances   | kallisto   | kallisto      | `kallisto `_      | FASTQ/TRIMMED_FASTQ | TEXT   |
   |               | of transcripts from bulk and single-cell RNA-Seq   |            |               |                   |                     |        |
   |               | data, or more generally of target sequences using  |            |               |                   |                     |        |
   |               | high-throughput sequencing reads.                  |            |               |                   |                     |        |
   +---------------+----------------------------------------------------+------------+---------------+-------------------+---------------------+--------+

Differential Analyses
#####################

Includes DE, DEU, DAS and DTU.

.. table::
   :class: tight-table

   +---------+----------------------+----------------+---------------+--------------+-----------------------+--------------+-----------------------------+-------------------------+---------------------+-----------------+----------------------------------+-----+
   | Tool    | Analysis             | Filtering      | Normalization | Distribution | Testing               | Significance | Results Table               | Further tables          | SigTables           | Clustering      | Further plots                    | Rmd |
   +=========+======================+================+===============+==============+=======================+==============+=============================+=========================+=====================+=================+==================================+=====+
   | edgeR   | Differential         | filterByExpr() | TMM           | NB           | Fisher’s exact test   | pValue, LFC  | results, sorted-results     | normalized              | Sig, SigUP, SigDOWN | MDS-plot        | BCV, QLDisp, MD(per comparison)  | ✓   |
   |         | Gene Expression      |                |               |              |                       |              |                             |                         |                     |                 |                                  |     |
   +---------+----------------------+----------------+---------------+--------------+-----------------------+--------------+-----------------------------+-------------------------+---------------------+-----------------+----------------------------------+-----+
   | edgeR   | Differential         | filterByExpr() | TMM           | NB           | Fisher’s exact test   | pValue, LFC  | results                     | normalized              |                     | MDS-plot        | BCV, QLDisp, MD(per comparison)  | ✓   |
   |         | Exon Usage           |                |               |              |                       |              |                             |                         |                     |                 |                                  |     |
   +---------+----------------------+----------------+---------------+--------------+-----------------------+--------------+-----------------------------+-------------------------+---------------------+-----------------+----------------------------------+-----+
   | edgeR   | Differential         | filterByExpr() | TMM           | NB           | Simes, gene-level,    | pValue, LFC  | results(diffSpliceExonTest, |                         | Sig, SigUP, SigDOWN | MDS-plot        | BCV, QLDisp, MD(per comparison), | ✓   |
   |         | Alternative Splicing |                |               |              | exon-level            |              | Simes-Test, Gene-Test)      |                         |                     |                 | topSpliceSimes-plots(per Gene)   |     |
   +---------+----------------------+----------------+---------------+--------------+-----------------------+--------------+-----------------------------+-------------------------+---------------------+-----------------+----------------------------------+-----+
   | DESeq2  | Differential         | RowSums >= 10  | RLE           | NB           | Wald test             | pValue, LFC  | results                     | rld, vsd,               | Sig, SigUP, SigDOWN | PCA             | Heatmaps, MA(per comparison),    | ✓   |
   |         | Gene Expression      |                |               |              |                       |              |                             | results(per comparison) |                     |                 | VST-and-log2                     |     |
   +---------+----------------------+----------------+---------------+--------------+-----------------------+--------------+-----------------------------+-------------------------+---------------------+-----------------+----------------------------------+-----+
   | DEXSeq  | Differential         | RowSums >= 10  | RLE           | Cox-Reid     | likelihood ratio test |              |                             |                         |                     |                 |                                  |     |
   |         | Exon Usage           |                |               |              |                       |              |                             |                         |                     |                 |                                  |     |
   +---------+----------------------+----------------+---------------+--------------+-----------------------+--------------+-----------------------------+-------------------------+---------------------+-----------------+----------------------------------+-----+
   | DEXSeq  | Differential         | dmFilter()     | RLE           | Cox-Reid     | likelihood ratio test | pValue       | results                     |                         |                     |                 |                                  | ✓   |
   |         | Transcript Usage     |                |               |              |                       |              |                             |                         |                     |                 |                                  |     |
   +---------+----------------------+----------------+---------------+--------------+-----------------------+--------------+-----------------------------+-------------------------+---------------------+-----------------+----------------------------------+-----+
   | DIEGO   | Differential         |                |               |              | Mann-Whitney U test   | pValue       | results                     |                         | Sig                 | Dendrogram-plot |                                  | ✓   |
   |         | Alternative Splicing |                |               |              |                       |              |                             |                         |                     |                 |                                  |     |
   +---------+----------------------+----------------+---------------+--------------+-----------------------+--------------+-----------------------------+-------------------------+---------------------+-----------------+----------------------------------+-----+
   | DRIMSeq | Differential         | dmFilter()     |               | DM           |                       | pValue, LFC  | results(transcript, genes)  | Proportions-table,      | Sig, SigUP, SigDOWN |                 | FeatPerGene, precision, Pvalues  | ✓   |
   |         | Transcript Usage     |                |               |              |                       |              |                             | genewise precision      | (transcript, gene)  |                 | (per comparison)                 |     |
   +---------+----------------------+----------------+---------------+--------------+-----------------------+--------------+-----------------------------+-------------------------+---------------------+-----------------+----------------------------------+-----+

TRACKS
###############

This workflow generates trackdb.txt and BIGWIG files that can be used to create UCSC track hubs. The BIGWIG files can also be loaded into other genome viewers.

.. table::
   :widths: 10, 40, 10, 10, 10, 10, 10
   :class: tight-table

   +------+----------------------------------------------------+------+------+----------+-------+-------------+
   | TOOL | DESCRIPTION                                        | ENV  | BIN  | LINK     | INPUT | OUTPUT      |
   +======+====================================================+======+======+==========+=======+=============+
   | UCSC | Track hubs are web-accessible directories of       | ucsc | ucsc | `ucsc `_ | BAM   | BIGWIG/HUBS |
   |      | genomic data that can be viewed on the UCSC Genome |      |      |          |       |             |
   |      | Browser                                            |      |      |          |       |             |
   +------+----------------------------------------------------+------+------+----------+-------+-------------+

PEAKS
#####

Calls peaks from mapping data for ChIP, RIP, CLIP and others.

.. table::
   :widths: 10, 40, 10, 10, 10, 10, 10
   :class: tight-table

   +---------+----------------------------------------------------+---------+---------+------------------+-------+-----------------+
   | TOOL    | DESCRIPTION                                        | ENV     | BIN     | LINK             | INPUT | OUTPUT          |
   +=========+====================================================+=========+=========+==================+=======+=================+
   | Piranha | Piranha is a peak-caller for CLIP- and RIP-Seq     | piranha | piranha | `piranha `_      | BAM   | BED/BEDG/BIGWIG |
   |         | data.                                              |         |         |                  |       |                 |
   +---------+----------------------------------------------------+---------+---------+------------------+-------+-----------------+
   | MACS    | Model-based Analysis of ChIP-Seq (MACS), for       | macs    | macs    | `macs `_         | BAM   | BED/BEDG/BIGWIG |
   |         | identifying transcription factor binding sites.    |         |         |                  |       |                 |
   +---------+----------------------------------------------------+---------+---------+------------------+-------+-----------------+
   | SciPhy  | Software for cyPhyRNA-Seq data analysis            | scyphy  | piranha | `cyphyRNA-Seq `_ | BAM   | BED/BEDG/BIGWIG |
   +---------+----------------------------------------------------+---------+---------+------------------+-------+-----------------+
   | Peaks   | Sliding window peak finding tool for quick         | peaks   | peaks   | `ttp `_          | BAM   | BED/BEDG/BIGWIG |
   |         | assessment of peaks. UNPUBLISHED, recommended for  |         |         |                  |       |                 |
   |         | initial scanning only.                             |         |         |                  |       |                 |
   +---------+----------------------------------------------------+---------+---------+------------------+-------+-----------------+

CIRCS
###############

Finds circular RNAs in mapping data. CIRI2 needs to be installed locally.

.. table::
   :widths: 10, 40, 10, 10, 10, 10, 10
   :class: tight-table

   +-------+----------------------------------------------------+-------+-------------------+-----------+-------+-----------------+
   | TOOL  | DESCRIPTION                                        | ENV   | BIN               | LINK      | INPUT | OUTPUT          |
   +=======+====================================================+=======+===================+===========+=======+=================+
   | CIRI2 | CIRI (circRNA identifier) is a novel chiastic      | ciri2 | $Path_to_CIRI2.pl | `ciri2 `_ | BAM   | BED/BEDG/BIGWIG |
   |       | clipping signal based algorithm, which can         |       |                   |           |       |                 |
   |       | unbiasedly and accurately detect circRNAs from     |       |                   |           |       |                 |
   |       | transcriptome data by employing multiple           |       |                   |           |       |                 |
   |       | filtration strategies.                             |       |                   |           |       |                 |
   +-------+----------------------------------------------------+-------+-------------------+-----------+-------+-----------------+