Workflow Type: Common Workflow Language

A workflow for marine Genomic Observatories data analysis

An EOSC-Life project


The workflows developed in the framework of this project are based on pipeline-v5 of the MGnify resource.

This branch is a child of the pipeline_5.1 branch that contains all CWL descriptions of the MGnify pipeline version 5.1.

The following text comes from the original repository and describes how to obtain the required databases.


This repository contains all CWL descriptions of the MGnify pipeline version 5.0.


A thorough Read the Docs guide is also available.

We recommend using the MGnify resource for data processing.

If you want to run the pipeline locally, we recommend using our pre-built Docker containers.

Requirements to run the pipeline

  • python3 [v 3.6+]

  • docker [v 19.+] or singularity

  • cwltool [v 3.+] or toil [v 4.2+]

  • about 133 GB of disk space for the databases
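
The requirements above can be checked before starting. The sketch below is one possible pre-flight check and is not part of the pipeline. It only tests whether the executables are on PATH and whether the Python interpreter is recent enough; it does not verify the docker, cwltool or toil versions. The function names and the grouping of alternatives are illustrative assumptions.

```python
import shutil
import sys

# Each group lists alternatives from the requirements; one of them is enough.
# Assumption: the tools are installed under these executable names.
REQUIRED_GROUPS = [
    ("docker", "singularity"),
    ("cwltool", "toil-cwl-runner"),
]


def missing_groups(which=shutil.which):
    """Return the groups for which no executable was found on PATH."""
    return [g for g in REQUIRED_GROUPS if not any(which(t) for t in g)]


def python_ok(version_info=sys.version_info):
    """True if the interpreter is Python 3.6 or newer."""
    return tuple(version_info[:2]) >= (3, 6)


# Report anything that appears to be missing on this machine.
for group in missing_groups():
    print("not found on PATH:", " or ".join(group))
if not python_ok():
    print("Python 3.6+ is required")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the defaults are enough.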


All the tools are containerized.

Unfortunately, the antiSMASH and InterProScan containers are very large. We provide two options:

  1. Pre-install these tools, following the instructions for setting up the environment.

  2. Use containers. First, uncomment the hints in InterProScan-v5.cwl and antismash_v4.cwl, then pre-pull the containers:

docker pull microbiomeinformatics/pipeline-v5.interproscan:v5.36-75.0
docker pull microbiomeinformatics/pipeline-v5.antismash:v4.2.0


git clone 
cd pipeline-v5

Download necessary dbs

This repository contains three pipelines (amplicon, assembly and wgs). You can download the databases for one or several analysis types.

The download script takes three arguments: -m (amplicon), -a (assembly), -w (raw reads / WGS).

To download only the amplicon databases, pass -m True -a False -w False.

mkdir ref-dbs && cd ref-dbs
bash ../Installation/ -a True -m True -w True  # for all types
cd ..
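
The mapping between analysis types and the three download flags can be written down explicitly. The sketch below only builds the flag list described above; it does not call the download script, whose file name is not given here. The names `FLAGS` and `download_flags` are hypothetical helpers introduced for illustration.

```python
# Flag letters as described above: -m amplicon, -a assembly, -w raw reads / WGS.
FLAGS = {"amplicon": "-m", "assembly": "-a", "wgs": "-w"}


def download_flags(selected):
    """Build the True/False flag list for the chosen analysis types."""
    unknown = set(selected) - set(FLAGS)
    if unknown:
        raise ValueError(f"unknown analysis type(s): {sorted(unknown)}")
    args = []
    for name, flag in FLAGS.items():
        # Every flag is always passed, set to True only for selected types.
        args += [flag, "True" if name in selected else "False"]
    return args


print(" ".join(download_flags({"amplicon"})))  # -m True -a False -w False
```

Passing every flag explicitly, rather than only the selected ones, mirrors the amplicon-only example in the text.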

Create yml-file

Set DIRECTORY to the path of the directory where you downloaded the databases (ref-dbs).

Set TYPE to one of assembly, wgs or amplicon.

python3 Installation/ --dir DIRECTORY --type TYPE
# example: python3 Installation/ --dir ref-dbs --type assembly

If you need to generate several YML-files, run this script several times with different TYPEs.


Before running the pipeline, you need to add lines to the YML files detailing the sequence type and path to FASTA/FASTQ file(-s).

Amplicon and raw-reads analyses can be performed on single-end or paired-end FASTQ file(s).

The assembly pipeline requires a contig FASTA file.

  • For a single-end amplicon or raw-reads analysis, add to the generated YML file:
  format: edam:format_1930
  class: File
  • For a paired-end amplicon or raw-reads analysis, add one such entry for each of the two read files:
  format: edam:format_1930
  class: File
  format: edam:format_1930
  class: File
  • For an assembly analysis, add to the generated YML file:
  format: edam:format_1929
  class: File
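
A complete CWL File entry needs an input name and a path in addition to the `class` and `format` lines shown above. The sketch below generates such entries as text. It rests on assumptions: the input names `forward_reads` and `reverse_reads` are taken from this workflow's input list and may not match what the generated YML file expects, the file names are hypothetical, and `path` is the standard CWL field for a local file. The assembly input name is not given in this document, so `file_entry` takes the name as a parameter.

```python
FASTQ = "edam:format_1930"  # format given above for FASTQ read files
FASTA = "edam:format_1929"  # format given above for the contig FASTA file


def file_entry(input_name, path, edam_format):
    """Return the YAML text for one CWL File input."""
    return (
        f"{input_name}:\n"
        f"  class: File\n"
        f"  format: {edam_format}\n"
        f"  path: {path}\n"
    )


def paired_entries(forward_path, reverse_path):
    """Return the two entries needed for a paired-end analysis."""
    return (file_entry("forward_reads", forward_path, FASTQ)
            + file_entry("reverse_reads", reverse_path, FASTQ))


# Hypothetical file names, for illustration only.
print(paired_entries("reads_1.fastq.gz", "reads_2.fastq.gz"))
```

The printed text can be appended to the generated YML file. Paths containing spaces or YAML special characters would need quoting, which this sketch does not handle.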


To run the workflow with cwltool:

export ANALYSIS=amplicon  # one of: amplicon, assembly, raw-reads

cwltool --enable-dev workflows/${ANALYSIS}-wf--v.5-cond.cwl ${ANALYSIS}.yml
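
The same cwltool invocation can be assembled programmatically, for example when several analysis types are run in a row. This is a sketch under the assumption that it is executed from the repository root, so that the `workflows/` path from the command above resolves. `cwltool_command` is a hypothetical helper, not part of the pipeline.

```python
ANALYSIS_TYPES = ("amplicon", "assembly", "raw-reads")


def cwltool_command(analysis):
    """Return the cwltool argument list for one analysis type."""
    if analysis not in ANALYSIS_TYPES:
        raise ValueError(f"ANALYSIS must be one of {ANALYSIS_TYPES}")
    return [
        "cwltool",
        "--enable-dev",
        f"workflows/{analysis}-wf--v.5-cond.cwl",
        f"{analysis}.yml",
    ]


# Show the command line that would be run for the amplicon workflow.
print(" ".join(cwltool_command("amplicon")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the workflow; validating the analysis type first avoids starting cwltool with a workflow file that does not exist.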


To run the workflow with toil:

export ANALYSIS=amplicon  # one of: amplicon, assembly, raw-reads

toil-cwl-runner \
  --preserve-entire-environment \
  --enable-dev \
  --disableChaining \
  --logFile toil.log \
  --jobStore work-directory \
  --outdir results-folder \
  ${ANALYSIS}-wf--v.5-cond.cwl ${ANALYSIS}.yml

Other cwl-supported tools

Docker problems

The pipeline uses Docker images from the MGnify DockerHub.

If you have problems pulling the Docker containers, you can rebuild them with:

bash docker/



Inputs

ID             Name  Description  Type
single_reads   n/a   n/a          File?
forward_reads  n/a   n/a          File?
reverse_reads  n/a   n/a          File?
qc_min_length  n/a   n/a          int


Steps

ID         Name  Description
before-qc  n/a   n/a


Outputs

ID                           Name  Description  Type
qc-statistics                n/a   n/a          Directory
qc_summary                   n/a   n/a          File
qc-status                    n/a   n/a          File
hashsum_paired               n/a   n/a          File[]?
hashsum_single               n/a   n/a          File?
fastp_filtering_json_report  n/a   n/a          File?

Version History

eosc-life-gos @ bc5d676 (latest) Created 7th Feb 2022 at 10:25 by Stian Soiland-Reyes

Delete raw-read.yml

we don't need this anymore. everything is moved to gos_wf_v1.yaml

Frozen eosc-life-gos bc5d676

eosc-life-gos @ bc5d676 (earliest) Created 7th Feb 2022 at 10:16 by Stian Soiland-Reyes

Total size: 495 MB