Workflow Type: Common Workflow Language
Work-in-progress

A workflow for marine Genomic Observatories data analysis

An EOSC-Life project

The workflows developed in the framework of this project are based on pipeline-v5 of the MGnify resource.

This branch is a child of the pipeline_5.1 branch that contains all CWL descriptions of the MGnify pipeline version 5.1.

The following comes from the original repository and describes how to obtain the required databases.


pipeline-v5

This repository contains all CWL descriptions of the MGnify pipeline version 5.0.

Documentation

For thorough documentation, see the project's Read the Docs pages.


We recommend using the MGnify resource for data processing.

If you want to run the pipeline locally, we recommend using our pre-built Docker containers.

Requirements to run the pipeline

  • python3 [v 3.6+]

  • docker [v 19.+] or singularity

  • cwltool [v 3.+] or toil [v 4.2+]

  • disk space for databases: ~133 GB
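
Before installing anything, it can help to confirm that the required commands are on PATH. The following is a minimal sketch: the function name missing_tools is hypothetical and not part of the pipeline, and it only checks that a command exists, not that it meets the versions listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pipeline-v5): print each command
# named in the arguments that cannot be found on PATH, one per line.
# Prints nothing when every command is available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, `missing_tools python3 docker cwltool` lists whichever of the three still needs installing; swap in singularity or toil-cwl-runner to match the runner you chose.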

Docker

All the tools are containerized.

The antiSMASH and InterProScan containers are very large, so we provide two options:

  1. Pre-install these tools. The instructions on how to set up the environment are here.

  2. Use containers. First, uncomment the hints in InterProScan-v5.cwl and antismash_v4.cwl, then pre-pull the containers from https://hub.docker.com/u/microbiomeinformatics:

docker pull microbiomeinformatics/pipeline-v5.interproscan:v5.36-75.0
docker pull microbiomeinformatics/pipeline-v5.antismash:v4.2.0

Installation

git clone https://github.com/EBI-Metagenomics/pipeline-v5.git 
cd pipeline-v5

Download the necessary databases

The repository contains three pipelines (amplicon, assembly and wgs). You can download the databases for one or several analysis types.

The download_dbs.sh script takes three arguments, each set to True or False: -m (amplicon), -a (assembly), -w (raw reads / WGS).

To download only the amplicon databases, pass -m True -a False -w False.

mkdir ref-dbs && cd ref-dbs
bash ../Installation/download_dbs.sh -a True -m True -w True  # for all types
cd ..
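
The flag combinations can be wrapped in a small helper. This is only a sketch: db_flags is a hypothetical function, not part of the repository, and the assembly and wgs rows are inferred by analogy with the amplicon example above rather than taken from the script's documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an analysis type to download_dbs.sh flags.
# Only the amplicon and "all" combinations appear in the README; the
# assembly and wgs rows are assumed to follow the same pattern.
db_flags() {
  case "$1" in
    amplicon) echo "-m True -a False -w False" ;;
    assembly) echo "-m False -a True -w False" ;;
    wgs)      echo "-m False -a False -w True" ;;
    all)      echo "-m True -a True -w True" ;;
    *)        echo "unknown analysis type: $1" >&2; return 1 ;;
  esac
}
```

From inside ref-dbs, `bash ../Installation/download_dbs.sh $(db_flags amplicon)` would then fetch only the amplicon databases. The substitution is left unquoted on purpose so that the flags split into separate words.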

Create the YML file

Set DIRECTORY to the path of the directory where you downloaded the databases (ref-dbs).

Set TYPE to one of assembly, wgs or amplicon.

python3 Installation/create_yml.py --dir <DIRECTORY> --type <TYPE>
# example: python3 Installation/create_yml.py --dir ref-dbs --type assembly

If you need several YML files, run the script once for each TYPE.

Run

Before running the pipeline, add lines to the YML file specifying the sequence type and the path to the FASTA/FASTQ file(s).

Amplicon and raw-reads analyses can be performed on single-end or paired-end FASTQ file(s).

The assembly pipeline requires a contig FASTA file.

  • For a single-end amplicon or raw-reads analysis, add to the generated YML file:

single_reads:
  format: edam:format_1930
  class: File
  path: <path to FASTQ file>

  • For a paired-end amplicon or raw-reads analysis, add to the generated YML file:

forward_reads:
  format: edam:format_1930
  class: File
  path: <path to forward FASTQ file>
reverse_reads:
  format: edam:format_1930
  class: File
  path: <path to reverse FASTQ file>

  • For an assembly analysis, add to the generated YML file:

contigs:
  format: edam:format_1929
  class: File
  path: <path to contig FASTA file>
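
The entries above all share one shape, so they can be generated rather than typed by hand. A minimal sketch, assuming a hypothetical helper named input_entry and placeholder file paths; edam:format_1930 is the EDAM identifier for FASTQ and edam:format_1929 the one for FASTA.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one CWL File input entry in the layout
# shown above.
#   $1  input name (single_reads, forward_reads, reverse_reads, contigs)
#   $2  EDAM format (edam:format_1930 for FASTQ, edam:format_1929 for FASTA)
#   $3  path to the sequence file
input_entry() {
  printf '%s:\n  format: %s\n  class: File\n  path: %s\n' "$1" "$2" "$3"
}
```

For example, `input_entry contigs edam:format_1929 /data/contigs.fasta >> assembly.yml` appends the entry to a generated file; /data/contigs.fasta is a placeholder path.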

cwltool

export ANALYSIS=amplicon  # one of: amplicon, assembly, raw-reads

cwltool --enable-dev workflows/${ANALYSIS}-wf--v.5-cond.cwl ${ANALYSIS}.yml
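
Because ANALYSIS feeds directly into the workflow file name, a typo only surfaces when the runner fails to find the file. The sketch below validates the value first; workflow_path is a hypothetical helper, and the file-name pattern is the one used in the cwltool command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an analysis type into the workflow path
# used in the cwltool command, rejecting any other value.
workflow_path() {
  case "$1" in
    amplicon|assembly|raw-reads)
      printf 'workflows/%s-wf--v.5-cond.cwl\n' "$1" ;;
    *)
      echo "ANALYSIS must be amplicon, assembly or raw-reads" >&2
      return 1 ;;
  esac
}
```

It could then be used as `cwltool --enable-dev "$(workflow_path "$ANALYSIS")" "${ANALYSIS}.yml"`.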

toil

export ANALYSIS=amplicon  # one of: amplicon, assembly, raw-reads

toil-cwl-runner \
  --preserve-entire-environment \
  --enable-dev \
  --disableChaining \
  --logFile toil.log \
  --jobStore work-directory \
  --outdir results-folder \
  workflows/${ANALYSIS}-wf--v.5-cond.cwl ${ANALYSIS}.yml

Other tools that support CWL

https://www.commonwl.org/#Implementations

Docker problems

The pipeline uses Docker images from the MGnify Docker Hub account.

If you have problems pulling the Docker containers, you can rebuild them with:

bash docker/docker_build.sh

Inputs

ID             Name  Description  Type
single_reads   n/a   n/a          File?
forward_reads  n/a   n/a          File?
reverse_reads  n/a   n/a          File?
qc_min_length  n/a   n/a          int

Steps

ID         Name  Description
before-qc  n/a   n/a

Outputs

ID                           Name  Description  Type
qc-statistics                n/a   n/a          Directory
qc_summary                   n/a   n/a          File
qc-status                    n/a   n/a          File
hashsum_paired               n/a   n/a          File[]?
hashsum_single               n/a   n/a          File?
fastp_filtering_json_report  n/a   n/a          File?

Version History

eosc-life-gos @ bc5d676 (latest) Created 7th Feb 2022 at 10:25 by Stian Soiland-Reyes

Delete raw-read.yml

we don't need this anymore. everything is moved to gos_wf_v1.yaml


Frozen eosc-life-gos bc5d676

eosc-life-gos @ bc5d676 (earliest) Created 7th Feb 2022 at 10:16 by Stian Soiland-Reyes

Frozen eosc-life-gos bc5d676