A workflow for marine Genomic Observatories data analysis
An EOSC-Life project
The workflows developed in the framework of this project are based on pipeline-v5
of the MGnify resource.
This branch is a child of the pipeline_5.1 branch, which contains all CWL descriptions of the MGnify pipeline version 5.1.
The following text comes from the original repository and describes how to obtain the required databases.
pipeline-v5
This repository contains all CWL descriptions of the MGnify pipeline version 5.0.
Documentation
Thorough documentation is available on Read the Docs.
We recommend using the MGnify resource for data processing.
If you want to run the pipeline locally, we recommend using our pre-built Docker containers.
Requirements to run pipeline
- python3 [v 3.6+]
- docker [v 19.+] or singularity
- cwltool [v 3.+] or toil [v 4.2+]
- HDD space for databases: ~133 GB
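The requirements above can be checked before installing anything. The following is a minimal sketch, not part of the pipeline: the function name `check_requirements` and its parameters are hypothetical, and it only checks that the executables are on `PATH`, not their versions.

```python
import shutil
import sys


def check_requirements(db_dir=".", min_free_gb=133, which=shutil.which):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper: it only looks for executables on PATH and does
    not verify the docker / cwltool / toil version numbers.
    """
    problems = []
    # python3 [v 3.6+]
    if sys.version_info < (3, 6):
        problems.append("Python 3.6+ is required")
    # docker or singularity
    if not (which("docker") or which("singularity")):
        problems.append("neither docker nor singularity found on PATH")
    # cwltool or toil (toil's CWL entry point is toil-cwl-runner)
    if not (which("cwltool") or which("toil-cwl-runner")):
        problems.append("neither cwltool nor toil-cwl-runner found on PATH")
    # roughly 133 GB of free disk space for the databases
    free_gb = shutil.disk_usage(db_dir).free / 1e9
    if free_gb < min_free_gb:
        problems.append(
            f"only {free_gb:.0f} GB free in {db_dir!r}; about {min_free_gb} GB needed"
        )
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```

Pass the directory that will hold the databases as `db_dir` so that the free-space check looks at the right filesystem.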
Docker
All the tools are containerized.
Unfortunately, antiSMASH and InterProScan containers are very big. We provide two options:
- Pre-install these tools. The instructions on how to set up the environment are here.
- Use containers. First, uncomment the hints in InterProScan-v5.cwl and antismash_v4.cwl. Then pre-pull the containers from https://hub.docker.com/u/microbiomeinformatics:
docker pull microbiomeinformatics/pipeline-v5.interproscan:v5.36-75.0
docker pull microbiomeinformatics/pipeline-v5.antismash:v4.2.0
Installation
git clone https://github.com/EBI-Metagenomics/pipeline-v5.git
cd pipeline-v5
Download necessary dbs
We have 3 pipelines (amplicon, assembly and wgs) in one repository. You can download dbs for single or multiple analysis types.
The script download_dbs.sh takes 3 arguments: -m (amplicon), -a (assembly), -w (raw reads / WGS).
To download only the amplicon databases, use -m True -a False -w False.
mkdir ref-dbs && cd ref-dbs
bash ../Installation/download_dbs.sh -a True -m True -w True # for all types
cd ..
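When the download is scripted, the three True/False flags can be derived from a list of analysis types. This is a small sketch under that assumption; `FLAGS` and `download_dbs_args` are hypothetical names, and only the `-m`, `-a` and `-w` flags described above are used.

```python
# Flag letters as documented for download_dbs.sh.
FLAGS = {"amplicon": "-m", "assembly": "-a", "wgs": "-w"}


def download_dbs_args(types):
    """Build the -m/-a/-w arguments for the requested analysis types."""
    wanted = set(types)
    unknown = wanted - set(FLAGS)
    if unknown:
        raise ValueError(f"unknown analysis type(s): {sorted(unknown)}")
    args = []
    for name, flag in FLAGS.items():
        # The script expects the literal strings "True" / "False".
        args += [flag, str(name in wanted)]
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is downloaded here.
    print("bash ../Installation/download_dbs.sh", " ".join(download_dbs_args(["amplicon"])))
```

The resulting list can be passed to `subprocess.run(["bash", "../Installation/download_dbs.sh", *args], check=True)` from inside the ref-dbs directory.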
Create yml-file
Set DIRECTORY to the path of the directory where you downloaded the databases (ref-dbs).
Set TYPE to one of assembly, wgs or amplicon.
python3 Installation/create_yml.py --dir DIRECTORY --type TYPE
# example: python3 Installation/create_yml.py --dir ref-dbs --type assembly
If you need to generate several YML-files, run this script several times with different TYPEs.
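Running the script once per type can be automated. The sketch below assumes it is started from the repository root; `create_yml_command` and `create_all_ymls` are hypothetical helper names, and only the `--dir` and `--type` options shown above are used.

```python
import subprocess
import sys

VALID_TYPES = ("assembly", "wgs", "amplicon")


def create_yml_command(db_dir, analysis_type):
    """Return the create_yml.py command line for one analysis type."""
    if analysis_type not in VALID_TYPES:
        raise ValueError(f"TYPE must be one of {VALID_TYPES}, got {analysis_type!r}")
    return [
        sys.executable, "Installation/create_yml.py",
        "--dir", db_dir,
        "--type", analysis_type,
    ]


def create_all_ymls(db_dir, types, run=subprocess.run):
    """Run create_yml.py once per requested type, stopping on the first failure."""
    for analysis_type in types:
        run(create_yml_command(db_dir, analysis_type), check=True)
```

For example, `create_all_ymls("ref-dbs", ["assembly", "amplicon"])` would run the script twice, once for each type.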
Run
Before running the pipeline, you need to add lines to the YML files detailing the sequence type and path to FASTA/FASTQ file(-s).
Amplicon and Raw reads analysis can be performed on single-end or paired-end FASTQ file(-s).
Assembly pipeline requires a contig FASTA file.
- For single-end amplicon or raw-reads analysis, add to the generated YML file:
single_reads:
  format: edam:format_1930
  class: File
  path:
- For paired-end amplicon or raw-reads analysis, add to the generated YML file:
forward_reads:
  format: edam:format_1930
  class: File
  path:
reverse_reads:
  format: edam:format_1930
  class: File
  path:
- For assembly analysis, add to the generated YML file:
contigs:
  format: edam:format_1929
  class: File
  path:
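The three cases above differ only in the key names and the EDAM format, so the lines to append can be generated. This is an illustrative sketch: `input_block` and the mode names `single`, `paired` and `assembly` are hypothetical, and paths are written as JSON-quoted strings, which are also valid YAML.

```python
import json

# Key names and EDAM formats taken from the snippets above.
INPUT_KEYS = {
    "single": (["single_reads"], "edam:format_1930"),
    "paired": (["forward_reads", "reverse_reads"], "edam:format_1930"),
    "assembly": (["contigs"], "edam:format_1929"),
}


def input_block(mode, paths):
    """Return the YAML text describing the input file(s) for one run."""
    names, edam_format = INPUT_KEYS[mode]
    if len(paths) != len(names):
        raise ValueError(f"{mode!r} needs {len(names)} path(s), got {len(paths)}")
    lines = []
    for name, path in zip(names, paths):
        lines += [
            f"{name}:",
            f"  format: {edam_format}",
            "  class: File",
            f"  path: {json.dumps(path)}",  # quoted so unusual characters stay safe
        ]
    return "\n".join(lines) + "\n"
```

The returned text can be appended to the generated file, for example with `open("amplicon.yml", "a").write(input_block("single", ["reads.fastq"]))`. Both file names here are placeholders.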
cwltool
export ANALYSIS=[amplicon/assembly/raw-reads]
cwltool --enable-dev workflows/${ANALYSIS}-wf--v.5-cond.cwl ${ANALYSIS}.yml
toil
export ANALYSIS=[amplicon/assembly/raw-reads]
toil-cwl-runner \
--preserve-entire-environment \
--enable-dev \
--disableChaining \
--logFile toil.log \
--jobStore work-directory \
--outdir results-folder \
workflows/${ANALYSIS}-wf--v.5-cond.cwl ${ANALYSIS}.yml
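The cwltool and toil invocations share the workflow file and the YML file and differ only in the runner options. The sketch below builds either command line from the analysis type. `runner_command` is a hypothetical helper, and it assumes the workflow files live in the workflows directory, as in the cwltool command above.

```python
ANALYSES = ("amplicon", "assembly", "raw-reads")


def runner_command(runner, analysis, workflow_dir="workflows"):
    """Return the command line for cwltool or toil as a list of arguments."""
    if analysis not in ANALYSES:
        raise ValueError(f"ANALYSIS must be one of {ANALYSES}, got {analysis!r}")
    workflow = f"{workflow_dir}/{analysis}-wf--v.5-cond.cwl"
    inputs = f"{analysis}.yml"
    if runner == "cwltool":
        return ["cwltool", "--enable-dev", workflow, inputs]
    if runner == "toil":
        # Same options as the toil-cwl-runner call shown above.
        return [
            "toil-cwl-runner",
            "--preserve-entire-environment",
            "--enable-dev",
            "--disableChaining",
            "--logFile", "toil.log",
            "--jobStore", "work-directory",
            "--outdir", "results-folder",
            workflow, inputs,
        ]
    raise ValueError(f"unknown runner: {runner!r}")
```

Passing the list to `subprocess.run(..., check=True)` avoids shell quoting problems and raises an error if the runner exits with a non-zero status.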
Other cwl-supported tools
https://www.commonwl.org/#Implementations
Docker problems
The pipeline uses Docker images from the MGnify DockerHub.
If you have problems pulling the docker containers, you can re-build them with:
bash docker/docker_build.sh
Inputs
ID | Name | Description | Type |
---|---|---|---|
single_reads | n/a | n/a | |
forward_reads | n/a | n/a | |
reverse_reads | n/a | n/a | |
qc_min_length | n/a | n/a | |
Steps
ID | Name | Description |
---|---|---|
before-qc | n/a | n/a |
Outputs
ID | Name | Description | Type |
---|---|---|---|
qc-statistics | n/a | n/a | |
qc_summary | n/a | n/a | |
qc-status | n/a | n/a | |
hashsum_paired | n/a | n/a | |
hashsum_single | n/a | n/a | |
fastp_filtering_json_report | n/a | n/a | |
Version History
eosc-life-gos @ bc5d676 (latest) Created 7th Feb 2022 at 10:25 by Stian Soiland-Reyes
Delete raw-read.yml
we don't need this anymore. everything is moved to gos_wf_v1.yaml
eosc-life-gos @ bc5d676 (earliest) Created 7th Feb 2022 at 10:16 by Stian Soiland-Reyes
Delete raw-read.yml
we don't need this anymore. everything is moved to gos_wf_v1.yaml
Creators: Not specified
Created: 7th Feb 2022 at 10:16