Preprocessing of raw SARS-CoV-2 reads
The raw reads available so far are generated from bronchoalveolar lavage fluid (BALF) and are metagenomic in nature: they contain human reads, reads from potential bacterial co-infections, and true SARS-CoV-2 reads.
Live Resources
| usegalaxy.org | usegalaxy.eu | usegalaxy.org.au | usegalaxy.be |
|:--------:|:------------:|:------------:|:------------:|
What's the point?
Assess the quality of reads, remove adapters, and remove reads mapping to the human genome.
The outline
Illumina and Oxford Nanopore reads are pulled from the NCBI SRA (links to SRA accessions are available here). The two read types are then processed separately as described in the workflow section.
Inputs
> :boom: If you experience problems downloading data from the NCBI SRA, use the Galaxy history pre-populated with inputs as described in the "Alternate Workflow" section below.
Only SRA accessions are required for this analysis. The described analysis was performed with all SRA SARS-CoV-2 accessions available as of Feb 20, 2020:
- Illumina reads
```
SRR10903401
SRR10903402
SRR10971381
```
- Oxford Nanopore reads
```
SRR10948550
SRR10948474
SRR10902284
```
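As a rough illustration of how these accessions could be fetched outside Galaxy, the sketch below builds one `fasterq-dump` command per accession without running anything. The output directory layout and the function names are hypothetical, and treating the Illumina runs as paired-end (hence `--split-files`) is an assumption based on the paired collections listed in the workflow outputs.

```python
import shlex

# Accessions copied from the lists above, grouped by sequencing platform.
ACCESSIONS = {
    "illumina": ["SRR10903401", "SRR10903402", "SRR10971381"],
    "nanopore": ["SRR10948550", "SRR10948474", "SRR10902284"],
}


def fasterq_dump_command(accession, outdir, paired):
    """Return the argument list for one fasterq-dump download."""
    cmd = ["fasterq-dump", accession, "--outdir", outdir]
    if paired:
        # Assumption: paired-end runs; write the two mates to separate files.
        cmd.append("--split-files")
    return cmd


def download_plan(outdir="raw_reads"):
    """Build one command per accession; nothing is executed here."""
    plan = []
    for platform, accessions in ACCESSIONS.items():
        for accession in accessions:
            plan.append(
                fasterq_dump_command(
                    accession,
                    f"{outdir}/{platform}",  # hypothetical directory layout
                    paired=(platform == "illumina"),
                )
            )
    return plan


if __name__ == "__main__":
    for cmd in download_plan():
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell where the SRA Toolkit is installed, or passed to `subprocess.run` as a list.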
Outputs
This workflow produces three outputs that are used in two subsequent analyses:
| # | Output | Used in |
|----|------|---------|
| 1. | A combined set of adapter-free Illumina reads without human contamination | Assembly |
| 2. | A combined set of Oxford Nanopore reads without human contamination | Assembly |
| 3. | A collection of adapter-free Illumina reads from which human reads have not been removed | Variation detection |
The history and the workflow
A Galaxy workspace (history) containing the most current analysis can be imported from here.
The publicly accessible workflow can be downloaded and installed on any Galaxy instance. It contains version information for all tools used in this analysis.
The workflow performs the following steps:
Illumina
- Illumina reads are QC'ed and adapter sequences are removed using `fastp`
- Quality metrics are computed and visualized using `fastqc` and `multiqc`
- Reads are mapped against human genome version `hg38` using `bwa mem`
- Reads that do not map to `hg38` are extracted using `samtools view`; reads that map to the human genome are discarded
- Reads are converted back to fastq format using `samtools fastx`
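The Illumina steps above can be approximated on the command line. The sketch below only assembles the command strings; it is not the Galaxy workflow itself. The file naming scheme, the `hg38.fa` reference path (which would need a `bwa index`), and the choice of `-f 12` (read and mate both unmapped) are assumptions; the Galaxy tool's actual filter settings are recorded in the workflow file. The final step uses `samtools fastq`, taken here as the command-line counterpart of the Galaxy `samtools fastx` tool.

```python
import shlex


def illumina_commands(sample, ref="hg38.fa", threads=4):
    """Return shell command lines for one paired-end sample.

    File names follow a hypothetical <sample>_1.fastq / <sample>_2.fastq layout.
    """
    r1, r2 = f"{sample}_1.fastq", f"{sample}_2.fastq"
    t1, t2 = f"{sample}_trimmed_1.fastq", f"{sample}_trimmed_2.fastq"
    bam = f"{sample}.nonhuman.bam"

    # Adapter removal and QC report.
    fastp = ["fastp", "-i", r1, "-I", r2, "-o", t1, "-O", t2,
             "--json", f"{sample}.fastp.json", "--html", f"{sample}.fastp.html"]
    # Map trimmed reads against the human reference.
    bwa = ["bwa", "mem", "-t", str(threads), ref, t1, t2]
    # Assumption: -f 12 keeps pairs where the read (0x4) and its mate (0x8)
    # are both unmapped, i.e. pairs with no human alignment.
    view = ["samtools", "view", "-b", "-f", "12", "-o", bam, "-"]
    # Convert the retained reads back to FASTQ.
    fastq = ["samtools", "fastq",
             "-1", f"{sample}_nonhuman_1.fastq",
             "-2", f"{sample}_nonhuman_2.fastq", bam]

    return [
        shlex.join(fastp),
        shlex.join(bwa) + " | " + shlex.join(view),
        shlex.join(fastq),
    ]


if __name__ == "__main__":
    for line in illumina_commands("SRR10903401"):
        print(line)
```

Keeping the commands as data makes it easy to review them before running, or to hand them to a scheduler.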
Oxford Nanopore
- Reads are QC'ed using `nanoplot`
- Quality metrics are computed and visualized using `fastqc` and `multiqc`
- Reads are mapped against human genome version `hg38` using `minimap2`
- Reads that do not map to `hg38` are extracted using `samtools view`; reads that map to the human genome are discarded
- Reads are converted back to fastq format using `samtools fastx`
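To make the human-read filter concrete: for single-end data such as the Nanopore reads, keeping only unmapped reads corresponds to testing bit `0x4` of the SAM FLAG field, which is what `samtools view -f 4` does. The sketch below applies that test to SAM text lines in pure Python. The function names and the toy records are illustrative only, and whether the Galaxy step uses exactly this flag is an assumption.

```python
UNMAPPED = 0x4  # SAM FLAG bit: the read itself is unmapped


def is_unmapped(sam_line):
    """Return True if a SAM alignment line has the unmapped bit set."""
    flag = int(sam_line.split("\t")[1])  # FLAG is the second column
    return bool(flag & UNMAPPED)


def keep_non_human(sam_lines):
    """Yield header lines and alignment records that did not map."""
    for line in sam_lines:
        if line.startswith("@") or is_unmapped(line):
            yield line


if __name__ == "__main__":
    # Toy records: read1 and read3 map to the reference, read2 does not.
    records = [
        "@HD\tVN:1.6",
        "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
        "read3\t16\tchr2\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    ]
    for line in keep_non_human(records):
        print(line.split("\t")[0])
```

In practice `samtools view` should be preferred, since it handles BAM input and is far faster; the Python version only shows the logic.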
BioConda
Tools used in this analysis are also available from BioConda:
| Name | Link |
|------|----------------|
Alternate Workflow
An alternate starting point has been created for those not wanting to wait for the data to be downloaded from the NCBI SRA. (This can especially be an issue in Australia or Europe.)
There is a shared history containing all of the starting data in appropriate collections, and an alternate workflow that uses this input. Apart from the slightly different starting point, the workflow and the outputs it produces are identical to those described above.
| usegalaxy.org | usegalaxy.eu | usegalaxy.org.au | usegalaxy.be |
|:-----------:|:------------:|:----------------:|:----------------:|
Inputs
ID | Name | Description | Type |
---|---|---|---|
bed_file | bed_file | runtime parameter for tool Filter SAM or BAM, output SAM or BAM | n/a |
Steps
ID | Name | Description |
---|---|---|
0 | List of Illumina accessions | |
1 | List of ONT accessions | |
2 | Illumina data | toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/2.10.4 |
3 | ONT data | toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/2.10.4 |
4 | fastp: Trimmed Illumina Reads | toolshed.g2.bx.psu.edu/repos/iuc/fastp/fastp/0.19.3.3 |
5 | NanoPlot | toolshed.g2.bx.psu.edu/repos/iuc/nanoplot/nanoplot/1.28.2+galaxy1 |
6 | FastQC | toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.72 |
7 | Map with minimap2 | toolshed.g2.bx.psu.edu/repos/iuc/minimap2/minimap2/2.12 |
8 | MultiQC | toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.7 |
9 | Map with BWA-MEM | toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.17.1 |
10 | MultiQC | toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.7 |
11 | Filter SAM or BAM, output SAM or BAM | toolshed.g2.bx.psu.edu/repos/devteam/samtool_filter2/samtool_filter2/1.8 |
12 | Filter SAM or BAM, output SAM or BAM | toolshed.g2.bx.psu.edu/repos/devteam/samtool_filter2/samtool_filter2/1.8 |
13 | MergeSamFiles | toolshed.g2.bx.psu.edu/repos/devteam/picard/picard_MergeSamFiles/2.18.2.1 |
14 | MergeSamFiles | toolshed.g2.bx.psu.edu/repos/devteam/picard/picard_MergeSamFiles/2.18.2.1 |
15 | ONT filtered reads | toolshed.g2.bx.psu.edu/repos/iuc/samtools_fastx/samtools_fastx/1.9+galaxy1 |
16 | Illumina filtered reads | toolshed.g2.bx.psu.edu/repos/iuc/samtools_fastx/samtools_fastx/1.9+galaxy1 |
Outputs
ID | Name | Description | Type |
---|---|---|---|
list_paired | list_paired | n/a | input |
output_collection | output_collection | n/a | input |
output_collection_other | output_collection_other | n/a | input |
log | log | n/a | txt |
list_paired | list_paired | n/a | input |
output_collection | output_collection | n/a | input |
output_collection_other | output_collection_other | n/a | input |
log | log | n/a | txt |
output_paired_coll | output_paired_coll | n/a | input |
report_html | report_html | n/a | html |
report_json | report_json | n/a | json |
output_html | output_html | n/a | html |
nanostats | nanostats | n/a | txt |
nanostats_post_filtering | nanostats_post_filtering | n/a | txt |
read_length | read_length | n/a | png |
log_read_length | log_read_length | n/a | png |
html_file | html_file | n/a | html |
text_file | text_file | n/a | txt |
alignment_output | alignment_output | n/a | bam |
stats | stats | n/a | input |
html_report | html_report | n/a | html |
bam_output | bam_output | n/a | bam |
stats | stats | n/a | input |
html_report | html_report | n/a | html |
output1 | output1 | n/a | sam |
output1 | output1 | n/a | sam |
outFile | outFile | n/a | bam |
outFile | outFile | n/a | bam |
nonspecific | nonspecific | n/a | fasta |
forward | forward | n/a | fasta |
reverse | reverse | n/a | fasta |

Created: 17th Sep 2020 at 10:38
Version History
Version 1 Created 17th Sep 2020 at 10:38 by Finn Bacall