3 - MRCA
Version 1

Dating the most recent common ancestor (MRCA) of SARS-CoV-2

Live Resources

usegalaxy.org: Galaxy workflow, Galaxy history, Jupyter Notebook
usegalaxy.eu: Galaxy workflow, Galaxy history, Jupyter Notebook
usegalaxy.org.au: Galaxy workflow, Galaxy history, Jupyter Notebook
usegalaxy.be: Galaxy workflow, Galaxy history, Jupyter Notebook

What's the point?

To estimate the time of COVID-19 emergence we use simple root-to-tip regression (Korber et al. 2000). More complex and powerful phylodynamic methods could certainly be used, but for these data, with their very low levels of sequence divergence, simpler and faster methods suffice. From the set of all SARS-CoV-2 sequences available as of Feb 16, 2020 we obtain an MRCA date of Oct 24, 2019, which is close to other existing estimates (Rambaut 2020).
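As an illustration of the idea behind root-to-tip regression, the sketch below fits an ordinary least-squares line to root-to-tip distance against sampling date and reads the MRCA date off the x-intercept (the date at which the fitted distance is zero). It is a minimal sketch under stated assumptions, not the notebook's actual code: the function names are hypothetical, and distances are assumed to be in substitutions per site.

```python
from datetime import date, timedelta


def decimal_year(d):
    """Convert a date to a fractional year, e.g. 2020-01-01 -> 2020.0."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days


def from_decimal_year(y):
    """Convert a fractional year back to a calendar date (nearest day)."""
    year = int(y)
    start = date(year, 1, 1)
    end = date(year + 1, 1, 1)
    return start + timedelta(days=round((y - year) * (end - start).days))


def root_to_tip_regression(dates, distances):
    """Least-squares fit of distance = rate * time + intercept.

    Returns (rate, intercept, tmrca), where tmrca is the x-intercept
    in fractional years. Hypothetical helper, for illustration only.
    """
    xs = [decimal_year(d) for d in dates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, distances))
    rate = sxy / sxx                      # substitutions per site per year
    intercept = mean_y - rate * mean_x    # fitted distance at year 0
    tmrca = -intercept / rate             # time at which fitted distance is 0
    return rate, intercept, tmrca
```

Passing the collection dates and the matching root-to-tip distances gives the estimated rate and the MRCA time; `from_decimal_year(tmrca)` turns the latter into a calendar date. The estimate is only meaningful when the fitted rate is positive.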

Outline

This analysis consists of two components: a Galaxy workflow and a Jupyter notebook. To use a Jupyter Notebook in a Galaxy workflow, see these short instructions.

The workflow is used to extract full-length sequences of SARS-CoV-2, tidy up their names in FASTA files, produce a multiple sequence alignment and compute a maximum likelihood tree.

The Jupyter notebook is used to correlate branch lengths with collection dates in order to estimate MRCA timing.
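To make "branch lengths" concrete: the quantity correlated with collection dates is each sample's root-to-tip distance, the sum of branch lengths on the path from the root to that tip. The sketch below computes it from a Newick string with a small hand-written parser. It is an illustration only and makes several assumptions: the tree is treated as rooted exactly as written (the notebook may reroot it first), labels are unquoted, and the function names are hypothetical.

```python
import re


def parse_newick(text):
    """Parse a Newick string into nested dicts with name, length, children."""
    tokens = re.findall(r"[(),;]|[^(),;\s]+", text)
    punctuation = {"(", ")", ",", ";"}
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1                          # consume "("
            children.append(parse_node())
            while tokens[pos] == ",":
                pos += 1                      # consume ","
                children.append(parse_node())
            pos += 1                          # consume ")"
        label = ""
        if pos < len(tokens) and tokens[pos] not in punctuation:
            label = tokens[pos]               # e.g. "MT019531:0.0001" or "0.95:0.01"
            pos += 1
        name, _, length = label.partition(":")
        return {
            "name": name,
            "length": float(length) if length else 0.0,
            "children": children,
        }

    return parse_node()


def root_to_tip_distances(node, depth=0.0, out=None):
    """Return {tip name: summed branch length from the root to that tip}."""
    if out is None:
        out = {}
    here = depth + node["length"]
    if not node["children"]:
        out[node["name"]] = here              # leaf: record accumulated distance
    for child in node["children"]:
        root_to_tip_distances(child, here, out)
    return out
```

For example, `root_to_tip_distances(parse_newick("((A:0.1,B:0.2)0.9:0.05,C:0.3);"))` maps `A`, `B` and `C` to the summed lengths along their paths; internal labels such as support values are parsed but ignored.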

Inputs

One input is required: a comma-separated file containing accession numbers and collection dates:

Accession,Collection_Date
MT019531,2019-12-30
MT019529,2019-12-23
MT007544,2020-01-25
MN975262,2020-01-11
...

An up-to-date version of this file can be generated directly from the NCBI Virus resource by

  1. searching for SARS-CoV-2 (NCBI taxid: 2697049) sequences
  2. configuring the list of results to display only the Accession and Collection date columns
  3. downloading the Current table view result in CSV format

The collection dates will be taken from the corresponding GenBank record's /collection_date tag.
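A file in this format can be read with the Python standard library. The sketch below is one possible way to do it, not the notebook's own code: the function name is hypothetical, the column headers are taken from the example above, and rows whose collection date is not a full YYYY-MM-DD value (GenBank also allows year-only or year-month dates) are simply skipped, which is an assumption about how such records should be handled.

```python
import csv
from datetime import date, datetime


def read_collection_dates(handle):
    """Map accession -> datetime.date from an open CSV file handle.

    Rows without a complete YYYY-MM-DD collection date are skipped
    (an assumption made for this sketch).
    """
    dates = {}
    for row in csv.DictReader(handle):
        try:
            parsed = datetime.strptime(row["Collection_Date"], "%Y-%m-%d")
        except ValueError:
            continue                          # partial or empty date
        dates[row["Accession"]] = parsed.date()
    return dates
```

Typical use would be `with open(path, newline="") as fh: dates = read_collection_dates(fh)`, where `path` points at the downloaded CSV file.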

Outputs

The Galaxy workflow generates a maximum-likelihood phylogenetic tree. This tree and the initial workflow input of accession numbers and collection dates are then used in the Jupyter notebook to estimate the time of the most recent common ancestor of all samples.

History and workflow

A Galaxy workspace (history) containing the most current analysis can be imported from here.

The publicly accessible workflow can be downloaded and installed on any Galaxy instance. It contains version information for all tools used in this analysis.

BioConda

Tools used in this analysis are also available from BioConda:

ncbi-acc-download
picard
mafft
fasttree

Steps

ID  Name                     Description
0   CoV acc date
1   Remove beginning         Remove beginning1
2   Convert                  Convert characters1
3   Cut                      Cut1
4   NCBI Accession Download  toolshed.g2.bx.psu.edu/repos/iuc/ncbi_acc_download/ncbi_acc_download/0.2.5+galaxy0
5   NormalizeFasta           toolshed.g2.bx.psu.edu/repos/devteam/picard/picard_NormalizeFasta/2.18.2.1
6   Text transformation      toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sed_tool/1.1.1
7   Collapse Collection      toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/4.1
8   MAFFT                    toolshed.g2.bx.psu.edu/repos/rnateam/mafft/rbc_mafft/7.221.3
9   FASTTREE                 toolshed.g2.bx.psu.edu/repos/iuc/fasttree/fasttree/2.1.10+galaxy1
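Steps 1 to 3 appear to turn the input CSV into a plain list of accession numbers for the download step. The sketch below mimics that in Python as a guess at their combined effect: the exact parameters of the Galaxy tools are not given here, so dropping one header line, converting commas to tabs and keeping the first column are all assumptions, and the function name is hypothetical.

```python
def accessions_from_csv(text):
    """Rough equivalent of the Remove beginning, Convert and Cut steps."""
    lines = text.splitlines()[1:]                          # drop the header line (assumed: 1 line)
    tabbed = [line.replace(",", "\t") for line in lines]   # commas -> tabs (assumed delimiter)
    return [line.split("\t")[0] for line in tabbed if line]  # keep column 1, skip blank lines
```

The resulting list corresponds to what would be passed, one accession per line, to the NCBI Accession Download step.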

Outputs

ID               Name             Description  Type
out_file1        out_file1        n/a          input
out_file1        out_file1        n/a          tabular
out_file1        out_file1        n/a          tabular
output           output           n/a          input
error_log        error_log        n/a          txt
outFile          outFile          n/a          fasta
output           output           n/a          input
output           output           n/a          input
outputAlignment  outputAlignment  n/a          fasta
output           output           n/a          nhx

Created: 25th Mar 2020 at 10:02

Last updated: 25th Mar 2020 at 11:23
