5 - S-gene AA
Version 1

Analysis of S-protein polymorphism

Live Resources

Server            Workflow         History
usegalaxy.org     Galaxy workflow  Galaxy history
usegalaxy.eu      Galaxy workflow  Galaxy history
usegalaxy.org.au  Galaxy workflow  Galaxy history
usegalaxy.be      Galaxy workflow  Galaxy history

What's the point?

In the previous part of this study we found a non-synonymous polymorphism within the S-gene. In this section we try to interpret its possible effect.

Outline

Obtain coding sequences of S proteins from a diverse group of coronaviruses. Generate amino acid alignment to assess conservation of the polymorphic location.
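As a minimal sketch of the last step of this outline, the snippet below counts which residues occur in a single alignment column. The helper names `alignment_column` and `column_counts`, the sequence names, and the toy alignment are illustrative assumptions; they are not part of the workflow or its data.

```python
from collections import Counter


def alignment_column(aligned_ref, residue_pos):
    """Map a 1-based residue position in the ungapped reference
    to the 0-based column of that residue in the alignment."""
    seen = 0
    for col, ch in enumerate(aligned_ref):
        if ch != "-":
            seen += 1
            if seen == residue_pos:
                return col
    raise ValueError("residue position lies beyond the reference sequence")


def column_counts(alignment, col):
    """Count the characters (residues or gaps) found in one alignment column."""
    return Counter(seq[col] for seq in alignment.values())


if __name__ == "__main__":
    # Toy alignment (hypothetical), keyed by sequence name.
    toy = {
        "ref": "MK-LV",
        "seq_a": "MKALV",
        "seq_b": "MR-LI",
    }
    col = alignment_column(toy["ref"], 2)  # second residue of the reference
    print(col, dict(column_counts(toy, col)))
```

A column dominated by a single residue suggests a conserved position, whereas a mixed column suggests the position tolerates substitutions.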

Input

We downloaded CDS sequences of coronavirus Spike proteins from the NCBI Viral Resource for the following coronaviruses:

Accession      Description
FJ588692.1     Bat SARS Coronavirus Rs806/2006
KR559017.1     Bat SARS-like coronavirus BatCoV/BB9904/BGR/2008
KC881007.1     Bat SARS-like coronavirus WIV1
KT357810.1     MERS coronavirus isolate Riyadh_1175/KSA/2014
KT357811.1     MERS coronavirus isolate Riyadh_1337/KSA/2014
KT357812.1     MERS coronavirus isolate Riyadh_1340/KSA/2014
KF811036.1     MERS coronavirus strain Tunisia-Qatar_2013
AB593383.1     Murine hepatitis virus
AF190406.1     Murine hepatitis virus strain TY
AY687355.1     SARS coronavirus A013
AY687356.1     SARS coronavirus A021
AY687361.1     SARS coronavirus B029
AY687365.1     SARS coronavirus C013
AY687368.1     SARS coronavirus C018
AY648300.1     SARS coronavirus HHS-2004
DQ412594.1     SARS coronavirus isolate CUHKtc10NP
DQ412596.1     SARS coronavirus isolate CUHKtc14NP
DQ412609.1     SARS coronavirus isolate CUHKtc32NP
MN996528.1     nCov-2019
MN996527.1     nCov-2019
NC_045512.2    nCov-2019
NC_002306.3    Feline infectious peritonitis virus
NC_028806.1    Swine enteric coronavirus strain Italy/213306/2009
NC_038861.1    Transmissible gastroenteritis virus

These viruses were chosen based on a publication by Duquerroy et al. (2005). The sequences were extracted manually, which was a painful process. We will develop a tool for parsing particular CDS sequences automatically for future analyses.

Output

We produce two alignments of Betacoronavirus spike proteins, one at the nucleotide level and one at the amino acid level. The alignments can be visualized with the Multiple Sequence Alignment visualization in Galaxy:

[Figure: Visualization of amino acid alignment in Galaxy]

Alignments of Spike proteins

[Figure A: CDS alignments (nucleotide alignment of Spike proteins)]

[Figure B: Protein alignment of Spike proteins]

Workflow and history

The Galaxy history containing the latest analysis is linked under Live Resources. The publicly accessible workflow can be downloaded and installed on any Galaxy instance. It contains all information about tool versions and parameters used in this analysis.

Analysis Workflow

The transeq tool translates the CDS sequences into protein sequences, which we then align to each other using mafft. The resulting protein alignment is passed to tranalign together with the original nucleotide sequences. tranalign produces a nucleotide alignment that is consistent with the protein alignment.
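To illustrate the idea behind the translate, align, and back-thread steps, here is a minimal pure-Python sketch. It is not the EMBOSS implementation: the function names `translate` and `thread_codons` are hypothetical, the standard genetic code is assumed, and the gapped protein string stands in for the output of an aligner such as mafft.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a CDS in frame 1; unknown codons become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def thread_codons(aligned_protein, cds):
    """Build a codon alignment: each residue of the gapped protein
    takes the next codon of the CDS, each gap becomes '---'."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    residues = sum(1 for ch in aligned_protein if ch != "-")
    if residues != len(codons):
        raise ValueError("protein and CDS lengths do not match")
    out, k = [], 0
    for ch in aligned_protein:
        if ch == "-":
            out.append("---")
        else:
            out.append(codons[k])
            k += 1
    return "".join(out)


if __name__ == "__main__":
    cds = "ATGTTTGTT"              # toy CDS, not real Spike sequence
    protein = translate(cds)       # ungapped translation
    aligned = "M-FV"               # stand-in for an aligner's gapped output
    print(protein, thread_codons(aligned, cds))
```

Because gaps are inserted as whole codons, the reading frame is preserved in the nucleotide alignment, which is what makes it consistent with the protein alignment.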

BioConda

Tools used in this analysis are also available from BioConda:

mafft
emboss
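For readers who install these tools locally, the three workflow steps could be chained roughly as sketched below. The file names, the helper names `build_commands` and `run`, and the use of mafft's `--auto` option are assumptions made for illustration; the Galaxy workflow remains the authoritative record of tool versions and parameters.

```python
import subprocess
from pathlib import Path


def build_commands(cds_fasta, workdir="."):
    """Return (command, stdout_path) pairs for the three steps.
    stdout_path is None when the tool writes its own output file."""
    work = Path(workdir)
    protein = work / "protein.fasta"          # hypothetical file names
    protein_aln = work / "protein_aln.fasta"
    codon_aln = work / "codon_aln.fasta"
    return [
        # 1. Translate the CDS sequences into protein sequences.
        (["transeq", "-sequence", str(cds_fasta), "-outseq", str(protein)], None),
        # 2. Align the proteins; mafft writes the alignment to stdout.
        (["mafft", "--auto", str(protein)], protein_aln),
        # 3. Thread the nucleotide sequences onto the protein alignment.
        (["tranalign", "-asequence", str(cds_fasta),
          "-bsequence", str(protein_aln), "-outseq", str(codon_aln)], None),
    ]


def run(steps):
    """Execute the steps in order, stopping at the first failure."""
    for cmd, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(cmd, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(cmd, check=True, stdout=out)


if __name__ == "__main__":
    # Print the planned commands; call run(...) only where the tools are installed.
    for cmd, _ in build_commands("spike_cds.fasta"):
        print(" ".join(cmd))
```

Separating command construction from execution makes it possible to inspect the planned commands before anything is run.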

Steps

ID Name Description
0 Input dataset
1 transeq toolshed.g2.bx.psu.edu/repos/devteam/emboss_5/EMBOSS: transeq101/5.0.0
2 MAFFT toolshed.g2.bx.psu.edu/repos/rnateam/mafft/rbc_mafft/7.221.3
3 tranalign toolshed.g2.bx.psu.edu/repos/devteam/emboss_5/EMBOSS: tranalign100/5.0.0

Outputs

ID Name Description Type
out_file1 out_file1 n/a fasta
outputAlignment outputAlignment n/a fasta
out_file1 out_file1 n/a fasta

Created: 25th Mar 2020 at 10:05

Last updated: 25th Mar 2020 at 11:23
