5 - S-gene AA
Version 1

Workflow Type: Galaxy

Analysis of S-protein polymorphism

Live Resources

Instance Workflow History

usegalaxy.org Galaxy workflow Galaxy history

usegalaxy.eu Galaxy workflow Galaxy history

usegalaxy.org.au Galaxy workflow Galaxy history

usegalaxy.be Galaxy workflow Galaxy history

What's the point?

In the previous portion of this study we found a non-synonymous polymorphism within the S-gene. In this section we are trying to interpret its possible effect.


Obtain coding sequences of S proteins from a diverse group of coronaviruses. Generate amino acid alignment to assess conservation of the polymorphic location.
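As a rough illustration of what "assessing conservation of the polymorphic location" can mean in practice, the sketch below maps a residue position in one reference sequence to its alignment column and tallies the residues found in that column across all sequences. The function names, the toy alignment, and the example position are hypothetical and are not taken from the workflow; the real analysis uses the MAFFT output described later.

```python
from collections import Counter


def column_for_position(aligned_ref, position):
    """Return the 0-based alignment column holding the 1-based,
    ungapped residue `position` of the reference sequence."""
    seen = 0
    for col, ch in enumerate(aligned_ref):
        if ch != "-":
            seen += 1
            if seen == position:
                return col
    raise ValueError("position lies beyond the end of the reference sequence")


def residue_counts(alignment, ref_name, position):
    """Count the residues (or gaps) that every aligned sequence carries
    in the column corresponding to `position` in the reference."""
    col = column_for_position(alignment[ref_name], position)
    return Counter(seq[col] for seq in alignment.values())


# Toy alignment, for illustration only (not real Spike sequences).
toy = {
    "ref": "MK-LV",
    "a": "MKALV",
    "b": "MR-LI",
}
counts = residue_counts(toy, "ref", 2)
print(dict(counts))
```

A column dominated by a single residue suggests a conserved site; a column with many different residues suggests that substitutions there are tolerated.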


We downloaded CDS sequences of coronavirus Spike proteins from the NCBI Viral Resource for the following coronaviruses:

Accession Description

FJ588692.1 Bat SARS Coronavirus Rs806/2006

KR559017.1 Bat SARS-like coronavirus BatCoV/BB9904/BGR/2008

KC881007.1 Bat SARS-like coronavirus WIV1

KT357810.1 MERS coronavirus isolate Riyadh_1175/KSA/2014

KT357811.1 MERS coronavirus isolate Riyadh_1337/KSA/2014

KT357812.1 MERS coronavirus isolate Riyadh_1340/KSA/2014

KF811036.1 MERS coronavirus strain Tunisia-Qatar_2013

AB593383.1 Murine hepatitis virus

AF190406.1 Murine hepatitis virus strain TY

AY687355.1 SARS coronavirus A013

AY687356.1 SARS coronavirus A021

AY687361.1 SARS coronavirus B029

AY687365.1 SARS coronavirus C013

AY687368.1 SARS coronavirus C018

AY648300.1 SARS coronavirus HHS-2004

DQ412594.1 SARS coronavirus isolate CUHKtc10NP

DQ412596.1 SARS coronavirus isolate CUHKtc14NP

DQ412609.1 SARS coronavirus isolate CUHKtc32NP

MN996528.1 nCov-2019

MN996527.1 nCov-2019

NC_045512.2 nCov-2019

NC_002306.3 Feline infectious peritonitis virus

NC_028806.1 Swine enteric coronavirus strain Italy/213306/2009

NC_038861.1 Transmissible gastroenteritis virus

These viruses were chosen based on a publication by Duquerroy et al. (2005). The sequences were extracted manually, which was a painful process. For future analyses we will develop a tool that extracts specific CDS sequences automatically.


We produce two alignments of Betacoronavirus spike proteins, one at the nucleotide level and one at the amino acid level. The alignments can be visualized with the Multiple Sequence Alignment visualization in Galaxy:

[Figure: Visualization of amino acid alignment in Galaxy]

Alignments of Spike proteins

[Figure A: CDS (nucleotide) alignment of Spike proteins]

[Figure B: Protein alignment of Spike proteins]

Workflow and history

The Galaxy history containing the latest analysis can be found here. The publicly accessible workflow can be downloaded and installed on any Galaxy instance. It contains all information about tool versions and parameters used in this analysis.

Analysis Workflow

The transeq tool converts the CDS sequences into protein sequences, which we then align to each other using mafft. The output is fed into tranalign along with the nucleotide sequences. tranalign produces a nucleotide alignment coherent with the protein alignment.
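The idea behind the transeq and tranalign steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the EMBOSS implementation: it assumes the standard genetic code, in-frame CDS sequences, and a gapped protein string such as an aligner would produce. The helper names `translate` and `thread_codons` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate an in-frame CDS into protein (roughly what transeq does).
    Unknown or ambiguous codons become 'X'; stop codons become '*'."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def thread_codons(cds, aligned_protein):
    """Build a codon-level nucleotide alignment from a gapped protein
    sequence (the idea behind tranalign): every residue consumes the next
    codon, and every protein gap becomes a three-base gap."""
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join("---" if res == "-" else next(codons) for res in aligned_protein)


cds = "ATGTTTGGA"
protein = translate(cds)            # ungapped protein
aligned = "M-FG"                    # hypothetical gapped form after alignment
print(protein, thread_codons(cds, aligned))
```

Because gaps are inserted in whole-codon units, the resulting nucleotide alignment keeps the reading frame and stays column-for-column consistent with the protein alignment.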


Tools used in this analysis are also available from BioConda:

Name

mafft

emboss


ID Name Description
1 transeq toolshed.g2.bx.psu.edu/repos/devteam/emboss_5/EMBOSS: transeq101/5.0.0
2 MAFFT toolshed.g2.bx.psu.edu/repos/rnateam/mafft/rbc_mafft/7.221.3
3 tranalign toolshed.g2.bx.psu.edu/repos/devteam/emboss_5/EMBOSS: tranalign100/5.0.0

Version History

Version 1 (earliest) Created 25th Mar 2020 at 10:05 by Finn Bacall

Added/updated 10 files


Created: 25th Mar 2020 at 10:05

Last updated: 25th Mar 2020 at 11:23
