Virus in the sea of mRNA - in silico detection of novel virus in chimpanzee transcriptome data using a high-throughput pipeline
Abstract
Recent research on the microbiome has highlighted the enormous importance of bacteria in the human body, yet comparatively little is known about the virome. During active infection, a portion of a host's total RNA is viral RNA, so analyzing transcriptome data can reveal viruses infecting the host under investigation. As a pilot for examining the much larger human dataset, we performed a comprehensive in silico study to detect known and novel chimpanzee viruses in all chimpanzee transcriptome sequences in NCBI's Sequence Read Archive (SRA) database.
A high-throughput bioinformatics pipeline was developed for rapid analysis. The pipeline retrieved the data from SRA, removed duplicate reads, performed a de novo assembly with SPAdes v.3.6.2, and then ran GhostX v.1.3.5 for taxonomic classification. Viral hits with a GhostX e-value below 0.0001 were inspected manually. We found a wide range of viruses, including influenza A virus, polyomavirus, and hepatitis B virus, plus a new gammaherpesviral homolog related to Epstein-Barr virus. Our results suggest that many more diverse viruses could be discovered by analyzing other available transcriptome data. With this proof of concept, larger datasets, such as human transcriptomes and human tumor transcriptomes, will be analyzed to better understand the possible role of viruses in disease pathogenesis and, more generally, to study host/virus relations and their evolution.
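Two of the pipeline steps described above, duplicate-read removal and e-value filtering of classification hits, can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes that "duplicate" means an exact sequence match, and that the classifier output is a BLAST-style tab-separated table with the e-value in the 11th column; both are assumptions, not details given in the abstract. The function names `dedupe_reads` and `filter_hits` are hypothetical.

```python
import csv
import io

# Threshold stated in the abstract: hits with e-value < 0.0001 are kept.
EVALUE_CUTOFF = 1e-4


def dedupe_reads(reads):
    """Drop reads whose sequence was already seen (exact match only).

    `reads` is an iterable of (read_id, sequence) pairs; the first
    occurrence of each sequence is kept, in input order.
    """
    seen = set()
    unique = []
    for read_id, seq in reads:
        if seq not in seen:
            seen.add(seq)
            unique.append((read_id, seq))
    return unique


def filter_hits(tabular_text, cutoff=EVALUE_CUTOFF, evalue_col=10):
    """Return (query, subject, evalue) for rows with evalue < cutoff.

    Assumes a BLAST-style tab-separated layout (hypothetical for this
    sketch) with the e-value at zero-based column index `evalue_col`.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[evalue_col])
        if evalue < cutoff:
            hits.append((row[0], row[1], evalue))
    return hits
```

In a workflow like the one described, the rows returned by `filter_hits` would be the candidates passed on for manual inspection; the assembly and classification steps themselves are run by external tools and are not modeled here.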
Authors
- Christina Castro (Centers for Disease Control and Prevention)
- Terry Fei Fan Ng (Centers for Disease Control and Prevention)
- W. Allan Nix (Centers for Disease Control and Prevention)
Topic Areas
Sequencing applications for metagenomics, transcriptomics, diagnostics, and biosurveillance; Next generation finishing tools, technologies and pipelines; Human, non-human, and infectious disease applications
Session
PS-2 » Poster Session B (20:00 - Tuesday, 16th May, Mezzanine & New Mexico Room)