Migun Shakya1, Pavel Senin1, Shihai Feng1, Chien-chi Lo1, Bin Hu1,2,3, Patrick S. G. Chain1
1: Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87544
2: Health Research Program, CSRA Inc., Atlanta, GA 30329
3: National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA 30329
Transcriptomics is a powerful technique that has contributed to many biological discoveries. It is traditionally used to find genes and pathways that are differentially expressed in one condition relative to another, discover small RNAs (sRNAs), annotate transcribed genes, and characterize alternative splicing. With rapid advances in sequencing technologies providing unprecedented throughput at an acceptable cost, many research laboratories have shown interest in applying transcriptomics to identify genes that are differentially expressed in distinct cell populations or in response to different treatments. However, most of these laboratories are continually challenged by a lack of the bioinformatics and statistical expertise needed to design, implement, and maintain computational workflows capable of analyzing large amounts of sequencing data.
A typical transcriptomics workflow combines an array of bioinformatics tools, each of which addresses a particular step in the analysis, e.g. quality control, alignment, fragment counting, and statistical hypothesis testing. It is also essential that the workflow maintain an open and modular architecture, so that new tools can be added to enable new functionality and improve existing functionality. Moreover, the workflow needs to be optimized for both high throughput and precision.
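The open, modular architecture described above can be illustrated with a minimal Python sketch. This is not PiReT's implementation: the `Workflow` class, the step functions, and the toy data are all hypothetical, and the "alignment" step is a deliberately naive exact-substring match standing in for a real aligner. The point is only that each step is an independent function registered by name, so a new step can be added without modifying existing ones.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration: each step receives the shared
# analysis state and returns an updated copy of it.
State = Dict[str, object]
Step = Callable[[State], State]


class Workflow:
    """Ordered collection of named steps; new steps can be registered freely."""

    def __init__(self) -> None:
        self._steps: List[Tuple[str, Step]] = []

    def register(self, name: str, step: Step) -> "Workflow":
        self._steps.append((name, step))
        return self  # allow chaining

    def run(self, state: State) -> State:
        for name, step in self._steps:
            state = step(dict(state))  # shallow copy keeps steps decoupled
            state["completed"] = list(state.get("completed", [])) + [name]
        return state


def quality_control(state: State) -> State:
    # Toy filter: drop reads that contain an ambiguous base (N).
    state["reads"] = [r for r in state["reads"] if "N" not in r]
    return state


def align(state: State) -> State:
    # Toy "alignment": assign each read to the first gene that contains it.
    hits = []
    for read in state["reads"]:
        for gene, seq in state["reference"].items():
            if read in seq:
                hits.append(gene)
                break
    state["alignments"] = hits
    return state


def count_fragments(state: State) -> State:
    # Count assigned reads per reference gene.
    counts = {gene: 0 for gene in state["reference"]}
    for gene in state["alignments"]:
        counts[gene] += 1
    state["counts"] = counts
    return state


def run_example() -> State:
    workflow = (
        Workflow()
        .register("quality_control", quality_control)
        .register("alignment", align)
        .register("fragment_counting", count_fragments)
    )
    reference = {"geneA": "ACGTACGTTT", "geneB": "GGGCCCAAAT"}
    reads = ["ACGT", "CCCA", "ANGT", "GTTT", "TTTTTT"]
    return workflow.run({"reads": reads, "reference": reference})
```

Under this design, adding a further stage such as statistical testing would amount to one more `register` call with a function that reads `state["counts"]`.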
Here, we present PiReT, a one-of-a-kind reference-based transcriptomics workflow that adopts an open architecture and enables biologists with little or no computational knowledge to analyze their data. PiReT weaves together open-source bioinformatics tools and presents them in an interactive web-based graphical user interface (GUI). PiReT users can upload their data (FASTQ, BAM/SAM), customize the steps of the analysis, and produce biologist-friendly results (e.g. RPKM/FPKM/TPM values, read counts, and lists of regulated genes and pathways) and data visualizations within the GUI. In addition to routine transcriptomics analyses such as differential gene expression, PiReT can check for contamination, remove sequences of choice, perform metatranscriptome analyses such as host and pathogen responses in infection studies, and detect sRNAs. Thanks to its open architecture, new analysis tools and modules can be added based on user preferences. PiReT is currently a stand-alone workflow, but it will be integrated into EDGE Bioinformatics, extending that platform's analytical capabilities.
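For readers unfamiliar with the normalized expression values mentioned above, the following Python sketch shows the standard definitions of RPKM and TPM computed from raw read counts and gene lengths. It is an illustration under stated assumptions, not PiReT's code: the function names and the example numbers are hypothetical, and gene length is taken as a single fixed value in base pairs. FPKM uses the same formula as RPKM with fragments counted instead of reads.

```python
from typing import Dict


def rpkm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Reads per kilobase of gene per million mapped reads."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # count * 1e9 / (length_bp * total) == count / (length_kb * total_millions)
    return {
        gene: counts[gene] * 1e9 / (lengths_bp[gene] * total) for gene in counts
    }


def tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rate = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    denom = sum(rate.values())
    if denom == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate[gene] / denom * 1e6 for gene in counts}


# Hypothetical example: two genes with the same read density per base.
example_counts = {"geneA": 100, "geneB": 300}
example_lengths = {"geneA": 1000, "geneB": 3000}
example_rpkm = rpkm(example_counts, example_lengths)
example_tpm = tpm(example_counts, example_lengths)
```

Because TPM values always sum to one million within a sample, they are often easier to compare across samples than RPKM or FPKM values, whose totals vary from sample to sample.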
De novo sequencing, re-sequencing, Human seq., RNA seq., metagenomics, etc.; Sequencing applications for metagenomics, transcriptomics, diagnostics, and biosurveillance