Location: Virus and Prion Research
Title: The United States Swine Pathogen Database: integrating veterinary diagnostic laboratory sequence data to monitor emerging pathogens of swine
|INDERSKI, BLAKE - Orise Fellow|
|DIEL, DIEGO - South Dakota State University|
|PORTER, ELIZABETH - Kansas State University|
|CLEMENT, TRAVIS - South Dakota State University|
|NELSON, ERIC - South Dakota State University|
|BAI, JIANFA - Kansas State University|
|CHRISTOPHER-HENNINGS, JANE - South Dakota State University|
Submitted to: North American Porcine Reproductive and Respiratory Syndrome (NA-PRRS) Symposium
Publication Type: Abstract Only
Publication Acceptance Date: 9/16/2018
Publication Date: 12/3/2018
Citation: Anderson, T.K., Inderski, B., Diel, D.G., Porter, E., Clement, T., Nelson, E.A., Bai, J., Christopher-Hennings, J., Lager, K.M., Faaberg, K.S. 2018. The United States Swine Pathogen Database: integrating veterinary diagnostic laboratory sequence data to monitor emerging pathogens of swine [abstract]. North American Porcine Reproductive and Respiratory Syndrome (NA-PRRS) Symposium. Abstract No. 199.
Technical Abstract: Objective: Veterinary diagnostic laboratories annually derive partial nucleotide sequences from thousands of isolates of porcine reproductive and respiratory syndrome virus (PRRSV), Senecavirus A, and swine enteric coronaviruses. In addition, the advent of next-generation sequencing has enabled the rapid production of full-length genomes. At present, the sequence data are released only to the diagnostic client because they are associated with sensitive information. However, these data are critical and can be used to: objectively design field-relevant vaccines; determine when and how rapidly evolving pathogens are spreading across the landscape; and identify virus transmission hotspots.
Methods: In tandem with the USDA-ARS Big Data initiative, we have developed a centralized sequence database at the National Animal Disease Center. We implemented it with the Tripal toolkit, which builds on Drupal and the Chado database schema. The database is hosted on Amazon Web Services (AWS) for Federal Government, with resource scaling, dedicated support for the prevention of data theft, and control of database vulnerabilities.
Results: Each sequence housed in the database contains at a minimum four core data items: genomic information; date of collection; collection location (state level); and a unique identifier. Additionally, because the bulk of the database consists of PRRSV sequences, custom curation and annotation pipelines have determined PRRSV genotype (Type 1 or 2) and the location of open reading frames and nonstructural proteins, generated amino acid sequences, and identified putative frame shifts. Other swine pathogens will be annotated with similar tools to facilitate data mining and hypothesis generation. Once a user account has been created, all data in the repository can be accessed.
Conclusion: The resource will provide researchers timely access to sequences discovered by highly qualified veterinary diagnosticians, allowing for biological data mining and epidemiological studies. The result will be a better understanding of the emergence of novel viruses in the United States and of how these novel isolates are disseminated in the US and abroad, as well as the discovery of new patterns of biological consequence.
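As an illustration of the Results, the four core data items and a crude check for putative frame shifts could be sketched as follows. This is a minimal sketch under stated assumptions, not the database's actual schema or annotation pipeline: the class name `SequenceRecord`, its field names, the abbreviated state list, and the identifier format used in the usage note are all hypothetical, and the frame-shift heuristic (length not divisible by three, or a stop codon before the final codon) is a simple stand-in for whatever the custom pipelines do.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical, deliberately abbreviated list of state codes for illustration.
US_STATES = {"IA", "KS", "MN", "NC", "SD"}
VALID_BASES = set("ACGTN")
STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass(frozen=True)
class SequenceRecord:
    """The four core data items named in the abstract (field names assumed)."""
    identifier: str        # unique identifier
    sequence: str          # genomic information (nucleotide sequence)
    collection_date: date  # date of collection
    state: str             # collection location, state level only


def validate(record: SequenceRecord) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not record.identifier.strip():
        problems.append("missing identifier")
    if not record.sequence or set(record.sequence.upper()) - VALID_BASES:
        problems.append("empty or non-nucleotide sequence")
    if record.collection_date > date.today():
        problems.append("collection date is in the future")
    if record.state not in US_STATES:
        problems.append("unrecognized state code")
    return problems


def flag_putative_frameshift(orf: str) -> bool:
    """Flag an open reading frame whose length is not a multiple of three
    or that contains an in-frame stop codon before its final codon."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        return True
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])
```

For example, `validate(SequenceRecord("USSPD-0001", "ATGGCC", date(2018, 1, 15), "IA"))` returns an empty list, while `flag_putative_frameshift("ATGTAAGCCTAA")` returns `True` because of the premature in-frame `TAA`. Keeping location at the state level in the record mirrors the abstract's point that finer-grained client information is sensitive.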