Research Project: Intervention Strategies to Control Influenza A Virus Infection in Swine

Location: Virus and Prion Research

Title: Automating genetic classification for hemagglutinin and neuraminidase genes from influenza A viruses through machine learning methods

ZELLER, M - Iowa State University
ANDERSON, T - ORISE Fellow
Baker, Amy
GAUGER, P - Iowa State University

Submitted to: International Pig Veterinary Society (IPVS)
Publication Type: Abstract Only
Publication Acceptance Date: 3/1/2018
Publication Date: 6/11/2018
Citation: Zeller, M., Anderson, T., Vincent, A.L., Gauger, P. 2018. Automating genetic classification for hemagglutinin and neuraminidase genes from influenza A viruses through machine learning methods [abstract]. Pig Veterinary Society International Congress Proceedings. p. none assigned.

Technical Abstract:

Introduction: To combat the spread of influenza A virus (IAV) in swine, surveillance of circulating strains and rapid genetic classification are required to facilitate the design of vaccines for virus control in swine herds. Identifying the genetic clade is the baseline step in matching circulating field strains with vaccine strains. Consequently, a method that quickly and accurately classifies IAV into genetic clades will support vaccine selection by veterinarians and may inform strain updates in commercially available multivalent vaccines that protect swine against genetically similar IAV. The objective was to develop an automated method for assigning IAV phylogenetic clade classifications using machine learning methods.

Materials and Methods: Machine learning methods were applied to classify unknown hemagglutinin (HA) and neuraminidase (NA) sequence data from IAV detected in United States (U.S.) swine to known genetic clades circulating in the U.S. Training (70%) and cross-validation (30%) sets of H1, H3, N1 and N2 sequences were assigned genetic clade labels using maximum-likelihood phylogenetic methods. Genetic clade labels for each gene segment were assigned based on nearest-neighbor identity. A multiclass one-vs-all logistic regression classifier with regularization (C=1.0) was developed with scikit-learn. Using genetic clades as labels, the classifier was fitted using the aligned nucleotides as binary features. The fitted classifier was then used to classify a test dataset. Sequences whose classification probability was below 0.85 were assigned an 'other' classification.

Results: Model performance was evaluated by precision (>0.95) and recall (>0.95). The machine learning classifications were validated against phylogenetically informed classifications, with no disagreement among sequences not given an 'other' designation.
Conclusions: This automated classifier implemented a machine-learning algorithm and provided rapid and accurate genetic classification of unknown HA and NA sequence data for North American swine IAV. The required input is an HA or NA nucleotide sequence generated at a diagnostic laboratory. This classifier requires little computational power and can be further developed into a web interface or extended to include global classifiers for IAV outside North America. Fast, user-friendly tools such as this increase access to computational methods for a broad user base by reducing time and complexity, allowing producers and veterinarians to make informed vaccine choices for IAV.
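
The classification procedure described in the Materials and Methods can be illustrated with a minimal sketch. This is not the authors' code. The function names (one_hot, fit_clade_classifier, classify, evaluate), the toy sequences and the clade labels are hypothetical. Several choices are assumptions not stated in the abstract: gaps and ambiguous bases are encoded as all zeros, scikit-learn's OneVsRestClassifier is used as one way to obtain a one-vs-all model, and precision and recall are macro-averaged. The 70/30 split and the phylogenetic labeling step are omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.multiclass import OneVsRestClassifier

ALPHABET = "ACGT"  # assumption: any other symbol (gap, N, ...) encodes as all zeros


def one_hot(aligned_seqs):
    """Encode equal-length aligned sequences as binary features, 4 columns per site."""
    length = len(aligned_seqs[0])
    X = np.zeros((len(aligned_seqs), length * len(ALPHABET)), dtype=np.uint8)
    for i, seq in enumerate(aligned_seqs):
        if len(seq) != length:
            raise ValueError("all sequences must come from the same alignment")
        for j, base in enumerate(seq.upper()):
            k = ALPHABET.find(base)
            if k >= 0:
                X[i, j * len(ALPHABET) + k] = 1
    return X


def fit_clade_classifier(aligned_seqs, clades, C=1.0):
    """Fit a one-vs-rest, L2-regularized logistic regression on clade labels."""
    model = OneVsRestClassifier(LogisticRegression(C=C, max_iter=1000))
    model.fit(one_hot(aligned_seqs), clades)
    return model


def classify(model, aligned_seqs, threshold=0.85):
    """Return the most probable clade, or 'other' when its probability is below threshold."""
    proba = model.predict_proba(one_hot(aligned_seqs))
    best = proba.argmax(axis=1)
    return [
        str(model.classes_[b]) if proba[i, b] >= threshold else "other"
        for i, b in enumerate(best)
    ]


def evaluate(y_true, y_pred, labels):
    """Macro-averaged precision and recall over the known clades (assumed averaging)."""
    kw = dict(labels=list(labels), average="macro", zero_division=0)
    return precision_score(y_true, y_pred, **kw), recall_score(y_true, y_pred, **kw)


# Toy alignment with made-up clade names, for illustration only.
train_seqs = [
    "AAAAAAAAAAAA", "AAAAAAAAAAAC", "AAAAAAAAAACA",
    "CCCCCCCCCCCC", "CCCCCCCCCCCG", "CCCCCCCCCCGC",
    "GGGGGGGGGGGG", "GGGGGGGGGGGT", "GGGGGGGGGGTG",
]
train_clades = ["cladeA"] * 3 + ["cladeB"] * 3 + ["cladeC"] * 3

model = fit_clade_classifier(train_seqs, train_clades)
calls = classify(model, train_seqs)
precision, recall = evaluate(train_clades, calls, model.classes_)
print(calls, precision, recall)
```

With real data, the query sequences would need to be aligned to the same reference coordinates as the training alignment so that each feature column refers to the same site. On a dataset as small as this toy example, the regularized model may not reach the 0.85 probability threshold, in which case sequences are reported as 'other'.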