Research Project: Genomes to Phenomes in Beef Cattle Research

Location: Genetics and Animal Breeding

Title: Short Communication: Imputation accuracy of host genomic data from metagenomic sequenced information

Author
LAKAMP, ANDREW - University of Nebraska
NEUHJAR, ALISON - University of Nebraska
FERNANDO, SAMODHA - University of Nebraska
Snelling, Warren
SPANGLER, MATTHEW - University of Nebraska

Submitted to: Journal of Animal Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 5/12/2025
Publication Date: 5/17/2025
Citation: Lakamp, A.D., Neuhjar, A.C., Fernando, S.C., Snelling, W.M., Spangler, M.L. 2025. Short Communication: Imputation accuracy of host genomic data from metagenomic sequenced information. Journal of Animal Science. 103. Article skaf175. https://doi.org/10.1093/jas/skaf175.
DOI: https://doi.org/10.1093/jas/skaf175

Interpretive Summary: Metagenomic sequencing is a routine procedure in microbial ecology studies that sequences all DNA in a given sample. If the sample is obtained from a living organism (the host), the host’s own DNA is present in the sample and is sequenced as well. Most microbial ecology studies remove host DNA because it is not the focus of the study or is not of sufficient quality to be used on its own. This study examines how imputation can be used to obtain usable host genotypes from metagenomic sequence information generated from ocular swabs of juvenile beef cattle. Imputed host genotypes were compared to genotypes from a commercially available array to estimate the percentage of loci that were identical between the two genotyping methods. With no additional filtering, the average similarity between the two methods was 83%. With additional quality filters (e.g., a lower limit on imputed genotype confidence), similarity ranged from 88% to 99%, with an average of 98%. Imputing host genotypes from metagenomic data can therefore provide valuable information on the host and help verify host identity if the host has been previously genotyped. The results demonstrate that imputing host genomic information from metagenomic sequencing data is possible.
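
To illustrate the host verification idea mentioned above, the following is a minimal sketch and not the authors' pipeline. It assumes genotypes are coded as 0, 1, or 2 copies of an allele at the same ordered set of loci, with None for a missing call. The function names (percent_concordance, best_host_match), the animal IDs, and the toy genotypes are hypothetical and were invented for this example.

```python
def percent_concordance(calls_a, calls_b):
    """Percentage of loci with identical calls, skipping loci missing on either side."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None]
    if not pairs:
        return 0.0
    return 100.0 * sum(1 for a, b in pairs if a == b) / len(pairs)


def best_host_match(imputed, candidates):
    """Score the imputed genotypes against every candidate's array genotypes.

    Returns the ID of the candidate with the highest concordance and the
    full dictionary of scores, so a weak best match can be inspected.
    """
    scores = {animal_id: percent_concordance(imputed, genotypes)
              for animal_id, genotypes in candidates.items()}
    best_id = max(scores, key=scores.get)
    return best_id, scores


# Toy data (made up for illustration, not from the study).
imputed_from_swab = [0, 1, 2, 1, 0, 2]
array_genotypes = {
    "animal_A": [0, 1, 2, 1, 0, 1],
    "animal_B": [2, 1, 0, 0, 1, 2],
    "animal_C": [1, 1, 1, 1, 1, 1],
}

best_id, scores = best_host_match(imputed_from_swab, array_genotypes)
print(best_id, scores)
```

In practice a minimum concordance threshold would likely be needed before accepting a match, since the best-scoring candidate is returned even when no candidate is the true host.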

Technical Abstract: Metagenomic sequencing is the process of extracting all the genomic information from a given sample. Most metagenomic studies remove any host reads as a matter of course. However, host reads can be used as the basis for genotype imputation to obtain whole genomic sequences. The accuracy of these imputed genotype calls from a bovine ocular sample was determined by comparing them to calls from a commercial array. Overall, imputed genotype calls had a high concordance with array genotype calls (average concordance of 83% and correlation of 0.81 with no filtering). Accuracy increased as filters for host read depth and imputed call confidence were implemented. With filters in place, the average percent concordance was 98% (range: 88% to 99%), while the mean correlation was 0.98 (range: 0.89 to 0.99). Further, identity verification of the metagenomic samples can be carried out if the host is genotyped on another platform.
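
The two accuracy measures named above, percent concordance and correlation, and the effect of filtering on call confidence and read depth can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method used in the paper: genotypes are assumed to be coded as 0, 1, or 2 allele copies, the correlation is assumed to be a Pearson correlation, and the function name compare_calls, its parameters, the thresholds, and the toy data are all hypothetical.

```python
from math import sqrt, nan


def compare_calls(imputed, array, confidence=None, depth=None,
                  min_confidence=0.0, min_depth=0):
    """Return (percent concordance, Pearson r, number of loci used).

    A locus is skipped if either call is missing (None), if its imputed-call
    confidence is below min_confidence, or if its host read depth is below
    min_depth. Filters apply only when the matching list is supplied.
    """
    kept = []
    for i, (g_imp, g_arr) in enumerate(zip(imputed, array)):
        if g_imp is None or g_arr is None:
            continue
        if confidence is not None and confidence[i] < min_confidence:
            continue
        if depth is not None and depth[i] < min_depth:
            continue
        kept.append((g_imp, g_arr))

    n = len(kept)
    if n == 0:
        return nan, nan, 0

    concordance = 100.0 * sum(1 for x, y in kept if x == y) / n

    mean_x = sum(x for x, _ in kept) / n
    mean_y = sum(y for _, y in kept) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in kept)
    syy = sum((y - mean_y) ** 2 for _, y in kept)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in kept)
    # The correlation is undefined when either set of calls has no variation.
    r = sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else nan
    return concordance, r, n


# Toy data for one animal (made up for illustration, not from the study).
imputed = [0, 1, 2, 1, 0, 2, 1, 0]
array = [0, 1, 2, 2, 0, 2, 0, 0]
confidence = [0.99, 0.95, 0.98, 0.60, 0.97, 0.99, 0.55, 0.96]
depth = [3, 2, 4, 1, 2, 5, 0, 3]

unfiltered = compare_calls(imputed, array)
filtered = compare_calls(imputed, array, confidence, depth,
                         min_confidence=0.9, min_depth=1)
print("unfiltered:", unfiltered)
print("filtered:  ", filtered)
```

In this toy example the two discordant loci are also the two low-confidence loci, so filtering raises concordance at the cost of fewer loci compared. Reporting the number of loci retained alongside the accuracy figures makes that trade-off visible.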