
Research Project: Genomes to Phenomes in Beef Cattle Research

Location: Genetics and Animal Breeding

Title: A vision of how low-pass sequence data should contribute to genetic evaluation in the future

Author
Thallman, Richard
Gondro, Cedric - Michigan State University
Engle, Bailey
Snelling, Warren
Borgert, Jacqueline
Keele, John
Kuehn, Larry

Submitted to: Journal of Animal Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 6/5/2025
Publication Date: N/A
Citation: N/A

Interpretive Summary: Low-pass sequencing refers to sequencing the DNA of animals at low cost and using bioinformatics software to impute that sequence to full genomic sequence. It has been proposed as an alternative to the current standard genotyping technology. At least one commercial product based on low-pass sequencing is available for cattle. Concerns limiting commercial adoption of the technology are: 1) the cost of storing the enormous amount of data it generates and 2) whether that additional data will result in improved accuracy of genetic evaluation. The objective is to present a vision for how low-pass sequencing technology could be implemented in the future. A format in which to store the results of low-pass sequencing is proposed. It should require orders of magnitude less storage space than the approach currently in use. A new model based on knowledge of the biology underlying the transformation of genomic variation into important traits for livestock production is proposed. It is argued that it would make better use of the information in genomic sequence than current genetic evaluation models. Changes and further advancements in the storage and modeling of genomic data and effects will provide opportunities to increase prediction accuracy of breeding values.

Technical Abstract: Low-pass sequencing refers to sequencing the DNA of individuals to a low depth of coverage (e.g., 0.5X) and imputing that sequence to genomic sequence based on reference haplotypes derived from a smaller set of individuals that were sequenced to high depth of coverage (e.g., ≥ 10X). It has been proposed as an alternative to genotyping by SNP chips. At least one commercial product based on low-pass sequencing is available for bovines. Concerns about the current form of low-pass sequencing that limit adoption of the technology are: 1) the cost of storing the enormous amount of data it generates and 2) whether that additional data will result in improved accuracy of genetic evaluation. The objective is to present a vision for how low-pass sequencing technology could be implemented in the future to address the storage cost concern and how genetic evaluations could be modified in the future to take advantage of the additional information, resulting in increased accuracy. It is proposed that the storage issue could be addressed by representing the genomic sequence of an individual as a pair of haplotype arrays, with each element pointing to an enumerated haplotype of the sequence within one of approximately 60,000 defined genome segments. Assuming 100 million genomic variants, the infrastructure required to translate the identifier of any enumerated haplotype into its genomic sequence would require less than 5 GB of storage. Each haplotype array element would require 2 bytes, so the marginal storage required to represent the genomic sequence of an individual would be 240,000 bytes (2 arrays × 60,000 segments × 2 bytes), or about the same as storing the genotypes from a SNP chip with 240,000 markers. This assumes no ambiguity in the imputation. That assumption is, of course, unrealistic; approaches to minimize ambiguity and deal with it when necessary are discussed.
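The storage arithmetic above can be checked with a minimal sketch. It is not the authors' file format: the function names (`new_haplotype_array`, `marginal_bytes`, `decode`) and the layout of the shared catalog are assumptions made for illustration. Only the segment count, the 2-byte element size, and the pair of arrays per individual come from the abstract. A 2-byte unsigned identifier can distinguish up to 65,536 enumerated haplotypes per segment.

```python
from array import array

N_SEGMENTS = 60_000        # approximate segment count stated in the abstract
HAPLOTYPES_PER_ANIMAL = 2  # one array per parental haplotype


def new_haplotype_array(n_segments=N_SEGMENTS):
    """Return one haplotype array: an unsigned 16-bit haplotype ID per segment."""
    return array("H", [0]) * n_segments


def marginal_bytes(haplotype_arrays):
    """Bytes needed for one individual's arrays, excluding the shared catalog."""
    return sum(len(a) * a.itemsize for a in haplotype_arrays)


def decode(haplotype_array, catalog):
    """Expand haplotype IDs into sequence using a shared lookup.

    catalog[segment][haplotype_id] -> alleles for that segment.
    This catalog layout is hypothetical; the abstract only says that such
    a translation infrastructure would need less than 5 GB.
    """
    return "".join(catalog[seg][hap_id] for seg, hap_id in enumerate(haplotype_array))


animal = [new_haplotype_array() for _ in range(HAPLOTYPES_PER_ANIMAL)]
print(marginal_bytes(animal))  # 240000 where the 'H' type code is 2 bytes wide

# Toy catalog with three segments; the alleles are invented for illustration.
toy_catalog = [["AC", "AT"], ["G", "T", "C"], ["TTA", "TCA"]]
toy_haplotype = array("H", [1, 2, 0])
print(decode(toy_haplotype, toy_catalog))  # ATCTTA
```

The per-animal cost depends only on the number of segments, not on the number of variants, because the variant-level detail lives once in the shared catalog. The sketch assumes unambiguous imputation, so each element holds exactly one identifier.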
Current genetic evaluations are based on models that are strictly linear and that ignore whether sets of SNP are in the same gene or not. A hierarchical, non-linear, biologically motivated model is proposed as an alternative. It begins by treating haplotypes that affect expression as distinct from haplotypes of the same gene that affect gene product. It transforms them through an alternation between multiplicative and additive steps resulting in a phenotype. It is argued that this model should extract more information from genomic sequence than would be possible with a linear model that ignores most of what is known about genetics.
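One possible reading of "an alternation between multiplicative and additive steps" is sketched below as a toy. The abstract does not specify the model's equations, so every name (`gene_activity`, `phenotype`), every number, and the particular ordering of steps is an assumption for illustration rather than the proposed model itself.

```python
def gene_activity(expression, product):
    """Combine the two haplotypes of one gene.

    Multiplicative step: each haplotype's expression level scales the
    function of the gene product encoded on that same haplotype.
    Additive step: the two haplotype contributions are summed.
    """
    return sum(e * p for e, p in zip(expression, product))


def phenotype(genes, weights, baseline=0.0):
    """Multiply each gene's activity by a trait weight, then add across genes."""
    return baseline + sum(w * gene_activity(e, p) for w, (e, p) in zip(weights, genes))


# Toy animal with two genes; all values are invented.
# Each gene is (expression effect per haplotype, product effect per haplotype).
genes = [
    ((1.0, 0.5), (1.0, 0.0)),  # gene A
    ((2.0, 1.0), (0.5, 0.5)),  # gene B
]
weights = (2.0, -1.0)
print(phenotype(genes, weights))  # 0.5

# Same alleles in gene A, but the functional product allele now sits on
# the weakly expressed haplotype.
rephased = [((1.0, 0.5), (0.0, 1.0)), genes[1]]
print(phenotype(rephased, weights))  # -0.5
```

In this toy, the two animals carry identical alleles yet differ in phenotype because of which expression haplotype is paired with which product haplotype. A strictly additive model of allele counts gives the same prediction for both.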