Submitted to: Animal Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 8/27/2016
Publication Date: 4/1/2017
Publication URL: http://handle.nal.usda.gov/10113/5717781
Citation: Keel, B.N., Keele, J.W., Snelling, W.M. 2017. Genome-wide copy number variation in the bovine genome detected using low coverage sequence of popular beef breeds. Animal Genetics. 48:141-150. https://doi.org/10.1111/age.12519.
Interpretive Summary: Genomic structural variations are an important source of genetic diversity. Copy number variations (CNVs), gains and losses of large regions of genomic sequence between individuals of a species, are known to be associated with both diseases and phenotypic traits. Deeply sequenced genomes are often used to identify CNVs, but deep sequencing is too expensive to perform on many individuals. The random variation in coverage that occurs with current sequencing methods may make low-coverage sequence unsuitable for CNV detection. A more complete catalog of important CNVs might be obtained from low-coverage sequence of several individuals if approaches capable of detecting CNVs in such data can be identified. The performance of three modern CNV detection algorithms on simulated CNV data was compared to determine a suitable strategy for identifying CNVs in low-coverage genomic sequence. As part of an effort to identify DNA sequence variation that affects beef cattle performance, the best method identified with simulated data was applied to low-coverage DNA sequence from 154 influential bulls in the U.S. Meat Animal Research Center Germplasm Evaluation (GPE) project. These bulls were purebred sires sampled from the most popular breeds in the U.S. (Angus, Hereford, Simmental, Limousin, Charolais, Gelbvieh, and Red Angus). Over 1,500 CNVs were detected, and these CNVs overlapped 2,004 protein-coding genes. A larger-than-expected number of genes involved in immune system processes were affected by CNVs. In addition, CNVs were shown to overlap several known regions of DNA that correlate with variation in cattle phenotype. Further investigation is needed to assess how much influence the coding-sequence CNVs identified in this work might have on cattle performance.
Technical Abstract: Copy number variations (CNVs) are large insertions, deletions or duplications in the genome that vary between members of a species and are known to affect a wide variety of phenotypic traits. In this study, we identified CNVs in a population of bulls using low-coverage next-generation sequence data. First, to determine a suitable strategy for CNV detection in our data, we compared the performance of three distinct CNV detection algorithms on benchmark CNV datasets and concluded that the multiple-sample read-depth approach was the best method for identifying CNVs in our sequences. Using this technique, we identified a total of 1341 copy number variable regions (CNVRs) from genome sequences of 154 purebred sires used in Cycle VII of the USMARC Germplasm Evaluation Project. These bulls represented the seven most popular beef breeds in the United States: Hereford, Charolais, Angus, Red Angus, Simmental, Gelbvieh and Limousin. The CNVRs covered 6.7% of the bovine genome and spanned 2465 protein-coding genes and many known quantitative trait loci (QTL). Genes harbored in the CNVRs were further analyzed to determine their function as well as to find any breed-specific differences that may shed light on breed differences in adaptation, health and production.
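Illustrative sketch: The abstract reports CNVRs and the fraction of the genome they cover but does not say how the regions were constructed. A common convention, assumed here and not confirmed by the source, is to merge overlapping per-animal CNV calls into regions and divide their total length by the genome length. The function names, the half-open coordinate convention, and the toy calls below are all hypothetical and are not data from the study.

```python
from collections import defaultdict


def merge_cnvs_into_cnvrs(cnvs):
    """Merge overlapping or touching CNV calls into CNVRs.

    cnvs: iterable of (chrom, start, end) with half-open coordinates
    (an assumption of this sketch). Returns a sorted list of tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in cnvs:
        by_chrom[chrom].append((start, end))

    cnvrs = []
    for chrom in sorted(by_chrom):
        merged = []
        for start, end in sorted(by_chrom[chrom]):
            if merged and start <= merged[-1][1]:
                # Overlaps (or touches) the previous region: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                # Starts a new region.
                merged.append([start, end])
        cnvrs.extend((chrom, s, e) for s, e in merged)
    return cnvrs


def fraction_covered(cnvrs, genome_length):
    """Total CNVR length divided by genome length."""
    return sum(end - start for _, start, end in cnvrs) / genome_length


# Hypothetical toy calls pooled across animals (not real data).
toy_calls = [
    ("chr1", 100, 500),
    ("chr1", 400, 900),
    ("chr1", 2000, 2500),
    ("chr2", 50, 150),
]
toy_cnvrs = merge_cnvs_into_cnvrs(toy_calls)
toy_fraction = fraction_covered(toy_cnvrs, genome_length=10_000)
```

Here the two overlapping chr1 calls collapse into a single region, so four calls become three CNVRs. Under this convention a CNVR count can be smaller than the number of individual CNV calls.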