Objective 1: Develop biological resources and computational tools to enhance characterization of breed-specific bovine and other genomes. De novo reference genome assemblies will be developed for dairy cattle breeds (Holstein and Jersey). In addition, improvements will be made to the existing, but suboptimal, reference assemblies for Bos taurus cattle and Zebu cattle (Bos indicus). These reference genome resources are essential for discovery of single nucleotide polymorphisms (SNP) and copy number variation (CNV) polymorphisms segregating in target populations. Genome characterization will be done by state-of-the-art platforms using short- and long-read sequencing of selected animals. Candidate animals will be derived from those populations targeted for genome-based genetic improvement to enable development of novel tools for proper parent and breed composition identification. To complement these studies, epigenomic and metagenomic surveys will be explored to better define DNA methylation and ruminant microbiome, which in turn will improve overall annotation of genes, genetic variation, epigenetic variation and other sequence motifs affecting phenotype expression. Objective 2: Utilize genotypic data to enhance genetic improvement in ruminant production systems. This objective has two components. The first component identifies signatures of selection and evaluates the potential to develop community-based breeding programs based on population structure and management system limitations in goats. The second component requires the optimization and application of statistical methodologies to develop cheap low-density SNP panels that can be used to guide genetic improvement of production traits while maintaining variants enriched by natural selection during adaptation of local breeds to marginal production environments. Objective 3: Characterize functional genetic variation for improved fertility, growth, and environmental sustainability of ruminants. 
The third objective involves detection of genetic variation affecting fertility, growth and environmental sustainability during early embryonic development or adaptation to climate or disease using whole genome or exome resequencing. The resultant sequence information will be integrated with other database resources that provide basic information about gene expression activity and motif patterns to guide selection of positional candidate genes for further study and validation of functional annotation in ruminants. Sub-objectives for Objectives 1, 2, and 3 are listed in the post plan under related documents.
Completion of our objectives is expected, in the short term, to improve genome-wide selection in the U.S. dairy industry as well as facilitate new genome-enhanced breeding strategies to bring economic and genetic stability to various ruminant value chains in developing nations. Ultimately, longer term objectives to identify and understand how causative genetic variation affects livestock biology will require a combination of genome sequencing and comparative genomics, quantitative genetics, epigenomics and metagenomics, all of which are components of this project plan and areas of expertise in our group. Efforts to characterize genome activity and structural conservation/variation are an extension of our current research program in applied genomics. This project plan completely leverages the resources derived from the Bovine Genomes, HapMap, 1000 Bull Genomes and FAANG projects, and genotypic data derived from the Council on Dairy Cattle Breeding (CDCB) genome-enhanced genetic evaluations for North American dairy cattle.
This is the final report for Project 8042-31000-001-000D, which will end July 23, 2022. A new NP101 project, entitled “Accelerating Genetic Improvement of Ruminants Through Enhanced Genome Assembly, Annotation, and Selection,” is being established. During the life of the project, for Objective 1 (develop biological resources and computational tools to enhance characterization of breed-specific bovine and other genomes), ARS scientists in Beltsville, Maryland, continued as global leaders in the improvement of reference genome assemblies. Assemblies have been released for goat (ARS1), cattle (ARS-UCD1.2), and sheep (ARS-UI_Ramb_v2.0). Produced at a fraction of previous genome assembly costs, the latest references are far superior in quality to any previous versions and have set a new standard for all other farm animals. The goat assembly was highlighted as one of the milestones of the last 20 years of genomic sequencing by the journal Nature (https://www.nature.com/immersive/d42859-020-00099-0/index.html). The Bovine Pangenome Consortium was launched to describe the full extent of genetic variation in cattle through the creation of genome assemblies for bovine species of economic and biodiversity importance. Using the trio-binning method, the Consortium has released 15 breed-specific reference assemblies. The genome assemblies were generated from trios of Angus and Brahman, Nelore and Brown Swiss, and Original Braunvieh breeds; Highland cattle and yak; Piedmontese cattle and gaur; and Simmental cattle and bison. Assemblies were also generated for Holsteins and Jerseys, to improve on earlier efforts, and for tropically adapted indicine breeds (Sahiwal and Tharparkar). Copy-number variation (CNV) discovery and association studies were performed based on long-read and short-read sequencing data in cattle and goats. DNA methylation has important functions in animal production, health, and reproduction. 
High-resolution maps of cattle DNA methylation were generated for sperm and over 20 somatic tissues, showing methylation patterns across tissues and species. These high-resolution epigenomic maps for bovine tissues are a novel resource for epigenomic research and enable better understanding of genotype-phenotype relationships in economic traits when combined with gene expression information. Progress was made in the development of microbial genome resources beyond the projections of the original plan. Advances in microbial strain resolution were achieved using cutting-edge technologies and newly developed software. This technique, developed by an ARS-led collaboration of industry and international scientists, is applicable to agriculturally and clinically relevant metagenomes, and it allows identification of minor strains that make up as little as 2% of a bacterial population. This resolution will enable the identification of potential pathogens before they can proliferate in a sample, and will allow high-throughput study of microbial evolution in real time with DNA sequencing. For Objective 2 (utilize genotypic data to enhance genetic improvement in ruminant production systems), ARS scientists collected and characterized a broad representation of African goats. This work included over 2,400 goats genotyped from over 20 countries, representing over 50 breeds or populations. Two new software solutions for digital phenotyping were developed. Through collaboration with the AdaptMap and VarGoats Consortiums, 4,653 goats were genotyped and 1,372 have been sequenced worldwide. These data were used to study population genetics and positive selection in goats. Additional SNPs were also selected for parentage identification and, in collaboration with the International Goat Genome Consortium, AdaptMap, and VarGoats, to augment the Illumina GoatSNP50 BeadChip for enhanced utility in more diverse goat breeds. 
ARS scientists, working with scientists in India, developed a high-density genotyping array (the IndiGau chip with 800,000 markers) for Zebu cattle using the Space-Chip software. The IndiGau chip has been used in marker-enhanced genetic improvement and understanding the purity and diversity among Zebu breeds. For Objective 3 (characterize functional genetic variation for improved fertility, growth, and environmental sustainability of ruminants), sequencing data were analyzed for a better understanding of functional genetic variations. Using 172 sequenced Holstein bulls and newly assembled immune gene haplotypes, 155 SNPs were discovered that distinguished alleles of cattle immune genes, and 124 of them were included in custom genotype panels. Genome-wide association studies based on these custom markers reported that two markers predicted increased susceptibility to bovine tuberculosis. Additionally, combining parental genome and epigenome information, 16 and 25 genes were detected as potential candidate markers for male fertility and gestation length, respectively. Association studies were also performed to investigate the genetic basis of seven health traits in dairy cattle and 63 novel SNP markers identified from these studies were submitted to the NCBI’s Variation portal for public dissemination. A comprehensive gene atlas was built to study the tissue specificity of genes in cattle. This high-quality cattle gene atlas links gene expression in tissues and complex traits for the first time and provides an important basis for studying genotype-phenotype relationships in livestock. The FarmGTEx (genotype-tissue expression) Consortium was launched to provide a comprehensive atlas of tissue-specific gene expression and genetic regulation in farm animals. Both cattle and pig GTEx atlases were built based on ~10,000 transcript sequencing analyses in over 100 tissues/cell types among over 40 and 70 breeds, respectively. 
The atlases describe the transcriptome landscape across tissues and report thousands of variants associated with gene expression and alternative splicing for dozens of major tissues. Additionally, this work allows us to interpret most of the significant GWAS loci of economically important traits via integrative genomics analysis. A portal was developed to allow researchers to query gene expression, alternative splicing, and DNA regions associated with particular traits in an easy and uniform way across tissues and to serve as a primary reference source for farm animal genomics and breeding.
1. Bovine Pangenome Consortium and construction of improved genome assemblies. Breeding better cattle using genomics requires informative reference genomes. Previously, a single Hereford cow provided the sole reference. This deprived researchers and breeders of important information about variation among individuals and breeds. Led by ARS scientists in Beltsville, Maryland; Madison, Wisconsin; and Clay Center, Nebraska, the Bovine Pangenome Consortium developed and improved genome assemblies. The team prioritized cattle breeds having important economic impact and sought to capture appreciable genetic variation. This cutting-edge group grew to over 90 members at 58 institutions in 27 countries. Using the trio-binning method, which employs two parents and an offspring, the Consortium published 11 breed-specific reference assemblies and is developing methods to incorporate information from those assemblies into a single, graph-based reference genome. They generated genome assemblies from trios of Brahman and Angus, Highland and yak, bison and Simmental, Original Braunvieh, Nellore and Brown Swiss, and gaur and Piedmontese. They published these assemblies in Nature Communications, GigaScience, and the Journal of Heredity. Together, these genome assemblies rival the most complete and accurate vertebrate genomes ever produced. Scientists have already used these assemblies to identify novel trait-associated variation, which can be used to increase the accuracy of genetic merit prediction and selection for important production traits in target populations.
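The trio-binning idea mentioned above (two parents and an offspring) can be illustrated with a minimal sketch. It assumes the general published approach of sorting offspring reads by k-mers unique to one parent; it is not the Consortium's pipeline. The function names, the toy sequences, and the k-mer size are hypothetical, and the sketch ignores reverse complements, sequencing errors, and the count normalization that real tools apply.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(sire_reads, dam_reads, k):
    """Return (sire-only, dam-only) k-mer sets from the parental reads."""
    sire = set().union(*(kmers(r, k) for r in sire_reads))
    dam = set().union(*(kmers(r, k) for r in dam_reads))
    return sire - dam, dam - sire


def bin_read(read, sire_only, dam_only, k):
    """Assign an offspring read to the parent whose unique k-mers it shares most."""
    read_kmers = kmers(read, k)
    sire_hits = len(read_kmers & sire_only)
    dam_hits = len(read_kmers & dam_only)
    if sire_hits > dam_hits:
        return "sire"
    if dam_hits > sire_hits:
        return "dam"
    return "unassigned"  # no informative k-mers, or a tie


# Toy data (hypothetical): the parents differ at a single base.
K = 4
sire_only, dam_only = parent_specific_kmers(["ACGTACGTAA"], ["ACGTTCGTAA"], K)
for read in ["CGTACGTA", "GTTCGT", "ACGT"]:
    print(read, bin_read(read, sire_only, dam_only, K))
```

Reads binned this way can then be assembled separately for each parental haplotype, which is what allows one cross between two breeds or species to yield two assemblies.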
2. Farm Genotype-Tissue Expression (FarmGTEx) Consortium and the Cattle Gene Atlas. Understanding the regulation of livestock gene expression is important for studying the biological mechanisms that underlie economic traits and for improving animal selection. FarmGTEx is an international collaborative to provide a comprehensive atlas of tissue-specific gene expression and genetic regulation in farm animals. Led by ARS scientists in Beltsville, Maryland, and researchers at the University of Edinburgh in Edinburgh, Scotland, the FarmGTEx Consortium includes over 20 universities and institutes around the world. The pilot phase of FarmGTEx built both cattle and pig GTEx atlases for the research community based on almost 10,000 publicly available RNA-sequence datasets that represent over 100 tissues and cell types among over 40 and 70 breeds, respectively. The atlases describe the landscape of the transcriptome (the RNA expressed by an organism’s genome) across tissues and report thousands of variants associated with gene expression and alternative splicing (a process that enables RNA to direct synthesis of different protein variants with different cellular functions or properties) for major tissues. Additionally, this work allows us to interpret most of the significant GWAS loci of economically important traits via integrative genomics analysis. Cattle GTEx was accepted for publication in Nature Genetics. A portal was developed to allow researchers to query gene expression, alternative splicing, and DNA regions associated with particular traits in an easy and uniform way across tissues and to serve as a primary reference source for farm animal genomics and breeding. Since December 2020, it has been used over 6,000 times by producers, breeders, and scientists to improve animal production and health based on genome-enabled selection.
3. Unprecedented resolution of microbial strain differences. The microbiome is the combined genetic material of all microorganisms (bacteria, fungi, protozoa, and viruses) that live in a particular environment. Because those microorganisms exist in large communities that are difficult to assess using older DNA sequencing technologies, ARS scientists in Madison, Wisconsin, led a research project conducted by an international and interdisciplinary team of researchers from four countries (including the Netherlands and Israel) and two private U.S. companies (Phase Genomics and Pacific Biosciences) to develop new methods for microbiome screening. Using the latest high-accuracy, long-read DNA sequencing technologies, microbial strains could be resolved down to single nucleotide polymorphisms (SNP) in the population. ARS scientists worked with bioinformaticians at Pacific Biosciences to develop the open-source software tool MAGPhase, which automates the process of SNP discovery and validation in the microbiome. The improved accuracy of newer sequencing technologies allows the MAGPhase algorithm to identify clusters of SNPs (or haplotypes) that are representative of divergent strains of microbes that may harbor antibiotic resistance or pathogenesis genes. As a proof-of-principle analysis, a single sheep gastrointestinal sample was sequenced to great depth with the latest long-read DNA sequencing technology. Using MAGPhase and improved genome assembly algorithms, 428 complete microbial genomes were assembled from that single sample, a record for a field that had celebrated the assembly of 10 genomes from one individual. This accomplishment was of such importance that it was published in the journal Nature Biotechnology (https://doi.org/10.1038/s41587-021-01130-z), with an accompanying opinion article highlighting its importance in Nature Microbiology (https://doi.org/10.1038/s41564-021-01027-2). 
The technological advance has already been applied to human microbiome research to better distinguish pathogens from beneficial microbes.
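The notion of resolving strains as clusters of SNPs (haplotypes) carried on the same long reads can be illustrated with a small sketch. This is not the MAGPhase algorithm; it is a simplified, assumed stand-in that counts the allele combinations seen on reads spanning a set of variant sites and reports non-dominant combinations above a frequency cutoff. The function names, the read representation (a dict mapping site position to observed allele), the toy data, and the default 2% cutoff (taken from the figure reported earlier in this report) are illustrative assumptions.

```python
from collections import Counter


def haplotype_frequencies(reads, sites):
    """Frequency of each allele combination among reads covering every site."""
    counts = Counter()
    for read in reads:
        if all(site in read for site in sites):
            counts[tuple(read[site] for site in sites)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {hap: n / total for hap, n in counts.items()}


def minor_strains(freqs, min_freq=0.02):
    """Return non-dominant haplotypes at or above the frequency cutoff."""
    if not freqs:
        return {}
    major = max(freqs, key=freqs.get)
    return {hap: f for hap, f in freqs.items() if hap != major and f >= min_freq}


# Toy data (hypothetical): two variant sites in one assembled genome.
sites = [120, 487]
reads = (
    [{120: "A", 487: "C"}] * 97   # dominant strain
    + [{120: "G", 487: "T"}] * 3  # minor strain at 3% of spanning reads
    + [{120: "A"}]                # covers one site only, so it is skipped
)
freqs = haplotype_frequencies(reads, sites)
minor = minor_strains(freqs)
print(minor)
```

The sketch only works when individual reads are long and accurate enough to span several variant sites without error, which is why the approach depends on high-accuracy long-read sequencing.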
Gao, Y., Liu, S., Baldwin, R.L., Connor, E.E., Cole, J.B., Ma, L., Fang, L., Li, C., Liu, G. 2022. Functional annotation of regulatory elements in cattle genome reveals the roles of extracellular interaction and dynamic change of chromatin states in rumen development during weaning. Genomics. 114:110296. https://doi.org/10.1016/j.ygeno.2022.110296.
Gao, Y., Li, J., Cai, G., Wang, Y., Yang, W., Li, Y., Zhao, X., Li, R., Gao, Y., Tuo, W., Baldwin, R.L., Li, C., Fang, L., Liu, G.E. 2022. Single-cell transcriptomic and chromatin accessibility analyses of dairy cattle peripheral blood mononuclear cells and their responses to lipopolysaccharide. BMC Genomics. 23:338. https://doi.org/10.1186/s12864-022-08562-0.
Naji, M.M., Utsunomiya, Y.T., Solkner, J., Rosen, B.D., Meszaros, G. 2021. Assessing Bos taurus introgression in the UOA Bos indicus assembly. Genetics Selection Evolution. 53:96. https://doi.org/10.1186/s12711-021-00688-1.
Zhang, T., Wang, T., Niu, Q., Xu, L., Chen, Y., Gao, X., Gao, H., Zhang, L., Liu, G., Li, J., Xu, L. 2022. Transcriptional atlas analysis from multiple tissues reveals the expression specificity patterns in beef cattle. BMC Biology. 20(1):79. https://doi.org/10.1186/s12915-022-01269-4.
Yang, L., Gao, Y., Li, M., Park, K., Liu, S., Kang, X., Liu, M., Oswalt, A., Fang, L., Telugu, B.P., Sattler, C.G., Cole, J.B., Seroussi, E., Xu, L., Li, C., Li, L., Zhang, H., Rosen, B.D., Van Tassell, C.P., Ma, L., Liu, G. 2022. Genome-wide recombination map construction from single sperm sequencing in cattle. BMC Genomics. 23(1):181. https://doi.org/10.1186/s12864-022-08415-w.
Zhang, T., Wang, T., Niu, Q., Zheng, X., Li, H., Gao, X., Chen, Y., Gao, H., Liu, G., Zhang, L., Li, J., Xu, L. 2022. Comparative transcriptomics analysis reveals differential expression regulation underlying fatty acid composition in multiple beef cuts. Foods. 11(1):117. https://doi.org/10.3390/foods11010117.
Zhang, T., Wang, T., Niu, Q., Zheng, X., Li, H., Gao, X., Chen, Y., Gao, H., Zhang, L., Liu, G., Li, J., Xu, L. 2022. Comparative transcriptomic analysis reveals region-specific expression patterns in different beef cuts. BMC Genomics. 23(1):387. https://doi.org/10.1186/s12864-022-08527-3.
Guo, J., Rui, J., Mao, A., Liu, G., Zhan, S., Li, L., Zhong, T., Wang, L., Cao, J., Chen, Y., Zhang, G., Zheng, H. 2021. Genome-wide association study reveals 14 new SNPs and confirms two structural variants highly associated with the horned/polled phenotype in goats. BMC Genomics. 22:769. https://doi.org/10.1186/s12864-021-08089-w.
Yang, L., Gao, Y., Oswalt, A., Fang, L., Boschiero, C., Neupane, M., Sattler, C.G., Seroussi, E., Xu, L., Li, C., Li, L., Zhang, H., Rosen, B.D., Van Tassell, C.P., Ma, L., Liu, G. 2022. Towards the detection of copy number variation from single sperm sequencing in cattle. BMC Genomics. 23(1):215. https://doi.org/10.1186/s12864-022-08441-8.
Yang, L., Gao, Y., Boschiero, C., Li, L., Zhang, H., Ma, L., Liu, G. 2021. Insights from initial variant detection by sequencing single sperm in cattle. Dairy. 2(4):649-657. https://doi.org/10.3390/dairy2040050.
Boschiero, C., Gao, Y., Liu, M., Baldwin, R.L., Ma, L., Li, C., Liu, G. 2022. The dynamics of chromatin accessibility prompted by butyrate-induced chromatin modification in bovine cells. Ruminants. 2(2):226-243. https://doi.org/10.3390/ruminants2020015.
Gao, Y., Ma, L., Liu, G. 2022. Initial analysis of structural variation detections in cattle using long-read sequencing methods. Genes. 13(5):828. https://doi.org/10.3390/genes13050828.
Davenport, K.M., Bickhart, D.M., Worley, K.C., Murali, S.C., Salavati, M., Clark, E.L., Cockett, N., Heaton, M.P., Smith, T.P., Murdoch, B.M., Rosen, B.D. 2022. An improved ovine reference genome assembly to facilitate in depth functional annotation of the sheep genome. GigaScience. 11. Article giab096. https://doi.org/10.1093/gigascience/giab096.
Low, W.Y., Rosen, B.D., Ren, Y., Bickhart, D.M., To, T., Martin, F.J., Billis, K., Sonstegard, T.S., Sullivan, S.T., Hiendleder, S., Williams, J.L., Heaton, M.P., Smith, T.P. 2022. Gaur genome reveals expansion of sperm odorant receptors in domesticated cattle. BMC Genomics. 23. Article 344. https://doi.org/10.1186/s12864-022-08561-1.