1a. Objectives (from AD-416):
Our project will produce a bioinformatics tool to study copy number variation (CNV) by combining sequence-based and array-based approaches. This framework can easily be repurposed for other species and for other applications, such as functional genomics studies using RNA-seq. Our project will provide the next-generation cattle CNV map (many events at base-pair resolution), a crucial resource for developing CNV genotyping platforms and for a cattle 1000 Genomes project. It will also significantly improve the cattle reference genome and its annotation by filling in novel sequence information.
1b. Approach (from AD-416):
1. Develop a general integrated framework to detect, classify, and compare copy number variation by jointly using existing next-generation sequencing, aCGH, and SNP genotyping data;
2. Apply these pipelines to human and cattle datasets and evaluate their performance through computational comparison and experimental validation;
3. Test the functional impacts of cattle CNVs by associating them with animal production and health traits.
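One step in comparing CNV calls across platforms is deciding when a sequence-based call and an array-based call describe the same event. The report does not say how ArraySeq does this, so the following is only a minimal sketch of one common convention, 50% reciprocal overlap with matching gain/loss type. The `CNVCall` class, the `compare_call_sets` function, the 0.5 threshold, and the toy coordinates are all hypothetical and are not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNVCall:
    """Hypothetical CNV call record; coordinates are half-open [start, end)."""
    chrom: str
    start: int
    end: int
    kind: str  # assumed to be "gain" or "loss"


def reciprocal_overlap(a: CNVCall, b: CNVCall) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if the calls do not overlap."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def compare_call_sets(ngs, array, min_ro=0.5):
    """Split NGS calls into those supported by a same-type array call and NGS-only calls."""
    supported, ngs_only = [], []
    for call in ngs:
        if any(call.kind == other.kind and reciprocal_overlap(call, other) >= min_ro
               for other in array):
            supported.append(call)
        else:
            ngs_only.append(call)
    return supported, ngs_only


# Toy, made-up calls for illustration only.
ngs = [CNVCall("chr5", 1000, 5000, "gain"), CNVCall("chr5", 20000, 22000, "loss")]
array = [CNVCall("chr5", 1500, 6000, "gain")]
supported, ngs_only = compare_call_sets(ngs, array)
print(len(supported), len(ngs_only))  # prints "1 1"
```

Requiring overlap in both directions keeps a short call from being "confirmed" by a much larger array segment that merely contains it. The threshold is a tunable assumption, not a value reported by the project.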
3. Progress Report:
The project's objectives are to (1) develop an integrated tool (ArraySeq) to detect, classify, and compare copy number variations (CNV) by jointly using existing next-generation sequencing (NGS), array comparative genomic hybridization (aCGH), and single nucleotide polymorphism (SNP) genotyping data; (2) apply ArraySeq to existing human and cattle datasets; (3) evaluate ArraySeq's performance through computational and experimental confirmation; and (4) test the functional impacts of cattle CNV on animal production and health.
For objectives 1, 2, and 3, we performed the first comprehensive discovery of cattle CNV using next-generation sequencing data from six individuals. We identified over 1,265 CNV regions comprising ~55.6 Mbp of sequence, 476 of which (~38%) had not been previously reported. We validated this sequence-based CNV call set using three independent molecular techniques, achieving a validation rate of 82% and a false positive rate of 8%. We are now developing and improving an integrated tool/algorithm that combines array and NGS technologies for CNV discovery. The results will yield a second-generation cattle CNV map, a crucial genomic resource for cattle, and will significantly improve annotation of the cattle genome.
For objective 4, over 30,000 Bovine50K SNP datasets were collected and processed with the program PennCNV. Because cattle CNV regions span, and could affect, many genes enriched for immunity, lactation, reproduction, and rumination functions, we are testing the functional impacts of cattle CNV by associating them with animal production and health traits.
This research continues to support two objectives of the related in-house project (1265-31000-098-00D): to develop biological resources and computational tools that enhance characterization of the bovine genome sequence (Objective 1), and to characterize conserved genome elements and identify functional genetic variation (Objective 3).
These objectives directly address Component 2, Problem Statements A, B, and C, of the National Program Action Plan.