Project Number: 3040-31000-100-12-I
Project Type: Interagency Reimbursable Agreement
Start Date: Feb 1, 2015
End Date: Jan 31, 2020
A major impediment to identifying genetic variants associated with health traits has been the lack of suitable single nucleotide polymorphism (SNP) markers in the regions containing immune complex genes. The principal reason for this paucity of markers is that the areas of the genome containing these highly repetitive gene clusters are poorly assembled in the current bovine genome assembly, which was produced from a Hereford cow residing at an ARS facility. These regions are difficult to assemble because the sequence reads used were shorter than the repeats in the gene clusters, which prevented accurate assembly. Without an accurate assembly, reads from other animals cannot be reliably mapped to the genome to identify SNPs.
Specific Objectives:
1. Generate new contigs by assembling PacBio reads from BAC and fosmid clones containing the major histocompatibility complex (MHC), leukocyte receptor complex (LRC) and natural killer complex (NKC) immune gene families.
2. Identify sequence variants by aligning high-throughput whole-genome sequence data to the assembled immune gene complexes.
3. Select informative SNP markers within each reassembled immune gene complex and perform a proof-of-principle genome-wide association study (GWAS) for bovine tuberculosis susceptibility in 1200 Holstein-Friesian cattle that have been demonstrated to be differentially susceptible.
USMARC will be responsible for Objective 1 of the proposal: creating assemblies of the target immune complex gene areas that improve on the existing cattle reference genome. BARC will identify large-insert clones that cover the target areas and extract DNA from them. This DNA will be sent to USMARC, where sequencing libraries will be prepared and sequenced on the Pacific Biosciences RSII platform (PacBio). This platform produces very long sequence reads, capable of overcoming the problems that plague the current assembly, which was based on much shorter reads (about 1 kilobase long, whereas PacBio reads average 7-8 kb, with many above 15 kb). USMARC will generate sequence for the large-insert clones and then use these reads to accurately reconstruct the targeted genomic regions.
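The read-length argument above can be illustrated with a back-of-envelope check. This is a minimal sketch, not part of the project's methods: the 5 kb repeat length and the 100-base anchor requirement are assumptions chosen for illustration, and the function names are hypothetical. Only the read lengths (about 1 kb, 7-8 kb, 15 kb) come from the text.

```python
def can_bridge_repeat(read_len: int, repeat_len: int, min_anchor: int = 100) -> bool:
    """Return True if one read can cover the whole repeat plus
    min_anchor bases of unique flanking sequence on each side."""
    return read_len >= repeat_len + 2 * min_anchor


def bridging_start_positions(read_len: int, repeat_len: int, min_anchor: int = 100) -> int:
    """Count the distinct read start offsets that bridge the repeat
    with at least min_anchor unique bases on both sides."""
    return max(0, read_len - repeat_len - 2 * min_anchor + 1)


# Assumption for illustration only: a 5 kb repeat unit.
HYPOTHETICAL_REPEAT_LEN = 5_000

# Read lengths taken from the text: ~1 kb (current assembly),
# ~7.5 kb (PacBio average), 15 kb (long PacBio reads).
for label, read_len in [
    ("short reads (~1 kb)", 1_000),
    ("PacBio average (~7.5 kb)", 7_500),
    ("PacBio long (15 kb)", 15_000),
]:
    bridges = can_bridge_repeat(read_len, HYPOTHETICAL_REPEAT_LEN)
    offsets = bridging_start_positions(read_len, HYPOTHETICAL_REPEAT_LEN)
    print(f"{label}: bridges repeat = {bridges}, bridging offsets = {offsets}")
```

Under these assumed values, a 1 kb read can never anchor in unique sequence on both sides of the repeat, so copies of the repeat cannot be told apart, while reads longer than the repeat can place each copy unambiguously.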