Project Number: 2040-22430-026-04-S
Project Type: Non-Assistance Cooperative Agreement
Start Date: Sep 1, 2015
End Date: Sep 30, 2018
1. Generate genomic assemblies as a foundational resource for developing genome-wide SNP-based diagnostic resources.
2. Conduct a genome-wide SNP survey using a genotype-by-sequencing approach.
3. Perform SNP-based population reconstruction and develop a diagnostic SNP panel.
Whole genome shotgun (WGS) sequencing will be performed on B. cucurbitae and A. ludens. For each species, an inbred crossing scheme may be used to reduce colony heterozygosity and improve the subsequent genome assembly. Crosses will be performed at USDA-ARS-PBARC (B. cucurbitae) or at the USDA-APHIS lab in Mission, Texas (A. ludens). DNA extraction, library preparation, and sequencing will follow established protocols. Sequence assembly will be performed at USDA-ARS-PBARC on the Manoa computing cluster, using the ALLPATHS-LG assembler to generate a high-quality draft assembly. In addition to WGS and assembly, linkage mapping will be performed between two lab lines using reduced-representation sequencing (ddRAD-seq) to place genomic scaffolds onto chromosome-sized linkage groups, as was done for C. capitata in a previous Farm Bill project. Genome annotation will be performed using the NCBI Eukaryotic Genome Annotation Pipeline and made publicly available through NCBI and the i5k web portal (i5k.nal.usda.gov). Once a high-quality draft genome and linkage map exist for a species, a genotype-by-sequencing approach will be used to generate genome-wide markers from wild-collected material of that species. Fairly comprehensive sample sets currently exist for both medfly (C. capitata) and mexfly (A. ludens), so these species will be targeted in the first year of the project. Comprehensive sample sets do not yet exist for Bactrocera species but are being generated by collaborators; those species will be pursued in future funding years. For each sample, DNA will be extracted and a restriction-based sequencing library will be created (ddRAD-seq or GBS). Barcoded samples will be pooled, and as many as 380 samples will be sequenced for each species. Reads will be mapped to the genome assembly and SNPs identified across the genome. The linkage map can then provide the chromosomal position of each SNP and the relative distances between SNPs.
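As a rough illustration of how a linkage map can give each SNP a chromosomal position, the sketch below interpolates a SNP's map position from the placement of its scaffold. It is only a sketch under stated assumptions: the `ScaffoldAnchor` record, the `snp_map_position` function, and the linear interpolation between the scaffold's start and end positions in centimorgans are all hypothetical and are not taken from the project's protocols.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ScaffoldAnchor:
    """Hypothetical record: where one scaffold sits on the linkage map."""
    linkage_group: str   # e.g. "LG1"
    cm_start: float      # map position (cM) of the scaffold's first base
    cm_end: float        # map position (cM) of the scaffold's last base
    length_bp: int       # scaffold length in base pairs


def snp_map_position(
    anchors: Dict[str, ScaffoldAnchor], scaffold: str, pos_bp: int
) -> Optional[Tuple[str, float]]:
    """Return (linkage group, approximate cM) for a SNP, or None if unplaced.

    Assumption: recombination is treated as uniform along the scaffold,
    so the cM position is a linear interpolation of the bp position.
    """
    anchor = anchors.get(scaffold)
    if anchor is None:
        return None  # scaffold was not placed on the linkage map
    fraction = pos_bp / anchor.length_bp
    cm = anchor.cm_start + fraction * (anchor.cm_end - anchor.cm_start)
    return anchor.linkage_group, cm


# Toy usage with made-up values
anchors = {"scaf1": ScaffoldAnchor("LG1", 10.0, 20.0, 1000)}
placed = snp_map_position(anchors, "scaf1", 500)
unplaced = snp_map_position(anchors, "scaf99", 500)
```

SNPs on scaffolds that were not anchored return `None` here, so they can be set aside rather than assigned a guessed position.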
Using the species structure determined in objective 3, population-level analysis will be performed on the markers discovered in the SNP survey, using all available material. Using the whole genome assemblies and the population-level analysis of the discovered SNPs, we will refine the SNPs to a small subset that is informative for differentiating between groups at the population level. To do this, a pipeline will be developed that selects the SNPs with the highest ability to distinguish populations (Fst-based) while also sampling a diversity of markers across the genome (based on position on the linkage map). Using these data, 24-96 SNPs will be selected and developed into a Fluidigm EP-1 panel as a population genetic identification tool for both fly species. Ultimately, the panel will be tested against fruit fly samples collected in California or Texas to assess its ability to determine population origin and to test establishment hypotheses. The resulting information is reported as binary characters that can be easily shared across multiple labs and merged with new genetic data in the future.
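The selection step described above can be sketched as follows. This is a minimal illustration, not the project's pipeline: it assumes two populations, uses a simple heterozygosity-based Fst estimate per SNP, and enforces genomic spread with a hypothetical minimum spacing in centimorgans within each linkage group. The function names, the dictionary keys, and the `min_gap_cm` parameter are all assumptions introduced for the example.

```python
from typing import Dict, List


def nei_fst(p1: float, p2: float) -> float:
    """Per-SNP Fst for two populations as (Ht - Hs) / Ht.

    p1 and p2 are the frequencies of the same allele in each population.
    Assumption: equal weighting of the two populations, no sample-size
    correction.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0                          # monomorphic SNP, uninformative
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-pop
    return (h_t - h_s) / h_t


def select_panel(snps: List[Dict], panel_size: int, min_gap_cm: float) -> List[Dict]:
    """Greedily pick the highest-Fst SNPs while keeping them spread out.

    Each SNP is a dict with keys "id", "lg" (linkage group), "cm" and "fst".
    A candidate is skipped if an already chosen SNP lies on the same linkage
    group within min_gap_cm centimorgans.
    """
    ranked = sorted(snps, key=lambda s: s["fst"], reverse=True)
    chosen: List[Dict] = []
    for snp in ranked:
        too_close = any(
            c["lg"] == snp["lg"] and abs(c["cm"] - snp["cm"]) < min_gap_cm
            for c in chosen
        )
        if not too_close:
            chosen.append(snp)
        if len(chosen) == panel_size:
            break
    return chosen


# Toy usage with made-up allele frequencies and map positions
candidates = [
    {"id": "a", "lg": "LG1", "cm": 1.0, "fst": nei_fst(0.95, 0.05)},
    {"id": "b", "lg": "LG1", "cm": 1.5, "fst": nei_fst(0.90, 0.10)},
    {"id": "c", "lg": "LG2", "cm": 1.2, "fst": nei_fst(0.70, 0.30)},
]
panel = select_panel(candidates, panel_size=2, min_gap_cm=5.0)
panel_ids = [s["id"] for s in panel]
```

In this toy case SNP "b" is skipped despite its high Fst because it sits within 5 cM of "a" on the same linkage group, so the panel takes "c" from a different linkage group instead. A real panel of 24-96 SNPs would also need to account for more than two populations and for assay design constraints, which this sketch ignores.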