Location: Arthropod-Borne Animal Diseases Research
2012 Annual Report
Mosquitoes from at least 150 locations throughout their known distributions will be collected for a population genetics analysis. Genomic DNA from at least ten individuals from each collection location will be pooled and used to identify single nucleotide polymorphisms (SNPs) using restriction-site-associated DNA tags (RAD-tags) and 454 sequencing. The SNPs will be used with phylogenetic and population genetic techniques to describe the historical range expansion and to quantify the number of existing mosquito populations and the migration rates between them.
1) Completion of RAD-Seq test library prep,
2) Completion of RAD-Seq test library sequencing, and
3) Completion of RAD-Seq test library preliminary bioinformatic analysis.
1) Completion of RAD-Seq test library prep: Test RAD-Seq libraries were prepared from 12 samples (half tarsalis, half vexans), each with four different restriction enzymes: PstI, EcoRI, SgrAI, and SbfI. This is important because test libraries allow us to quantify how well an enzyme works on the given sample material. For organisms without a reference genome, the number of restriction sites cannot be estimated accurately. Normally this estimate would be made in silico and would indicate which enzyme is best suited to a given experiment. Because neither of these two mosquito species has a reference genome, this information had to be obtained empirically before moving forward with the large-scale effort. Quantification of the test libraries with a Qubit fluorometer, together with agarose gel analysis, provided essential information (RAD-Seq fragment sizes, level of contamination, and confirmation of the fluorometric readings) for obtaining high-quality RAD-Seq output upon sequencing.
2) Completion of RAD-Seq test library sequencing: The test libraries described above were sequenced. Sequencing of test libraries is needed to determine the quality and quantity of sequence for every sample/enzyme combination. Some sample/enzyme combinations perform better than others, and without a reference genome to estimate these values, test sequencing has to be performed. The sequencing metrics derived from this test period will provide the much-needed basis for full-scale sequencing of the 96 pools in this project. Further, because this project entails RAD sequencing of pooled samples, different considerations apply. Pooled samples require higher-coverage sequencing to accurately determine all alleles present in the pool and to calculate all allele frequencies.
Lastly, RAD sequencing provides the raw data from which all bioinformatic analyses begin.
3) Completion of RAD-Seq test library preliminary bioinformatic analysis: Preliminary bioinformatic analysis allows us to determine the following for every sequencing event:
a) The quality of sequence obtained for every sample (high for all samples and enzymes).
b) The quantity of sequence obtained for every sample (variable across samples; gDNA quality influences output).
c) Whether the library prep parameters established in the test phase of the project were accurate and provided the amount of sequence desired (PstI, EcoRI: sufficient parameters established in the test prep stage; SgrAI: library prep parameters not sufficient for full-scale production, and this enzyme cannot be used because it has too few restriction sites).
d) The overall number of RAD tags per pool, which is twice the number of restriction sites (PstI: 799,000; EcoRI: 750,000; SgrAI: 32,000; SbfI: TBD).
e) Whether the estimated amount of sequencing per sample is enough to identify alleles in samples and to calculate allele frequencies (TBD).
f) The overall amount of sequencing required to complete the project, for planning future sequencing events (TBD once the enzyme for full-scale production is chosen).
The major accomplishments in the first year of the project have been to determine the actual number of sequenced fragments for each restriction enzyme and the number of variations in a pool associated with each enzyme. This provided evidence for the actual amount of sequencing needed to complete the project effectively. Furthermore, the sequencing has provided evidence that the pools of samples contain far more variation per pool than was previously expected.
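Item d) relates RAD tag counts to restriction sites, and items e) and f) concern how much sequencing is needed. The arithmetic can be sketched in Python as below, using the tag counts reported above and the 96 pools planned for full-scale production. The target depth of 100 reads per tag is a hypothetical placeholder, not a value from this project, and the function names are illustrative only. The sketch also assumes reads are spread evenly across tags, which real libraries will only approximate.

```python
# RAD tags per pool as reported in the text (SbfI was still TBD).
RAD_TAGS_PER_POOL = {"PstI": 799_000, "EcoRI": 750_000, "SgrAI": 32_000}

N_POOLS = 96          # number of pools planned for full-scale sequencing
ASSUMED_DEPTH = 100   # hypothetical reads per tag per pool; NOT from the report


def restriction_sites(rad_tags: int) -> int:
    """Each restriction site yields two RAD tags (one per flank)."""
    return rad_tags // 2


def reads_per_pool(rad_tags: int, depth_per_tag: int) -> int:
    """Reads needed for an average of depth_per_tag reads on every tag."""
    return rad_tags * depth_per_tag


def project_reads(rad_tags: int, depth_per_tag: int, n_pools: int = N_POOLS) -> int:
    """Total reads across all pools at the same average depth."""
    return reads_per_pool(rad_tags, depth_per_tag) * n_pools


if __name__ == "__main__":
    for enzyme, tags in RAD_TAGS_PER_POOL.items():
        sites = restriction_sites(tags)
        per_pool = reads_per_pool(tags, ASSUMED_DEPTH)
        total = project_reads(tags, ASSUMED_DEPTH)
        print(f"{enzyme}: {sites:,} sites, {per_pool:,} reads/pool, {total:,} reads total")
```

Because the read requirement scales linearly with the tag count, an enzyme with more restriction sites needs proportionally more sequencing per pool to reach the same depth, which is why the choice of enzyme has to precede the final sequencing estimate.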
Ultimately, this will lead to the full-scale preparation of 96 samples, followed by sequencing and analysis by Floragenex, based on the information obtained above.