2013 Annual Report
1a.Objectives (from AD-416):
Assemble and annotate the genome sequence of the tarnished plant bug and develop a publicly accessible database.
1b.Approach (from AD-416):
Genome assemblies will be conducted using different programs and algorithms, including SOAPdenovo and ABySS, run with a range of parameter settings to identify the best assembly. These analyses require computers with very large amounts of memory. Structural gene annotations will be generated with the MAKER pipeline. Data will be made available for download, in a database format, and through tools such as BLAST.
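Comparing assemblies produced with different programs and parameter settings requires a contiguity measure. The report does not say which criterion will be used; the sketch below assumes N50 (the contig length at which contigs of that length or longer contain at least half of all assembled bases) and picks the run with the largest value. The run labels and contig lengths are hypothetical toy data, not results from this project.

```python
from typing import Dict, List


def n50(contig_lengths: List[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def pick_best_assembly(runs: Dict[str, List[int]]) -> str:
    """Return the label of the run whose contigs have the largest N50
    (assumed selection criterion; real comparisons use several metrics)."""
    return max(runs, key=lambda label: n50(runs[label]))


# Hypothetical contig lengths (bp) from three assembler/k-mer settings.
runs = {
    "soapdenovo_k31": [5_000, 3_000, 1_000, 800, 500],
    "soapdenovo_k63": [9_000, 2_000, 1_500, 400],
    "abyss_k64": [6_000, 5_500, 700, 300],
}

for label, lengths in runs.items():
    print(label, "N50 =", n50(lengths))
print("best:", pick_best_assembly(runs))
```

N50 alone can reward assemblies that join sequences incorrectly, so in practice it is usually weighed alongside total assembled length, contig count, and gene-completeness checks.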
The size of the Lygus lineolaris genome, estimated by flow cytometry, is approximately 900 million base pairs (Mbp). Based on this estimate, over 150 billion nucleotides of sequence data are required for initial assembly of the genome. High-throughput sequencing of 16 pools, each containing approximately 2,300 recombinant Lygus lineolaris bacterial artificial chromosome (BAC) clones, yielded over 1.5 billion Illumina HiSeq 2000 reads. A preliminary assembly of these data produced over 400,000 contiguous genomic DNA sequence fragments (contigs), indicating that additional sequence data must be generated to increase the coverage and average length of the contigs.
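The figures in this paragraph can be related with simple arithmetic: fold coverage is total sequenced bases divided by genome size, so 150 billion nucleotides over a 900 Mbp genome corresponds to roughly 167x coverage. The sketch below uses only numbers stated in the report; because those numbers are given as "over" or "approximately", the results are approximate too.

```python
# Values taken from the report (approximate or lower bounds).
GENOME_SIZE_BP = 900_000_000        # flow-cytometry estimate, ~900 Mbp
TARGET_BASES = 150_000_000_000      # >150 billion nucleotides for initial assembly
POOLS = 16                          # BAC pools sequenced
CLONES_PER_POOL = 2_300             # approximate clones per pool
TOTAL_READS = 1_500_000_000         # >1.5 billion HiSeq 2000 reads


def fold_coverage(total_bases: int, genome_size_bp: int) -> float:
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size_bp


coverage = fold_coverage(TARGET_BASES, GENOME_SIZE_BP)
total_clones = POOLS * CLONES_PER_POOL
reads_per_pool = TOTAL_READS / POOLS

print(f"target coverage: {coverage:.1f}x")
print(f"BAC clones sequenced: {total_clones:,}")
print(f"mean reads per pool: {reads_per_pool:,.0f}")
```

This gives about 166.7x target coverage, 36,800 BAC clones in total, and a mean of 93,750,000 reads per pool.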
To reduce the nucleotide polymorphisms that adversely affect genome assembly, a single-pair mated line was generated by sib-mating insects from a highly inbred laboratory colony of Lygus lineolaris for five successive generations. Genomic DNA was extracted from individual insects to prepare small-insert genomic DNA libraries with 300-600 bp inserts.
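Why repeated sib-mating reduces polymorphism can be illustrated with the inbreeding coefficient F, the probability that the two alleles at a locus are identical by descent. A minimal sketch, assuming full-sib pairs each generation and Wright's recurrence F_t = (1 + 2F_{t-1} + F_{t-2}) / 4. The starting values of zero are an illustrative assumption: the source colony was already highly inbred, so the line's actual F would be higher than shown.

```python
from typing import List


def full_sib_inbreeding(generations: int,
                        f_prev2: float = 0.0,
                        f_prev1: float = 0.0) -> List[float]:
    """Return F for each successive generation of full-sib mating using
    F_t = (1 + 2*F_{t-1} + F_{t-2}) / 4.

    f_prev2 and f_prev1 are F for the two generations before the first
    sib-mating; 0.0 assumes a non-inbred starting population.
    """
    history = []
    for _ in range(generations):
        f_next = (1 + 2 * f_prev1 + f_prev2) / 4
        history.append(f_next)
        f_prev2, f_prev1 = f_prev1, f_next
    return history


for gen, f in enumerate(full_sib_inbreeding(5), start=1):
    # 1 - F is the fraction of the starting heterozygosity still expected.
    print(f"generation {gen}: F = {f:.4f}, heterozygosity retained = {1 - f:.4f}")
```

Under these assumptions, five generations of full-sib mating raise F from 0 to about 0.67, leaving roughly one third of the starting heterozygosity.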
Small-insert genomic DNA libraries will be size-fractionated to select 250 bp and 400 bp inserts, yielding overlapping and non-overlapping paired-end reads, respectively. In addition, randomly sheared genomic DNA will be used to construct large-insert (approximately 40 kbp) mate-pair sequencing libraries to improve the quality of the genome sequence with reference-guided assembly techniques. Assembled genomic DNA sequence will be annotated using automated and manual annotation pipelines, and will also be used as a reference to identify polymorphic genetic markers for genetic mapping and population genetic studies.
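Whether the two reads of a pair overlap depends on the fragment length and the read length: they overlap when twice the read length exceeds the fragment length. The report does not state the read length, so the sketch below assumes a hypothetical 150 bp read and treats "insert size" as the full sequenced fragment length.

```python
def read_overlap_bp(fragment_bp: int, read_len_bp: int) -> int:
    """Bases shared by the two reads of a pair.

    A positive value is the overlap; a negative value is the length of
    the unsequenced gap between the reads.
    """
    return 2 * read_len_bp - fragment_bp


READ_LEN_BP = 150  # hypothetical read length; not stated in the report

for fragment in (250, 400):
    ov = read_overlap_bp(fragment, READ_LEN_BP)
    kind = "overlapping" if ov > 0 else "non-overlapping"
    print(f"{fragment} bp fragment: {kind} ({ov:+d} bp)")
```

With 150 bp reads, 250 bp fragments give pairs that overlap by 50 bp, while 400 bp fragments leave a 100 bp gap. More generally, 250 bp fragments produce overlapping pairs only when each read is longer than 125 bp.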