Research Project: Sequencing and Annotation of the Western Tarnished Plant Bug (Lygus hesperus) Genome

Location: Pest Management and Biocontrol Research

Project Number: 2020-22620-023-028-S
Project Type: Non-Assistance Cooperative Agreement

Start Date: Aug 1, 2022
End Date: Jan 31, 2026

1. Use Next Generation Sequencing (NGS) to produce a chromosome-level genome assembly for the western tarnished plant bug, L. hesperus.
   1a. Use PacBio CLR and HiFi to sequence the genome of L. hesperus from the ALARC laboratory colony.
   1b. Use Illumina HiC to sequence the genome of L. hesperus from the ALARC laboratory colony.
   1c. Use Nanopore long-read technology to sequence the genome of L. hesperus from the ALARC laboratory colony.
2. Assemble L. hesperus scaffolds and genomes using established de novo bioinformatic pipelines.
3. Use previously collected Illumina HiSeq RNAseq data, as well as newly collected pooled RNAseq data, to generate an L. hesperus genome annotation.
4. Sequence the genome from a population pool of L. hesperus collected directly from alfalfa fields at the Maricopa Agricultural Center in Maricopa, Arizona.

ARS will provide the University of Arizona (University hereafter) with field-collected L. hesperus as well as individuals from an ARS lab colony. The University will perform whole-genome sequencing, including PacBio HiFi to 30X coverage (Sub-objective 1a), PacBio CLR (100 Gb, Sub-objective 1a), Illumina HiC (75 Gb, Sub-objective 1b), and Nanopore long-read sequencing (60 Gb, Sub-objective 1c). The University will assemble the genomes (Objective 2) and use RNAseq to obtain gene expression information to support both automated and manual gene annotation (Objective 3). Lastly, a local L. hesperus field population will be sequenced on Illumina at 100X genome coverage and mapped onto the de novo L. hesperus assembly (Objective 4).

Because of the large genome size of L. hesperus, de novo genome assemblies will be made using PacBio HiFi and Continuous Long Reads (CLR) long-read sequencing together with HiC (Illumina short-read) sequencing (Sub-objectives 1a and 1b). Nanopore long-read sequencing will also be used if the PacBio CLR approach is insufficient for long-read assembly (Sub-objective 1c).

For the PacBio sequencing, 50-100 dry-ice frozen adults from the ARS lab colony will be shipped to the University for library preparation. For the HiC (OmniC) library preparation, an additional 200-300 dry-ice frozen L. hesperus will be shipped to the University. DNA will be extracted and sequenced on 10% of an Illumina NovaSeq lane (75 Gb or 250 million paired reads). For the PacBio HiFi long reads, libraries will be constructed using DNA extracted from as few L. hesperus individuals from the ARS lab colony as needed. The University will determine the minimum number of individuals required by performing several DNA extractions and assessing the quantity and quality of each. DNA for the long-read libraries will be extracted using a chloroform-based protocol, and the libraries will be constructed and sequenced using PacBio Sequel II on three flow cells to achieve appropriate coverage.
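
The sequencing targets in this approach are stated in three different units: fold coverage (30X, 100X), total yield (75 Gb), and read counts (250 million paired reads). As a rough sketch of how these relate, the Python below converts between them. The 2 x 150 bp read length is an assumption: it is the configuration under which 250 million read pairs equal 75 Gb, but the project text does not state it. The genome size used in the example is a hypothetical placeholder, not a measured value for L. hesperus, and the function names are illustrative only.

```python
# Placeholder only: the project text does not give a genome size for L. hesperus.
HYPOTHETICAL_GENOME_SIZE_GB = 1.0


def yield_gb(read_pairs: int, read_length_bp: int = 150) -> float:
    """Total sequenced bases in Gb for paired-end reads.

    read_length_bp=150 is an assumed read length, not taken from the source.
    """
    return read_pairs * 2 * read_length_bp / 1e9


def fold_coverage(total_gb: float, genome_size_gb: float) -> float:
    """Average fold coverage obtained from a given yield."""
    return total_gb / genome_size_gb


def gb_needed(target_coverage: float, genome_size_gb: float) -> float:
    """Yield in Gb required to reach a target fold coverage."""
    return target_coverage * genome_size_gb


if __name__ == "__main__":
    hic_gb = yield_gb(250_000_000)  # HiC library: 250 million read pairs
    print(f"HiC yield: {hic_gb:.1f} Gb")
    print(f"HiFi yield for 30X: {gb_needed(30, HYPOTHETICAL_GENOME_SIZE_GB):.1f} Gb")
    print(f"HiC fold coverage: {fold_coverage(hic_gb, HYPOTHETICAL_GENOME_SIZE_GB):.1f}X")
```

Replacing the placeholder with an actual genome size estimate would give the yield implied by the 30X HiFi and 100X Illumina targets.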
If necessary, Nanopore long-read sequencing will be performed using three GridION cells (~20 Gb each). DNA will be extracted using a method similar to that employed for the PacBio (CLR) sequencing, and the Ultra-Long DNA Sequencing Kit will be used to generate the libraries. Long reads, both PacBio and Nanopore, will be integrated into the previously established assembly pipeline.

To provide data for annotations, RNAseq will be performed on 1) a pool of multiple life stages (egg, 1st-5th instar nymphs, adult males and females) collected from both the ARS lab and field colonies, and 2) individual pools corresponding to egg, 1st-5th instar nymphs, adult males, and adult females from field-collected specimens. Frozen samples, in TRIzol, will be shipped to the University for extraction and library preparation. For each pool, tissue will be homogenized in TRIzol prior to RNA extraction, and libraries will be generated using a KAPA stranded mRNA-Seq Kit. Libraries will be sequenced (12 Gb or 40 million reads per library) on an Illumina HiSeq 4000 lane. Bioinformatic analyses of the genome assemblies, including quality control (QC) assessments as well as automated and manual gene annotations, will be conducted by the University.
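
The stage-specific RNAseq design can be written out as a small sample manifest. The sketch below enumerates the eight stage-specific pools named in the text and totals the planned yield at 12 Gb per library. It also sums the planned Nanopore output from three GridION cells at roughly 20 Gb each, which matches the 60 Gb target. The pool labels and the one-library-per-pool layout are assumptions for illustration. The pooled multi-stage libraries are left out because the text does not say how many there will be.

```python
GB_PER_RNASEQ_LIBRARY = 12  # per-library target stated in the project text
GRIDION_CELLS = 3           # stated in the project text
GB_PER_GRIDION_CELL = 20    # approximate figure stated in the project text


def stage_pools() -> list:
    """Stage-specific pools: egg, five nymphal instars, adult males, adult females.

    The label strings are hypothetical; only the life stages come from the source.
    """
    nymphs = [f"instar_{n}" for n in range(1, 6)]
    return ["egg", *nymphs, "adult_male", "adult_female"]


def total_rnaseq_gb(pools, gb_per_library=GB_PER_RNASEQ_LIBRARY):
    """Planned RNAseq yield, assuming one library per pool (an assumption)."""
    return len(pools) * gb_per_library


def nanopore_gb(cells=GRIDION_CELLS, gb_per_cell=GB_PER_GRIDION_CELL):
    """Planned Nanopore yield across all GridION cells."""
    return cells * gb_per_cell


if __name__ == "__main__":
    pools = stage_pools()
    for name in pools:
        print(f"{name}\t{GB_PER_RNASEQ_LIBRARY} Gb")
    print(f"Stage-specific RNAseq total: {total_rnaseq_gb(pools)} Gb")
    print(f"Nanopore total: {nanopore_gb()} Gb")
```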