
Research Project: Towards a Pecan Pan-genome: Generating Three Reference, Annotated Genomes for Pecan

Location: Crop Germplasm Research, Southern Plains Agricultural Research Center, College Station, Texas

Project Number: 3091-21000-046-007-A
Project Type: Cooperative Agreement

Start Date: Sep 10, 2023
End Date: Sep 28, 2025

Our program has a long-term goal of constructing a pecan pan-genome. A pan-genome combines information from multiple complete reference genomes of the same species to identify the genomic regions and features that differ among individuals, while preserving those differences rather than collapsing them into a single reference sequence. Accurately constructing a pan-genome requires genome sequences that have been assembled, phased, and annotated with the same approaches, which minimizes bias from methodological differences and human error. As a foundation for biotechnology and future accelerated breeding efforts, these genome sequences must be assembled and annotated to current state-of-the-art standards in plant genomics. This will allow information to transfer seamlessly between these new genomes and the seven existing and to-be-generated pecan genomes hosted on the Department of Energy Joint Genome Institute's Phytozome database, which were funded through our current and past USDA-funded collaborations with the Cooperator. These seven genomes approach the diversity required to represent pecan accurately, but gaps remain. We propose to work with the Cooperator to sequence, assemble, phase, and annotate three additional pecan genomes from cultivars that either have a strong commercial presence or are foundational parents in pecan breeding ('Mahan', 'Major', and 'Wichita'). The final pool of ten genomes, built with highly consistent methods, will provide a suitable foundation for constructing a pecan pan-genome.

Three pecan cultivars, 'Mahan', 'Major', and 'Wichita', are proposed for genome sequencing, assembly, phasing, and annotation. The three trees are located in our on-site germplasm repository. The USDA has already made multiple tissue collections for genome sequencing and annotation. Young leaf tissue that had been dark-treated for 24 hours was collected in Spring 2023 and immediately flash-frozen in liquid nitrogen for high-molecular-weight DNA extraction. Tissue from three biological replicates of six tissue types (dormant bud, swollen bud, immature catkin, immature pistillate flower, expanding leaf, and root) was also collected in Spring 2023 and flash-frozen in liquid nitrogen. RNA will be extracted from these samples this summer, and both the dark-treated leaf tissue and the extracted RNA will be provided to the Cooperator.

At its own facilities, the Cooperator will extract high-molecular-weight DNA from the leaf tissue, prepare the sequencing libraries, and sequence the DNA on a PacBio Revio in CCS (HiFi) mode with a 24-hour movie time per SMRT Cell. Omni-C libraries will also be prepared from the leaf tissue and sequenced on an Illumina NovaSeq 6000, targeting 30 Gb of sequence per Omni-C library. The Cooperator will also prepare RNA-seq libraries from the 54 provided RNA extractions and have them sequenced on an Illumina NovaSeq 6000, targeting 40 million read pairs per library. Once sequencing is complete, the Cooperator will assemble, phase, and annotate the genomes using established protocols, ensuring methodological consistency between these new genomes and the existing pecan genomes. The pecan 'Pawnee' genome, produced with these protocols, is one of the most complete outbred diploid plant genomes assembled to date. All genomes must be assembled and annotated to standards suitable for hosting on the publicly available Department of Energy Joint Genome Institute Phytozome database.
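
The RNA-seq design can be laid out as a simple sample sheet. The sketch below is illustrative only and assumes that each of the three cultivars was sampled for all six tissue types in triplicate. The source does not state this explicitly, but the assumption is consistent with the 54 RNA extractions it lists (3 × 6 × 3 = 54). The `sample_id` label format is hypothetical and not the project's actual naming scheme. The sketch also totals the sequencing target implied by 40 million read pairs per library.

```python
from itertools import product

# Cultivars and tissue types as listed in the project description.
CULTIVARS = ["Mahan", "Major", "Wichita"]
TISSUES = [
    "dormant bud",
    "swollen bud",
    "immature catkin",
    "immature pistillate flower",
    "expanding leaf",
    "root",
]
N_REPLICATES = 3  # biological replicates per tissue type
TARGET_READ_PAIRS = 40_000_000  # target read pairs per RNA-seq library


def sample_id(cultivar: str, tissue: str, rep: int) -> str:
    """Build a hypothetical sample label, e.g. 'Mahan_dormant-bud_R1'."""
    return f"{cultivar}_{tissue.replace(' ', '-')}_R{rep}"


def rna_sample_sheet() -> list[tuple[str, str, int]]:
    """Return one (cultivar, tissue, replicate) entry per RNA extraction.

    Assumes (not stated in the source) that every cultivar was sampled
    for every tissue type in triplicate.
    """
    return list(product(CULTIVARS, TISSUES, range(1, N_REPLICATES + 1)))


sheet = rna_sample_sheet()
total_pairs = len(sheet) * TARGET_READ_PAIRS

print(len(sheet))            # 54 libraries
print(f"{total_pairs:,}")    # 2,160,000,000 read pairs in total
print(sample_id(*sheet[0]))  # Mahan_dormant-bud_R1
```

Generating the sheet with `itertools.product` keeps the full factorial design explicit, so a missing or duplicated library shows up immediately as a count other than 54.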

Monthly remote progress meetings will be held with ARS scientists to review the progress of the research and any issues encountered. After genome construction and annotation for 'Mahan', 'Major', and 'Wichita' are complete, the Cooperator will train ARS scientists to use online tools to access and query the genomes by region of interest, gene ID, or functional annotation category, and to extract putatively functional gene variants. The Cooperator and ARS scientists will also collaborate on subsequent analyses, such as detecting historical introgressions from wild hickory relatives and phasing the 'Lakota' genome sequence using the genomes of its parents, 'Mahan' and 'Major', and will jointly publish a journal article announcing the public availability of these genomes to the research community. This research will provide the foundation for constructing a pan-genome that accurately represents the native and commercial diversity of pecan. It will also give pecan scientists a framework for quickly and effectively exploring candidate genes identified through ongoing ARS research, increasing the accuracy of selection in pecan breeding.