Location: Southern Horticultural Research
Project Number: 6062-21430-004-032-S
Project Type: Non-Assistance Cooperative Agreement
Start Date: Apr 13, 2021
End Date: Sep 30, 2023
Objective:
1. Identify genes and enzymes involved in pathogenicity, and those important for growth and survival of the pathogen on its host, by comparing gene expression in media and in planta.
2. Identify markers that allow for primers to be designed with high specificity so this species can be differentiated from the many closely related species that may be present in the local biome.
3. Annotate the genome by sequencing the expressed transcripts (transcriptomics).
Approach:
The genome of P. sequoiae has already been sequenced; the work will therefore focus on collecting samples of the fungus growing in media and growing in planta. The ARS scientist will collect five samples from each growth condition and immediately place them in liquid nitrogen or RNAlater® Stabilization Solution (ThermoFisher Scientific, NY, USA). RNA will be extracted using Qiagen RNeasy® reagents (Qiagen, Germantown, MD, USA) following the manufacturer's protocols. Transcripts from each sample will be sequenced on an Illumina HiSeq platform in a 150 bp paired-end run. The data will be analyzed together with the previously sequenced genome on servers housed in the cooperator's laboratory.
Predicted proteins in the genomic data will be functionally annotated using InterProScan 5.24-63.0. The SignalP_EUK and TMHMM programs will be used to predict the presence of signal peptides or transmembrane helices in the proteins, and Pfam, PANTHER, SUPERFAMILY and SMART will be used to search for matches of amino acid sequences in the corresponding databases. Previously developed pipelines will be used to identify cytochrome P450s and peroxidases (Ibarra et al. 2019). Putative secreted proteins will be identified as those with a SignalP-noTM motif and no transmembrane regions. Small secreted proteins will be defined as secreted proteins smaller than 300 amino acids. Secondary metabolism genes will be detected using antiSMASH. Putative carbohydrate-active enzymes (CAZymes), including laccases, will be identified by matching InterPro signatures known to be present in CAZymes and by searching the dbCAN database, which will be run using hmmscan. Tannases and proteases will be detected based on the presence of the corresponding Pfam (PF07519) and InterPro (IPR011118) domains.
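The secreted-protein criteria above can be sketched as a simple filter. This is only an illustration, not the project's pipeline: the `ProteinPrediction` record and its field names are hypothetical, and it assumes the SignalP and TMHMM results have already been parsed into one record per protein.

```python
from dataclasses import dataclass

# Size cutoff taken from the text: small secreted proteins are < 300 aa.
SMALL_SECRETED_MAX_AA = 300


@dataclass
class ProteinPrediction:
    """Hypothetical per-protein summary of parsed annotation results."""
    protein_id: str
    length_aa: int
    has_signalp_notm: bool  # a SignalP-noTM match was reported
    tm_helix_count: int     # number of TMHMM-predicted transmembrane helices


def is_secreted(p: ProteinPrediction) -> bool:
    # Putative secreted: SignalP-noTM motif present and no transmembrane regions.
    return p.has_signalp_notm and p.tm_helix_count == 0


def is_small_secreted(p: ProteinPrediction) -> bool:
    # Small secreted: secreted and shorter than the 300 aa cutoff.
    return is_secreted(p) and p.length_aa < SMALL_SECRETED_MAX_AA


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    examples = [
        ProteinPrediction("prot_a", 120, True, 0),
        ProteinPrediction("prot_b", 450, True, 0),
        ProteinPrediction("prot_c", 200, True, 2),
    ]
    for p in examples:
        print(p.protein_id, is_secreted(p), is_small_secreted(p))
```

Keeping the two predicates separate mirrors the text, where small secreted proteins are a subset of the putative secreted proteins.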
Quality of the RNA-Seq reads will be evaluated using FastQC v.0.11.7. Adapter removal and quality trimming will be performed using fastp v.0.19.3 with default settings. The “Tuxedo protocol”, with TopHat v.2.1.1 and Cufflinks v.2.2.1, will be used for transcriptome assembly and differential expression analysis. The gff3 file produced by MAKER will be supplied to the relevant steps of the analysis (TopHat, Cufflinks, Cuffmerge). Minimum and maximum intron lengths will be set to 5 and 5,000 bp, respectively. Transcripts will be functionally annotated using the annotation of the corresponding genes. Transcripts not assigned to a gene by Cufflinks will be functionally annotated by obtaining their longest open reading frame using the ‘getorf’ subcommand in HMMER2GO 0.17.8, with the lower limit set to 50 aa, and then using the same databases to obtain protein IDs, as described above for the genome analysis.
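The longest-ORF step for unassigned transcripts can be illustrated with a minimal sketch. This is not the HMMER2GO implementation: it assumes the standard genetic code, treats an ORF as any stretch between stop codons in one of the six reading frames, and uses a hypothetical function name, `longest_orf`.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    # Translate complete codons only; unrecognized codons (e.g. with N) become X.
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(transcript, min_aa=50):
    """Return the longest stop-free peptide over six frames, or None if < min_aa."""
    seq = transcript.upper().replace("U", "T")
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            # Stop codons translate to "*", so splitting on "*" yields the ORFs.
            for peptide in translate(strand[frame:]).split("*"):
                if len(peptide) > len(best):
                    best = peptide
    return best if len(best) >= min_aa else None
```

The default `min_aa=50` mirrors the 50 aa lower limit stated above. Transcripts for which the function returns `None` would simply have no ORF passed on to the database searches.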
Using these transcriptomic data, we can compare genes expressed in culture and in planta, and thus identify putative pathogenicity factors for P. sequoiae. Genes identified in P. sequoiae can then be compared with genes found in other important needle and leaf pathogens. Further, we can begin to develop molecular markers for additional genetic studies of P. sequoiae.