Location: Cool and Cold Water Aquaculture Research
Title: Use of DAVID algorithms for gene functional classification in a non-model organism, rainbow trout
Submitted to: BMC Research Notes
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 1/9/2018
Publication Date: 1/23/2018
Citation: Ma, H., Gao, G., Weber, G.M. 2018. Use of DAVID algorithms for gene functional classification in a non-model organism, rainbow trout. BMC Research Notes. 11:63. https://doi.org/10.1186/s13104-018-3154-7.
Interpretive Summary: Functional gene clustering is important in gene expression data analysis; however, using the available software programs for non-model species such as rainbow trout is often problematic. Most programs were designed for model species and can't be easily adopted for use with gene resources derived from non-model species. We developed an R-script using the functional gene classification algorithms employed in a widely accepted web-based program, the Database for Annotation, Visualization and Integrated Discovery (DAVID). With our standalone R-script, the gene clustering analysis was easily conducted. Moreover, we evaluated the impact of the kappa score, a parameter that measures gene-gene relationships, on the generation of biologically meaningful gene clusters using our rainbow trout RNA sequencing data sets. Our results shed light on the selection of a kappa score to conduct gene clustering and identify gene interaction networks for any species.
Technical Abstract: Gene functional clustering is essential in transcriptome data analysis, but software programs are not always suitable for use with non-model species. The DAVID Gene Functional Classification Tool has been widely used for soft clustering in model species, but requires adaptations for use in non-model species that necessitate the use of custom gene lists. Moreover, there is little available information to guide selection of a kappa score to achieve the most biologically meaningful clustering. We therefore developed an R-script that allowed the analysis of custom annotated gene lists with the algorithms of the DAVID Gene Functional Classification Tool and evaluated the impact of kappa score on gene functional clustering in three treatment comparisons with a wide range of differentially expressed genes (DEGs). Using the R-script we developed, we classified DEGs from an RNA-seq study of rainbow trout, with three comparisons yielding 555 to 3,340 annotated genes. The effects of kappa score on the total number of modules varied among comparisons. The percentage of DEGs harbored within a module and the number of genes shared among multiple modules decreased with increasing kappa score regardless of the number of DEGs in the comparison. The number of genes in enriched modules peaked at a kappa score of 0.5 for the comparisons with 3,340 and 1,313 DEGs and 0.3 for the comparison with 555 DEGs. The number of genes harbored within enriched modules generally decreased with increasing kappa score; however, this was affected by whether the largest modules were significantly enriched. Large non-enriched modules can be reanalyzed using a higher kappa score, resulting in some of the genes clustering in smaller modules with significant enrichment scores. Our R-script allows the use of custom gene identifiers with the DAVID fuzzy heuristic partition procedure to analyze data from non-model species.
The number of DEGs in a comparison and the size of individual modules can impact kappa score selection. To optimize the analysis, it might be beneficial to first analyze the data with a lower kappa score and then reanalyze the genes in the largest module using a higher kappa score.
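
To illustrate how a kappa score can quantify a gene-gene relationship, and why raising the threshold leaves fewer genes connected, the following minimal Python sketch computes Cohen's kappa between binary annotation profiles (1 if a gene carries an annotation term, 0 otherwise). It is not the R-script described above: the gene names, the eight-term profiles, and the helper functions `kappa` and `related_pairs` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def kappa(a, b):
    """Cohen's kappa between two equal-length binary annotation profiles."""
    n = len(a)
    # Observed agreement: terms that both genes have or both genes lack.
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    # Agreement expected by chance from each gene's annotation frequency.
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1:  # both profiles all 0 or all 1; kappa is undefined
        return 0.0
    return (observed - expected) / (1 - expected)

def related_pairs(profiles, threshold):
    """Return the gene pairs whose kappa meets or exceeds the threshold."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if kappa(profiles[g1], profiles[g2]) >= threshold
    ]

# Hypothetical genes annotated against eight hypothetical terms.
profiles = {
    "geneA": [1, 1, 1, 0, 0, 0, 0, 0],
    "geneB": [1, 1, 0, 0, 0, 0, 0, 0],
    "geneC": [0, 0, 0, 0, 1, 1, 0, 0],
}

for t in (0.3, 0.5, 0.8):
    print(t, related_pairs(profiles, t))
```

In this toy example geneA and geneB share most of their annotations, so they remain paired at thresholds of 0.3 and 0.5 but not at 0.8, while geneC shares no terms with either and is never paired. This mirrors, in miniature, the reported trend that fewer genes fall within modules as the kappa score increases.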