Location: Corn Insects and Crop Genetics Research
Title: NGPINT: a next-generation protein-protein interaction software
BANERJEE, SAGNIK - Iowa State University
VELASQUEZ-ZAPATA, VALERIA - Iowa State University
Elmore, James - Mitch
Submitted to: Briefings in Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/2/2020
Publication Date: 12/23/2020
Citation: Banerjee, S., Velasquez-Zapata, V., Fuerst, G.S., Elmore, J.M., Wise, R.P. 2020. NGPINT: a next-generation protein-protein interaction software. Briefings in Bioinformatics. 22(4). https://doi.org/10.1093/bib/bbaa351.
Interpretive Summary: All organisms respond to stimuli through networks of interacting proteins and other biomolecules. These interactions play pivotal roles in biological processes such as signal transduction, gene transcription, protein translation, disease regulation and developmental control. Many in vitro and in vivo biochemical techniques have been used to investigate interacting proteins. However, these assays are binary, i.e., they interrogate a single pair of proteins at a time, and such low-throughput assays demand a significant commitment of time and resources. Among protein-protein interaction methods, yeast two-hybrid (Y2H) has been widely used and, with the falling costs of next-generation sequencing, has been adapted to take advantage of this technology. In this report, we present NGPINT, a fully automated software platform that selects candidate protein-protein interactions from high-throughput Y2H next-generation sequencing experiments. Unlike previously described data-processing pipelines, NGPINT can process data from any organism with an available genome and/or transcriptome reference. NGPINT combines diverse tools to align sequence reads to target genomes, reconstruct the sequences of genes encoding interacting proteins, and compute gene enrichment under reporter selection. It performs consistently, recognizing over 95% of simulated interactions with minimal false positives. As proof of concept, NGPINT was tested on published data sets and recognized all validated interactions.
Impact: NGPINT can be applied to any organism with an available reference genome, thus facilitating the discovery of novel protein-protein interactions and helping to decipher the critical biological processes influenced by these interactions.
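The summary mentions computing gene enrichment under reporter selection. The sketch below illustrates only the basic idea, under stated assumptions: per-gene read counts from a library grown under reporter selection are compared with counts from a non-selected library after normalizing each library to counts per million (CPM), and genes are ranked by the log2 ratio. The helper names (`counts_per_million`, `log2_enrichment`), the pseudocount, and the toy counts are hypothetical and are not NGPINT's actual scoring method. A real analysis with replicates would use a count-based statistical model such as DESeq2 rather than a single ratio.

```python
import math

def counts_per_million(counts):
    """Scale raw per-gene read counts so the library sums to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}

def log2_enrichment(selected, nonselected, pseudocount=1.0):
    """Hypothetical helper: log2 ratio of CPM-normalized counts.

    selected: counts from the library grown under reporter selection.
    nonselected: counts from the library grown without selection.
    The pseudocount keeps genes absent from one library finite.
    """
    sel = counts_per_million(selected)
    non = counts_per_million(nonselected)
    genes = set(sel) | set(non)
    return {
        g: math.log2((sel.get(g, 0.0) + pseudocount) / (non.get(g, 0.0) + pseudocount))
        for g in genes
    }

# Toy counts (invented for illustration): geneA rises under selection.
selected = {"geneA": 900, "geneB": 50, "geneC": 50}
nonselected = {"geneA": 100, "geneB": 450, "geneC": 450}
scores = log2_enrichment(selected, nonselected)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0], round(scores["geneA"], 2))  # geneA 3.17
```

Normalizing before taking the ratio matters because selected and non-selected libraries are usually sequenced to different depths. Without normalization, raw count differences would reflect sequencing effort rather than enrichment.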
Technical Abstract: Mapping protein-protein interactions at proteome scale is critical to understanding how cellular signaling networks respond to stimuli. Because eukaryotic genomes encode thousands of proteins, testing their interactions one by one is a challenging prospect. High-throughput yeast two-hybrid (Y2H) assays that use next-generation sequencing to interrogate cDNA libraries offer an alternative approach that optimizes scale, cost, and effort. We present NGPINT, robust and scalable software that identifies all putative interactors of a protein using Y2H in batch culture. NGPINT combines diverse tools to align sequence reads to target genomes, reconstruct prey fragments and compute gene enrichment under reporter selection. Central to this pipeline is the identification of fusion reads, which contain sequence derived from both the Y2H expression plasmid and the cDNA of interest. To reduce false positives, these fusion reads are classified according to whether the cDNA fragment forms an in-frame translational fusion with the Y2H transcription factor. NGPINT recognized 95% of interactions in simulated test runs. As proof of concept, NGPINT was tested on published data sets and recognized all validated interactions. NGPINT can be used with any organism that has an available reference, thus facilitating the discovery of protein-protein interactions in non-model organisms.
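The fusion-read step described in the abstract can be illustrated with a minimal sketch, under stated assumptions. A read counts as a fusion read if it contains a known plasmid sequence at the junction with the cDNA insert. The insert's start is then located in the transcript, and the fusion is in frame if the offset from the start codon is a multiple of three. The anchor sequence `PLASMID_TAIL`, the helper names, the exact-match lookup (standing in for a real aligner), and the toy transcript are all hypothetical. This is not NGPINT's implementation, and it does not check for stop codons between the junction and the coding sequence.

```python
# Hypothetical plasmid sequence immediately upstream of the cDNA insert.
# Assumption: its last base completes a codon of the transcription-factor ORF.
PLASMID_TAIL = "GGATCCACTAGT"

def split_fusion_read(read, anchor=PLASMID_TAIL):
    """Return the cDNA-derived part of a read containing the anchor, else None."""
    idx = read.find(anchor)
    if idx == -1:
        return None
    insert = read[idx + len(anchor):]
    return insert or None

def junction_position(insert, transcript, k=15):
    """0-based transcript position of the insert's first k bases (exact match).

    A real pipeline would align reads with a splice-aware aligner instead.
    """
    if len(insert) < k:
        return -1
    return transcript.find(insert[:k])

def classify_fusion_read(read, transcript, cds_start, anchor=PLASMID_TAIL, k=15):
    """Label a read as not_fusion, unmapped, in_frame or out_of_frame.

    Python's % is non-negative here, so junctions in the 5' UTR are also
    frame-checked; upstream stop codons are NOT checked in this sketch.
    """
    insert = split_fusion_read(read, anchor)
    if insert is None:
        return "not_fusion"
    pos = junction_position(insert, transcript, k)
    if pos < 0:
        return "unmapped"
    return "in_frame" if (pos - cds_start) % 3 == 0 else "out_of_frame"

# Toy transcript: 5-nt UTR, then a CDS starting with ATG at position 5.
TRANSCRIPT = "CCCCC" "ATG" "GCT" "TCT" "GAA" "GAT" "CGT" "AAA" "GCT" "TGA"
print(classify_fusion_read(PLASMID_TAIL + TRANSCRIPT[11:], TRANSCRIPT, cds_start=5))  # in_frame
print(classify_fusion_read(PLASMID_TAIL + TRANSCRIPT[12:], TRANSCRIPT, cds_start=5))  # out_of_frame
```

Restricting downstream counting to in-frame fusion reads reflects the abstract's rationale. An out-of-frame fragment would not produce the intended prey fusion protein, so reads supporting it are more likely to be false positives.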