Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/2/2002
Publication Date: 1/22/2003
Citation: Harhay, G.P., Keele, J.W. 2003. Positional candidate gene selection from livestock EST databases using Gene Ontology. Bioinformatics. 19:249-255.
Interpretive Summary: The ARS and other labs throughout the world have determined partial sequences for approximately 100,000 livestock (cattle and swine) genes. Genes direct the molecular machinery of the cell to build and maintain all components necessary for its survival and to establish its role within the organism. Knowing what these genes do, and where and how they function, will enhance our knowledge of the molecular basis of livestock biology. However, because the function of most livestock genes has not been experimentally determined, these gene sequences are difficult to exploit. To bridge this gap, livestock genes were associated with function by connecting them to similar human genes whose function is known. This connection process uses a standard method of describing function, which eases searching and exchanging functional information between different databases. This dramatically reduces the time scientists spend searching databases to determine gene function.
Technical Abstract: The number of expressed sequence tags (ESTs) in GenBank has now surpassed 200,000 for cattle and 100,000 for swine. The Institute for Genomic Research (TIGR) has organized these sequences into approximately 60,000 non-redundant consensus sequences for cattle and 40,000 for swine in the TIGR Gene Indices. Anonymous ESTs are of limited value unless they are connected to function. Functional information is difficult to manage electronically because of heterogeneity of meaning and form across most information sources. The Gene Ontology (GO) Consortium has produced ontologies for gene function with consistent meaning and form across species. Linking livestock ESTs to gene function through sequence similarity with sequences from other annotation-rich mammals could accelerate (1) the discovery of positional candidate genes underlying a livestock quantitative trait locus (QTL) and (2) the generation of markers for comparative maps between livestock and other mammals (e.g., human, mouse, and rat). We initiated this investigation to determine whether incorporating the GO into the annotation process could accelerate livestock positional candidate gene discovery. We have associated livestock ESTs with GO nodes through sequence similarity to the NCBI Reference Sequence (RefSeq) database. Positional candidate genes that otherwise required days or longer to identify are now identified within minutes. The schema described here accommodates queries that return GO nodes from terms familiar to biologists, such as gene name, alternate/alias symbol, and OMIM phenotype.
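The kind of lookup described in the abstract (a biologist-friendly term resolved to GO nodes and to the livestock ESTs linked to them through RefSeq similarity) can be illustrated with a small relational sketch. This is not the authors' schema: the table names, column names, the `max_evalue` cutoff, and every identifier in the toy data (`EST_COW_1`, `REFSEQ_1`, `GENE_A`, `GO:TOY1`, and so on) are hypothetical placeholders chosen for illustration, and OMIM phenotype lookup is left out for brevity.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with made-up rows (all identifiers are placeholders)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE est_hit (est_id TEXT, species TEXT, refseq_acc TEXT, evalue REAL);
        CREATE TABLE refseq_gene (refseq_acc TEXT PRIMARY KEY, gene_symbol TEXT);
        CREATE TABLE gene_alias (gene_symbol TEXT, alias TEXT);
        CREATE TABLE gene_go (gene_symbol TEXT, go_id TEXT);
        CREATE TABLE go_node (go_id TEXT PRIMARY KEY, name TEXT);
    """)
    # Similarity hits between livestock ESTs and reference sequences.
    conn.executemany("INSERT INTO est_hit VALUES (?, ?, ?, ?)", [
        ("EST_COW_1", "cattle", "REFSEQ_1", 1e-50),
        ("EST_PIG_1", "swine", "REFSEQ_1", 1e-30),
        ("EST_COW_2", "cattle", "REFSEQ_2", 1e-5),   # weak hit, filtered by the cutoff
        ("EST_COW_3", "cattle", "REFSEQ_2", 1e-40),
    ])
    conn.executemany("INSERT INTO refseq_gene VALUES (?, ?)", [
        ("REFSEQ_1", "GENE_A"),
        ("REFSEQ_2", "GENE_B"),
    ])
    conn.executemany("INSERT INTO gene_alias VALUES (?, ?)", [
        ("GENE_A", "ALIAS_A1"),
    ])
    conn.executemany("INSERT INTO gene_go VALUES (?, ?)", [
        ("GENE_A", "GO:TOY1"),
        ("GENE_A", "GO:TOY2"),
        ("GENE_B", "GO:TOY2"),
    ])
    conn.executemany("INSERT INTO go_node VALUES (?, ?)", [
        ("GO:TOY1", "toy function 1"),
        ("GO:TOY2", "toy process 2"),
    ])
    return conn


def candidates_for_term(conn, term, max_evalue=1e-10):
    """Return (est_id, species, gene_symbol, go_id, go_name) rows for a gene name or alias."""
    sql = """
        SELECT DISTINCT e.est_id, e.species, g.gene_symbol, n.go_id, n.name
        FROM refseq_gene AS g
        LEFT JOIN gene_alias AS a ON a.gene_symbol = g.gene_symbol
        JOIN gene_go AS gg ON gg.gene_symbol = g.gene_symbol
        JOIN go_node AS n ON n.go_id = gg.go_id
        JOIN est_hit AS e ON e.refseq_acc = g.refseq_acc
        WHERE (g.gene_symbol = ? OR a.alias = ?)
          AND e.evalue <= ?
        ORDER BY e.est_id, n.go_id
    """
    return conn.execute(sql, (term, term, max_evalue)).fetchall()
```

With the toy data, calling `candidates_for_term(build_toy_db(), "ALIAS_A1")` resolves the alias to `GENE_A` and returns its GO nodes alongside the cattle and swine ESTs that hit the same reference sequence. Keeping the similarity hits, gene symbols, aliases, and GO links in separate tables is one plausible way to let a single join answer such a query; the published schema may be organized differently.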