Liu, Ge - George
Van Tassell, Curtis - Curt
Submitted to: CONFERENCE ON THE BIOLOGY OF GENOMES
Publication Type: Book / Chapter
Publication Acceptance Date: 2/19/2007
Publication Date: 5/4/2007
Citation: Liu, G., Weirauch, M., Van Tassell, C.P., Li, R.W., Sonstegard, T.S., Matukumalli, L.K., Connor, E.E., Hanson, R.W., Yang, J. 2007. Identification of Weakly Conserved Regulatory Elements in Upstream Promoter Regions of Mammals. Conference on the Biology of Genomes, pp. 218.
Technical Abstract: Several methods have been proposed that use cross-species genome comparison to identify sequences conserved under evolutionary constraint. However, apart from the most prominently conserved transcription factor binding sites (TFBS), in silico predictions have rarely been cross-referenced against experimental results for most TFBS. This is particularly true of less conserved but biologically active elements that may be relevant to tissue- and temporal-specific transcriptional regulation. A systematic approach combining position weight matrices (PWMs from JASPAR) with a phylogenetic footprinting algorithm (TFLOC) was implemented to identify such less conserved but biologically active TFBS in mammalian promoter regions, using human, mouse and rat sequence alignments. Computational predictions were evaluated against previously known TFBS in the promoter of the gene for the cytosolic isoform of phosphoenolpyruvate carboxykinase (GTP) (PEPCK-C), a well-studied enzyme involved in hepatic and renal gluconeogenesis and in glyceroneogenesis. For most PWMs, matching scores did not follow the Gaussian distribution assumed by standard statistical tests, so various cutoffs were tested empirically to establish the best threshold for each PWM. In the PEPCK-C gene promoter, this approach achieved a sensitivity of 75% and a true-positive rate of about 32% under a rigorous criterion. Virtually all of the major regulatory elements overlapped predicted sites, and several new putative TFBS were identified in critical regions. Gel shift and reporter assays showed that these newly discovered sites function in the control of PEPCK-C gene transcription. The approach was then used to predict putative TFBS within the upstream 1-kb promoter regions of all available RefSeq genes. Its features include adjustable thresholds, expandable user-defined TFBS matrices, and the capability for whole-genome analysis.
This case study indicates that a major region of the PEPCK-C gene promoter, containing regulatory elements for insulin and glucocorticoid control of gene transcription, is conserved in mammals. The full TFBS dataset for the upstream 1-kb regions of all available RefSeq genes is freely available at http://bfgl.anri.barc.usda.gov/tfbsConsSites.
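The PWM-scanning and empirical-cutoff steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the TFLOC conservation filter over human/mouse/rat alignments is omitted, and several details are assumptions. These include the uniform background, the pseudocount, the scaling of scores to a 0..1 "relative score", forward-strand-only scanning, and the harmonic-mean rule for choosing a cutoff. Reading "true-positive rate" as the share of predictions that overlap a known site is also an assumption, because the abstract does not define that term. All function names here are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.8):
    """Convert a JASPAR-style count matrix {base: [count per position]} into
    log2-odds weights against a uniform background (an assumption)."""
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        total = sum(counts[b][i] for b in BASES)
        # The pseudocount is split evenly over the four bases, so each column sums to 1.
        pwm.append({b: math.log2(((counts[b][i] + pseudocount / 4) / (total + pseudocount)) / 0.25)
                    for b in BASES})
    return pwm


def scan(seq, pwm, rel_cutoff):
    """Return (start, end, relative_score) for forward-strand windows whose score,
    rescaled to [0, 1] between the matrix minimum and maximum, reaches rel_cutoff."""
    lo = sum(min(col.values()) for col in pwm)
    hi = sum(max(col.values()) for col in pwm)
    span = (hi - lo) or 1.0  # guard against a completely uninformative matrix
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width].upper()
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        raw = sum(col[b] for col, b in zip(pwm, window))
        rel = (raw - lo) / span
        if rel >= rel_cutoff:
            hits.append((start, start + width, rel))
    return hits


def overlaps(a, b):
    """True if half-open intervals a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def evaluate(predicted, known):
    """Sensitivity: share of known sites hit by at least one prediction.
    Precision: share of predictions overlapping at least one known site.
    `known` must be non-empty."""
    sens = sum(any(overlaps(k, p) for p in predicted) for k in known) / len(known)
    prec = (sum(any(overlaps(p, k) for k in known) for p in predicted) / len(predicted)
            if predicted else 0.0)
    return sens, prec


def pick_cutoff(seq, pwm, known, cutoffs):
    """Empirical sweep over candidate cutoffs. Keeps the one that maximises the
    harmonic mean of sensitivity and precision (a hypothetical criterion).
    Returns (score, cutoff, sensitivity, precision)."""
    best = None
    for c in cutoffs:
        sens, prec = evaluate([h[:2] for h in scan(seq, pwm, c)], known)
        f = 2 * sens * prec / (sens + prec) if sens + prec else 0.0
        if best is None or f > best[0]:
            best = (f, c, sens, prec)
    return best
```

As a toy example, a matrix built from a perfect `TGAC` consensus finds both copies in `AATGACAAAATGACAA`. If only the first copy is a "known" site, sensitivity is 1.0 and precision is 0.5. This mirrors how a genuine but unvalidated prediction lowers the apparent true-positive rate. Because the empirical sweep makes no Gaussian assumption about the score distribution, it fits the abstract's observation that PWM scores were mostly non-Gaussian.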