Submitted to: Molecular Biosystems
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 5/23/2009
Publication Date: 7/15/2009
Citation: Feng, J., Naiman, D.Q., Cooper, B. 2009. Coding DNA repeated throughout intergenic regions of the Arabidopsis thaliana genome: Evolutionary footprints of RNA silencing. Molecular Biosystems. 5:1679-1687.
Interpretive Summary: Patterns are inherent to biology, and their presence has generally been taken as evidence of higher orders of biological importance. Repetition of the same part has been used to explain evolutionary theory, the foundation of speech, and the molecular basis of heredity. Patterns also occur within genome sequences, and some of these patterns have been shown to have biological importance. Using a probability-based method, patterns were first found in non-coding genomic sequences of the model plant Arabidopsis thaliana, sequences once thought to have little relation to coding sequences. These patterns were then shown to exist in nearly half of all A. thaliana genes. These sequences, called pyknons, have remarkable identity to small RNA sequences involved in gene silencing. Chromosomal position mapping revealed that pyknons occur where small RNA sequences occur. These data link genes to regions of DNA once thought of as “junk”. This discovery will facilitate analysis of the impending soybean genome sequence and is likely to show that differences in the size of non-coding regions among genomes are due in part to the accumulation of pyknons in DNA. These data are most likely to interest scientists at universities, government agencies and companies who are working to decipher plant genomes.
Technical Abstract: Pyknons are non-random sequence patterns that are significantly repeated throughout non-coding genomic DNA and also appear at least once among genes. They are interesting because they point to an unforeseen connection between coding and non-coding DNA. Pyknons had been discovered only in the human genome, so it was unknown whether they are biologically relevant or simply a peculiarity of the human genome. To address this, DNA sequence patterns in the Arabidopsis thaliana genome were detected using a probability-based method. A total of 24,654 statistically significant sequence patterns, 16 to 24 nucleotides long and repeated 10 or more times in non-coding DNA, also appeared in 46% of A. thaliana protein-coding genes. A. thaliana pyknons exhibit features similar to human pyknons: they are distinct sequence patterns, they have multiple instances in genes, and they have remarkable identity to small RNA sequences with roles in gene silencing. Chromosomal position mapping revealed that genomic pyknon density is concordant with the positional density of siRNAs and transposable elements. Because the A. thaliana and human genomes have approximately the same number of genes but drastically different amounts of non-coding DNA, finding pyknons in both genomes reveals that pyknons represent a biologically important link between coding and non-coding DNA. Because pyknons are associated with siRNAs and localize to silenced regions of heterochromatin, we postulate that RNA-mediated gene silencing leads to the accumulation of gene sequences in non-coding DNA regions.
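The screening criteria stated in the abstract (patterns 16 to 24 nucleotides long, repeated 10 or more times in non-coding DNA, and present at least once in a gene) and the density comparison can be illustrated with a minimal Python sketch. This is not the authors' probability-based method: it assumes exact-match k-mer counting with a fixed repeat threshold in place of a significance test, ignores reverse complements and mismatches, does not collapse nested patterns into distinct ones, and uses a plain Pearson correlation of binned counts as one possible measure of "concordance". All function names are hypothetical, and a genome-scale run would need a dedicated k-mer tool rather than in-memory Python counters.

```python
import math
from collections import Counter


def count_kmers(seqs, k):
    """Count every overlapping k-mer across a list of DNA strings."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def candidate_pyknons(noncoding, genes, k_min=16, k_max=24, min_repeats=10):
    """Exact k-mers seen >= min_repeats times in non-coding DNA and at least once in a gene.

    Assumption: a fixed count threshold stands in for the study's probability-based
    significance test; nested k-mers (e.g. a 16-mer inside a 20-mer) are not collapsed.
    """
    hits = set()
    for k in range(k_min, k_max + 1):
        frequent = {m for m, c in count_kmers(noncoding, k).items() if c >= min_repeats}
        if frequent:
            hits |= frequent & set(count_kmers(genes, k))
    return hits


def fraction_genes_with_hit(genes, patterns):
    """Share of genes containing at least one pattern (the quantity reported as 46%)."""
    if not genes:
        return 0.0
    hit = sum(1 for g in genes if any(p in g.upper() for p in patterns))
    return hit / len(genes)


def binned_density(positions, chrom_len, bin_size):
    """Count features per fixed-size window along one chromosome (0-based positions)."""
    bins = [0] * math.ceil(chrom_len / bin_size)
    for pos in positions:
        bins[pos // bin_size] += 1
    return bins


def pearson(xs, ys):
    """Pearson correlation of two equal-length density profiles (nan if either is flat)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


# Toy usage with made-up sequences: one 16-nt motif repeated 10 times in "non-coding" DNA.
motif = "ACGTTGCAACGTAGCT"
noncoding = ["TTTT" + motif + "GGGG"] * 10
genes = ["CCC" + motif + "CCC", "A" * 30]
print(candidate_pyknons(noncoding, genes))       # {'ACGTTGCAACGTAGCT'}
print(fraction_genes_with_hit(genes, {motif}))   # 0.5

# Toy positions for pyknons and siRNAs on a 100-bp "chromosome", 25-bp windows.
p = binned_density([1, 2, 3, 30, 80], chrom_len=100, bin_size=25)
s = binned_density([5, 6, 7, 40, 90], chrom_len=100, bin_size=25)
print(p, s, pearson(p, s))                       # [3, 1, 0, 1] [3, 1, 0, 1] 1.0
```

In this toy case only the shared motif survives both filters, because every other 16- to 24-mer in the gene spans a flank that never occurs in the non-coding copies. Raising `min_repeats` above 10 would discard it, which shows how strongly the candidate set depends on the repeat threshold that the study instead derived from a probability model.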