Submitted to: Plant Physiology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 8/3/2009
Publication Date: 8/12/2009
Publication URL: http://www.plantphysiol.org/cgi/content/abstract/pp.109.143370vl
Citation: Kronmiller, B.A., Wise, R.P. 2009. Computational Finishing of Large Sequence Contigs Reveals Interspersed Nested Repeats and Gene Islands in the rf1-associated Region of Maize. Plant Physiology. 151(2):483-495. doi: 10.1104/pp.109.143370.
Interpretive Summary: Soon maize, the largest and most difficult plant genome to sequence and assemble, will join the ranks of organisms with fully sequenced genomes. The maize sequencing project aims to capture the entire gene set of maize, including regulatory regions. However, the current strategy will not provide a fully assembled genome, but rather assembled bacterial artificial chromosome (BAC) contigs ordered and oriented to provide complete gene regions that are adjacent to potentially incomplete transposable element clusters. Gene density varies greatly across the maize genome, and large contiguous sequenced regions can begin to capture the true diversity of maize chromosome architecture. To characterize large contiguous regions of maize sequence, we have identified and sequenced two BAC contigs from chromosome 3 of maize B73 that span regions associated with the rf1 (restorer of fertility) locus. Comparative analysis with Oryza sativa (rice) and Sorghum bicolor (sorghum) shows that while many genes are retained across all three species, genes have been both lost and translocated across the genomes. This manuscript documents important methodology for the analysis of emerging large genome sequences with high transposable element content, such as maize, wheat, and barley. This approach to annotating and characterizing novel gene sequences is of broad significance to plant scientists who use molecular and genomic methods for crop improvement.
Technical Abstract: The architecture of grass genomes varies on multiple levels. Large long terminal repeat (LTR) retrotransposon clusters occupy significant portions of the intergenic regions, and islands of protein-encoding genes are interspersed among the repeat clusters. Hence, advanced assembly techniques are required to obtain completely finished genomes, as well as to investigate gene and transposable element (TE) distributions. To characterize the organization and distribution of repeat clusters and gene islands across large grass genomes, we present 961-kb and 594-kb contiguous sequence contigs associated with the rf1 locus in the near-centromeric region of maize chromosome 3. We present two methods for computational finishing of highly repetitive bacterial artificial chromosome (BAC) clones that have proved successful in closing all sequence gaps caused by TE insertions. Sixteen repeat clusters were observed, ranging in length from 23 kb to 155 kb. These repeat clusters consist almost exclusively of LTR retrotransposons, whose paleontology of insertion varies throughout the cluster. Gene islands contain from 1 to 4 predicted genes, resulting in a gene density of 1 gene per 16 kb in gene islands and 1 gene per 111 kb over the entire sequenced region. When compared to the rice and sorghum genomes, the two sequence contigs retain gene collinearity of 50% and 71%, respectively, rising to 70% and 100%, respectively, for high-confidence gene models. Collinear genes on single gene islands show that while most expansion of the maize genome has occurred in the repeat clusters, gene islands are not immune and have experienced growth in both intragenic and intergenic locations.
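The density figures in the abstract can be related to one another with a little arithmetic. The sketch below is only a rough illustration, not a calculation taken from the paper: it assumes the reported densities are rounded values, that the two contigs do not overlap, and that every predicted gene lies within a gene island. The variable and function names are hypothetical.

```python
# Rough consistency check of the gene density figures quoted in the abstract.
# Assumptions (not stated in the source): densities are rounded, the two
# contigs do not overlap, and all predicted genes sit inside gene islands.

CONTIG_LENGTHS_KB = (961, 594)   # the two rf1-associated contigs
KB_PER_GENE_OVERALL = 111        # 1 gene per 111 kb over the whole region
KB_PER_GENE_IN_ISLANDS = 16      # 1 gene per 16 kb within gene islands


def implied_gene_count(total_kb: float, kb_per_gene: float) -> float:
    """Number of genes implied by a region length and a density figure."""
    return total_kb / kb_per_gene


total_kb = sum(CONTIG_LENGTHS_KB)
genes = implied_gene_count(total_kb, KB_PER_GENE_OVERALL)

# If all genes are in islands, island sequence is genes * 16 kb, so the
# island share of the region reduces to 16 / 111.
island_kb = genes * KB_PER_GENE_IN_ISLANDS
island_fraction = island_kb / total_kb

print(f"Total sequenced length: {total_kb} kb")
print(f"Implied gene count: about {genes:.0f}")
print(f"Implied share of sequence in gene islands: about {island_fraction:.0%}")
```

Under these assumptions, the two densities together imply that gene islands make up only a small fraction of the sequenced region, with the remainder dominated by repeat clusters, which is in line with the abstract's description.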