|Massa, A - University Of Georgia|
|Wanjugi, H - University Of California|
|Deal, K - University Of California|
|Chan, A - J Craig Venter Institute|
|Luo, M - Dominican University Of California|
|Rabinowicz, P - University Of Maryland|
|Dvorak, J - University Of California|
|Devos, K - University Of Georgia|
|O'Brien, K - University Of Maryland|
|Maiti, R - J Craig Venter Institute|
|You, Frank - Dominican University Of California|
Submitted to: Molecular Biology and Evolution
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/15/2011
Publication Date: 4/6/2011
Citation: Massa, A.N., Wanjugi, H., Deal, K.R., Chan, A.P., Gu, Y.Q., Luo, M., Anderson, O.D., Rabinowicz, P.D., Dvorak, J., Devos, K.M., O'Brien, K., Maiti, R., You, F. 2011. Gene space dynamics during the evolution of Aegilops tauschii, Brachypodium distachyon, Oryza sativa, and Sorghum bicolor genomes. Molecular Biology and Evolution. Available at: http://mbe.oxfordjournals.org/content/early/2011/04/06/molbev.msr080.long. DOI: 10.1093/molbev/msr080.
Interpretive Summary: Bread wheat is the single most important source of food in the human diet. However, developing genomic sequence resources for wheat crop improvement remains a challenge because of the large and complex wheat genome. To understand how gene evolution differs between compact genomes (such as Brachypodium) and large genomes (such as wheat), we sequenced several large genomic regions from the ancestor of the wheat D genome and compared them to the corresponding regions from Brachypodium, rice, and sorghum. Detailed comparative analyses revealed that gene and gene space evolution accelerated in large genomes, likely because of the large amount of repetitive DNA they contain. Our results indicate that a large genome like that of wheat may have more genes than a compact genome as a result of the amount and activity of repeated sequences. The work reported here helps us further understand the evolution and domestication of wheat.
Technical Abstract: Nine different regions totaling 9.7 Mb of the 4.02 Gb Aegilops tauschii genome were sequenced using Sanger sequencing technology and compared with orthologous Brachypodium distachyon, Oryza sativa (rice) and Sorghum bicolor (sorghum) genomic sequences. The ancestral gene content in these regions was inferred and used to estimate gene deletion and gene duplication rates along each branch of the phylogenetic tree relating the four species. The total gene number in the extant Ae. tauschii genome was estimated to be 36,371. The gene deletion and gene duplication rates and total gene numbers in the four genomes were used to estimate the total gene number at each node of the phylogenetic tree. The common ancestor of the Brachypodieae and Triticeae lineages was estimated to have had 28,558 genes, and the common ancestor of the Panicoideae, Ehrhartoideae and Pooideae subfamilies was estimated to have had 27,152 or 28,350 genes, depending on the ancestral gene scenario. Relative to the Brachypodieae and Triticeae common ancestor, the gene number was reduced in B. distachyon by 3,026 genes and increased in Ae. tauschii by 7,813 genes. The sum of gene deletion and gene duplication rates, which reflects the rate of gene synteny loss, was correlated with the rate of structural chromosome rearrangements, and was highest in the Ae. tauschii lineage and lowest in the rice lineage. The high rate of gene space evolution in the Ae. tauschii lineage accounts for the fact that, contrary to expectations, the level of synteny between the more closely related Ae. tauschii and B. distachyon genomes is similar to the level of synteny between the Ae. tauschii genome and the genomes of the more distantly related rice and sorghum. The ratio of gene duplication to gene deletion rates in these four grass species closely parallels both the total number of genes in a species and the overall genome size.
Because the overall genome size is to a large extent a function of the repeated sequence content in a genome, we suggest that the amount and activity of repeated sequences are important factors determining the number of genes in a genome.
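The gene-number figures in the abstract can be cross-checked with simple arithmetic. The sketch below is only an illustration, not the authors' estimation method: it uses the values stated above, the variable names are hypothetical, and the B. distachyon total and the sequenced fraction are derived here rather than quoted from the abstract.

```python
# Values taken from the abstract; variable names are illustrative only.
ancestor_genes = 28_558      # Brachypodieae/Triticeae common ancestor
net_loss_brachypodium = 3_026
net_gain_ae_tauschii = 7_813

# Net change relative to the common ancestor gives the extant gene number.
ae_tauschii_genes = ancestor_genes + net_gain_ae_tauschii
b_distachyon_genes = ancestor_genes - net_loss_brachypodium  # derived, not quoted

# Fraction of the Ae. tauschii genome covered by the nine sequenced regions.
sequenced_mb = 9.7
genome_mb = 4.02 * 1000      # 4.02 Gb expressed in Mb
sequenced_fraction = sequenced_mb / genome_mb

print(f"Ae. tauschii genes:  {ae_tauschii_genes}")
print(f"B. distachyon genes: {b_distachyon_genes}")
print(f"Sequenced fraction:  {sequenced_fraction:.2%}")
```

The Ae. tauschii result reproduces the 36,371 genes reported in the abstract, and the sequenced regions amount to roughly a quarter of one percent of the genome, which is why the genome-wide totals are estimates extrapolated from a sample.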