Location: Plant, Soil and Nutrition Research
Title: Breaking the curse of dimensionality to identify causal variants in Breeding 4
RAMSTEIN, GUILLAUME - Cornell University
JENSEN, SARAH - Cornell University
Buckler, Edward - Ed
Submitted to: Theoretical and Applied Genetics
Publication Type: Review Article
Publication Acceptance Date: 12/7/2018
Publication Date: 3/1/2019
Citation: Ramstein, G.P., Jensen, S.E., Buckler IV, E.S. 2019. Breaking the curse of dimensionality to identify causal variants in Breeding 4. Theoretical and Applied Genetics. 132(3):559-567. https://doi.org/10.1007/s00122-018-3267-3.
Interpretive Summary: Plant breeding has gone through three major transformations and is now transitioning to a new phase: 1) the initial phase (Breeding 1) spanned 10,000 years of crop improvement and was characterized by basic selection on observable traits (phenotypes); 2) Breeding 2 relied on methodologies developed in the early to mid-twentieth century, using Mendelian genetics and experimental designs to guide breeding decisions and to control for variation due to measurement error and environmental variability; 3) Breeding 3 began about 30 years ago, when plant breeders started using genomics to improve crops. During this phase, DNA regions responsible for phenotypic variability were identified so that breeders could select more accurately for desirable traits. The new phase in plant breeding, Breeding 4, focuses on causal variants, i.e., the exact modifications in the DNA that are responsible for phenotypic changes. It is characterized by the biological design of plant varieties, based on transformation and gene editing techniques directed toward causal variants. Statistical analyses will therefore need to estimate the effects of causal variants reliably, which means avoiding the situation in which the number of loci assayed surpasses the number of plant genotypes, known as the curse of dimensionality. Such analyses are complementary to traditional quantitative genetic studies and should avoid the curse of dimensionality through innovative analytical techniques (machine learning models such as neural networks) and novel data types (DNA sequences or field images). This paper presents some of these analyses and describes possible applications for targeting causal variants in Breeding 4.
Technical Abstract: Plant breeding has undergone three major transformations and is currently transitioning to a new technological phase, Breeding 4. This phase is characterized by the development of methods for the biological design of plant varieties, including transformation and gene editing techniques directed toward causal loci. Applying such technologies will require reliable estimates of the effects of loci in plant genomes, which means avoiding the situation where the number of loci assayed (p) surpasses the number of plant genotypes (n). Here, we discuss approaches to avoid this curse of dimensionality (n ≪ p), which will involve analyzing intermediate phenotypes such as molecular traits and component traits related to plant morphology or physiology. Because these approaches will rely on novel data types such as DNA sequences and high-throughput phenotyping images, Breeding 4 will call for analyses that are complementary to traditional quantitative genetic studies and are based on machine learning techniques that make efficient use of sequence and image data. In this article, we present some of these techniques and their application for prioritizing causal loci and developing improved varieties in Breeding 4.
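
To illustrate why n ≪ p makes locus effects hard to estimate, the following is a minimal sketch in Python with NumPy. It is not taken from the article: the sample sizes, the simulated 0/1/2 allele counts, the effect sizes, and the helper name `effects_are_identifiable` are all assumptions chosen for illustration. It shows that when there are more loci than genotypes, two different effect vectors can fit the same phenotypes equally well, so the data alone cannot single out the causal loci.

```python
import numpy as np


def effects_are_identifiable(X):
    """Hypothetical helper: True when least squares has a unique solution,
    i.e. the marker matrix X (genotypes x loci) has full column rank."""
    return np.linalg.matrix_rank(X) == X.shape[1]


rng = np.random.default_rng(0)
n, p = 20, 100  # assumed sizes: 20 plant genotypes, 100 loci (n << p)

# Simulated allele counts (0, 1 or 2 copies) for each genotype at each locus.
X = rng.integers(0, 3, size=(n, p)).astype(float)

# Assumed truth: only the first three loci are causal.
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -0.5, 0.8]
y = X @ beta_true + rng.normal(0.0, 0.1, size=n)

# Minimum-norm least-squares estimate of the locus effects.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# With rank(X) <= n < p, X has a non-trivial null space. The last right
# singular vector lies in it, so adding it to beta_hat leaves the fit unchanged.
_, _, Vt = np.linalg.svd(X)
null_direction = Vt[-1]
beta_alt = beta_hat + 5.0 * null_direction

print("identifiable:", effects_are_identifiable(X))
print("max fit difference:", np.max(np.abs(X @ beta_hat - X @ beta_alt)))
print("max effect difference:", np.max(np.abs(beta_hat - beta_alt)))
```

Both `beta_hat` and `beta_alt` reproduce the simulated phenotypes, yet they assign different effects to the loci. Under these assumptions, this is the ambiguity that the approaches discussed in the abstract aim to reduce, for example by bringing in intermediate phenotypes or additional data types.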