Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: November 17, 2004
Publication Date: April 15, 2005
Citation: Crane, C.F., Crane, Y.M. 2005. A nearest-neighboring-end algorithm for genetic mapping. Bioinformatics. 21:1579-1591.
Interpretive Summary: Genetic maps are the foundation for identifying and cloning genes that control quantitative traits, but currently available mapping methods are limited in their capacity to order large numbers (thousands or tens of thousands) of markers. We show that genes can be ordered accurately under favorable conditions simply by identifying the least recombined pairs of markers, and working upward through progressively more recombined pairs of markers while keeping track of which markers have already been joined. We wrote a program that has successfully ordered 37000 markers in simulations with a sufficiently large mapping population. We use this program to show how population size, marker distribution, segregation distortion, missing data, and typing errors affect its ability to find the correct map order. We also show that the necessary run time is proportional to the population size times the square of the number of markers to be mapped. The program becomes very sensitive to typing errors as it joins developing linkage groups over larger recombination values, and this impedes its application to real data in its current form. Nevertheless, it shortened one of the standard maps of wheat by 16%, by improving local marker ordering. Researchers and plant breeders involved with genetic mapping in recombinational or deletion populations will benefit from this work.
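As an illustration of "identifying the least recombined pairs of markers", the following minimal Python sketch ranks marker pairs by observed recombination fraction. It is not the authors' program: the 0/1 backcross-style genotype coding, the use of None for missing scores, and the function names `recombination_fraction` and `ranked_pairs` are all assumptions made for this example.

```python
from itertools import combinations


def recombination_fraction(a, b):
    """Fraction of individuals whose scores differ between two markers.

    a and b are equal-length sequences of 0/1 scores (assumed
    backcross-style coding); None marks a missing score and is skipped.
    """
    informative = [(x, y) for x, y in zip(a, b)
                   if x is not None and y is not None]
    if not informative:
        return 0.5  # no shared data: treat the pair as unlinked
    recombinants = sum(1 for x, y in informative if x != y)
    return recombinants / len(informative)


def ranked_pairs(genotypes):
    """Return (fraction, marker_i, marker_j) tuples, least recombined first."""
    pairs = [
        (recombination_fraction(genotypes[i], genotypes[j]), i, j)
        for i, j in combinations(sorted(genotypes), 2)
    ]
    return sorted(pairs)


# Hypothetical toy data: three markers scored in eight individuals.
genotypes = {
    "m1": [0, 0, 1, 1, 0, 1, 0, 1],
    "m2": [0, 0, 1, 1, 0, 1, 1, 1],
    "m3": [0, 1, 1, 1, 0, 1, 1, 0],
}

for fraction, i, j in ranked_pairs(genotypes):
    print(i, j, fraction)
```

In this sketch every pair of markers is compared across all individuals, so the pairwise pass alone does work proportional to the population size times the square of the number of markers, which is in line with the run-time scaling stated in the summary.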
Technical Abstract: Motivation: High-throughput methods are beginning to make possible the genotyping of thousands of loci in thousands of individuals, which could be useful for tightly associating phenotypes to candidate loci. Current mapping algorithms cannot handle so many data without building hierarchies of framework maps.
Results: A version of Kruskal's minimum spanning tree algorithm can solve any genetic mapping problem that can be stated as marker deletion from a set of linkage groups. These include backcross, recombinant inbred, haploid, and double-cross recombinational populations, in addition to conventional deletion and radiation hybrid populations. The algorithm progressively joins linkage groups at increasing recombination fractions between terminal markers, and attempts to recognize and correct erroneous joins at peaks in recombination fraction. The algorithm is O(mn^3) for m individuals and n markers, but the mean run time scales close to mn^2. It is amenable to parallel processing and has recovered true map order in simulations of large backcross, recombinant inbred, and deletion populations with up to 37005 markers. Simulations were used to investigate map accuracy in response to population size, allelic dominance, segregation distortion, missing data, and random typing errors. The algorithm produced accurate maps when marker distribution was sufficiently uniform, although segregation distortion could induce translocated marker orders. The algorithm was also used to map 1003 loci in the F7 ITMI population of bread wheat, Triticum aestivum L. emend Thell., where it shortened an existing standard map by 16 percent, but it failed to associate blocks of markers properly across gaps within linkage groups. This was because it depends upon the rankings of recombination fractions at individual markers, and is susceptible to sampling error, typing error, and joint selection involving the terminal markers of nearly finished linkage groups. Therefore, the current form of the algorithm is useful mainly to improve local marker ordering in linkage groups obtained in other ways.
Availability: Source code is available from http://iubio.bio.indiana.edu/soft/molbio/qtl/flipper/
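The progressive joining of linkage groups at their terminal markers can be sketched in a few lines of Python. This is a simplified illustration in the spirit of Kruskal's algorithm, not the published program: it takes precomputed (recombination fraction, marker, marker) tuples as input, and it omits the step that recognizes and corrects erroneous joins. The function name `join_by_ends` and the toy distances are hypothetical.

```python
def join_by_ends(pairs):
    """Greedy end-to-end joining of markers into ordered chains.

    pairs: iterable of (recombination_fraction, marker_a, marker_b).
    Returns a list of chains, each a list of markers in inferred order.
    Simplified sketch: no detection or correction of erroneous joins.
    """
    pairs = sorted(pairs)  # least recombined pairs first
    chain_of = {}          # marker -> the chain (list) that contains it
    for _, a, b in pairs:
        chain_of.setdefault(a, [a])
        chain_of.setdefault(b, [b])

    for _, a, b in pairs:
        ca, cb = chain_of[a], chain_of[b]
        if ca is cb:
            continue  # same chain already; a join would close a cycle
        # Only terminal markers may be joined; interior markers
        # already have a neighbor on each side.
        if a not in (ca[0], ca[-1]) or b not in (cb[0], cb[-1]):
            continue
        if ca[-1] != a:
            ca.reverse()  # bring a to the tail of its chain
        if cb[0] != b:
            cb.reverse()  # bring b to the head of its chain
        ca.extend(cb)
        for marker in cb:
            chain_of[marker] = ca

    # Collect each distinct chain once, in first-seen order.
    unique = {id(chain): chain for chain in chain_of.values()}
    return list(unique.values())


# Hypothetical recombination fractions for a true order A-B-C-D.
toy_pairs = [
    (0.05, "A", "B"), (0.10, "B", "C"), (0.08, "C", "D"),
    (0.14, "A", "C"), (0.17, "B", "D"), (0.21, "A", "D"),
]
print(join_by_ends(toy_pairs))
```

Because each join is decided only by the ranking of recombination fractions at chain ends, a single misranked pair (from sampling or typing error) can join the wrong ends, which is the sensitivity the abstract describes for joins at larger recombination fractions.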