CPRO, Centre for Biometry Wageningen (CBW), P.O. Box 16, 6700 AA Wageningen, Netherlands
Genetic experiments now routinely produce several hundred DNA markers. Constructing genetic linkage maps for so many markers can be an enormous computational problem: for 50 markers, an exhaustive search would require examining more than 10^64 marker orders. A practical solution to the mapping problem must therefore be based on the rapid exclusion of (highly) improbable marker orders. In this paper we present a fast and reliable algorithm for ordering many markers in segregating populations with two possible genotypes per locus, e.g. backcross or (doubled) haploid populations or families of recombinant inbred lines. The algorithm is a two-stage approach using simulated annealing. Either the likelihood or the number of recombination events can be used as the optimality criterion. Missing data affect the estimation of pairwise recombination frequencies. The EM algorithm is used to obtain maximum likelihood estimates of pairwise recombination frequencies, using all available data and the current genetic linkage map. The Gibbs sampler is used to carry out the E-step of the EM algorithm. Simulated annealing and Gibbs sampling are used iteratively to solve the general mapping problem. Additionally, the Metropolis algorithm is used to obtain a measure of precision for the map obtained. As an example, constructing a linkage map for a real-life data set of 83 markers scored on 94 doubled-haploid plants (213 missing observations) takes about 5 minutes on a 200 MHz Pentium PC.
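The following is a minimal sketch of one ingredient named above: ordering markers by simulated annealing, using the number of recombination events as the optimality criterion. It assumes a backcross or doubled-haploid coding (0 or 1 per locus, None for a missing score). Missing scores are simply skipped rather than imputed by the EM/Gibbs step described in the paper. The segment-reversal move set, the geometric cooling schedule, the function names (`num_orders`, `recombination_count`, `anneal_order`) and the toy data are all illustrative assumptions, not the authors' implementation. The sketch also computes the number of distinct orders, n!/2 once an order and its reverse are treated as equivalent. For 50 markers this exceeds 10^64.

```python
import math
import random


def num_orders(n):
    """Number of distinct marker orders: n!/2 (an order equals its reverse)."""
    return math.factorial(n) // 2


def recombination_count(order, genotypes):
    """Count recombination events implied by a marker order.

    genotypes[m][i] is the genotype (0 or 1) of individual i at marker m,
    or None if missing. Simplifying assumption: missing scores are skipped,
    and a recombination is counted whenever two consecutive *observed*
    markers along the order carry different genotypes.
    """
    total = 0
    for i in range(len(genotypes[0])):
        prev = None
        for m in order:
            g = genotypes[m][i]
            if g is None:
                continue
            if prev is not None and g != prev:
                total += 1
            prev = g
    return total


def anneal_order(genotypes, t0=2.0, cooling=0.995, steps=20000, seed=0):
    """Simulated annealing over marker orders (hypothetical parameters)."""
    rng = random.Random(seed)
    n = len(genotypes)
    order = list(range(n))
    rng.shuffle(order)
    cost = recombination_count(order, genotypes)
    best, best_cost = order[:], cost
    t = t0
    for _ in range(steps):
        # Propose a new order by reversing a random segment order[i..j].
        i, j = sorted(rng.sample(range(n), 2))
        cand = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        cand_cost = recombination_count(cand, genotypes)
        delta = cand_cost - cost
        # Metropolis rule: always accept improvements, accept worse orders
        # with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            order, cost = cand, cand_cost
            if cost < best_cost:
                best, best_cost = order[:], cost
        t = max(t * cooling, 1e-6)  # geometric cooling, floored to avoid t = 0
    return best, best_cost


# Toy data (hypothetical): 6 markers in a scrambled true order and
# 10 individuals, each carrying exactly one crossover.
true_order = [3, 0, 5, 1, 4, 2]
pos = {m: p for p, m in enumerate(true_order)}
individuals = [(k, a) for k in range(1, 6) for a in (0, 1)]
genotypes = [[a if pos[m] < k else 1 - a for (k, a) in individuals]
             for m in range(6)]
genotypes[2][0] = None  # one missing score

best, best_cost = anneal_order(genotypes)
print(best, best_cost)
```

In this toy set only the true order and its reverse reach the minimum of 10 recombinations, one per individual. The annealer should therefore report one of those two orders. The sketch recomputes the full count for every proposal to keep it short. An efficient implementation would update only the two adjacent pairs that a reversal changes.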