PAG-I Plant Genome I Conference

Town & Country Conference Center, San Diego, CA, November, 1992.


PG-I: GENE MAPPING ALGORITHM FOR GENE DISTANCES MEASURED WITH ERRORS

Fan H. Kung, Department of Forestry, Southern Illinois University, Carbondale, IL 62901-4411.


Given N genes on a chromosome, there are N^2 distances among the N genes, of which N*(N-1)/2 are absolute distances between different genes. Conversely, given N*(N-1)/2 absolute distances, each with some unknown measurement error, how can one arrange the N genes in the correct order and obtain additive distances estimated with least squared error? The problem can be solved by first copying all N^2 initial distances into an N by N matrix, then identifying which gene is located at the center or at an end of a progressively reduced gene file. The steps of the procedure are as follows:

(1) Sum the distances from each gene to every other gene. The sum for a gene is its row sum in the matrix.
(2) If the current number of genes is even, identify the gene with the maximum sum from step (1) as an end; otherwise, identify the gene with the minimum sum from step (1) as the center. (Along a line, an end gene has the largest total distance to the others and the center gene the smallest.)
(3) Transfer that gene to a waiting list, delete the row and column associated with it, and downsize the matrix to N-1 by N-1; then follow steps (1) and (2) to search for the next center or end in the reduced file of N-1 genes.
(4) Repeat the transferring and downsizing until only two genes are left. Then, in reverse order of removal (i.e. first out, last in), insert the removed genes back into the gene lineup one by one, as follows.
(5) When the current number of genes in the lineup is even, transfer the newest center gene to the middle; otherwise, transfer the newest end gene to one of the tails, left or right.
(6) Verify and discard duplicated arrangements.
(7) Repeat transferring genes from the bottom of the waiting list to the gene lineup until all genes are ordered in the correct sequence.
(8) Assemble a new N by N directional distance matrix. The square matrix has zeros on the diagonal, positive distances above the diagonal, and negative distances below.
(9) Compute the average difference between the paired cells of two consecutive rows as the map distance between the two neighboring genes. The distance is additive and has the least-squares property.

For illustration, map distances by locus pair from an electrophoretic analysis of genetic linkage in Scots pine (see Biochem. Genet. 25:803-814, 1987) were given as follows:

The gene mapping algorithm shows that the order of genes is AD-A:PG-B:LAP-B:GOT-B and distances between successive genes are 1.375, 30.575, and 16.825. The computer program will be demonstrated during the Poster Session and will be available upon request.
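
The procedure can be sketched in Python as follows. This is a minimal illustration, not the author's program: the function names order_genes and map_distances, the four loci W to Z, and their distances are all hypothetical, and the Scots pine data are not reproduced. Two assumptions are made where the abstract leaves room. First, an end gene is taken to be the one with the largest row sum and the center gene the one with the smallest, as holds for points on a line. Second, an end gene is reattached to whichever tail it is measured to be closer to, which stands in for trying both tails and discarding duplicates in steps (5) and (6).

```python
def order_genes(dist):
    """Return gene indices in estimated map order.

    dist is a symmetric N x N list of lists of absolute distances
    with zeros on the diagonal.
    """
    active = list(range(len(dist)))
    waiting = []  # (gene, role) pairs in order of removal

    # Steps (1)-(4): remove one gene at a time until two remain.
    while len(active) > 2:
        # Row sums restricted to the genes still in the reduced matrix.
        sums = {g: sum(dist[g][h] for h in active) for g in active}
        if len(active) % 2 == 0:
            # Assumption: largest total distance marks an end gene.
            gene, role = max(active, key=sums.get), "end"
        else:
            # Assumption: smallest total distance marks the center gene.
            gene, role = min(active, key=sums.get), "center"
        waiting.append((gene, role))
        active.remove(gene)

    # Steps (5)-(7): reinsert in reverse order of removal.
    lineup = active[:]
    for gene, role in reversed(waiting):
        if role == "center":
            lineup.insert(len(lineup) // 2, gene)
        elif dist[gene][lineup[0]] < dist[gene][lineup[-1]]:
            # Assumption: attach an end gene to the nearer tail.
            lineup.insert(0, gene)
        else:
            lineup.append(gene)
    return lineup


def map_distances(dist, lineup):
    """Return the distances between successive genes in lineup."""
    n = len(lineup)
    # Step (8): directional matrix, positive above the diagonal,
    # negative below, zero on it.
    signed = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = dist[lineup[i]][lineup[j]]
            signed[i][j] = d if j > i else -d if j < i else 0.0
    # Step (9): average the cell-by-cell difference of consecutive rows.
    return [
        sum(signed[i][j] - signed[i + 1][j] for j in range(n)) / n
        for i in range(n - 1)
    ]


# Hypothetical measurements with small errors (not the Scots pine data).
names = ["W", "X", "Y", "Z"]
dist = [
    [0.0, 2.1, 4.8, 9.2],
    [2.1, 0.0, 3.1, 6.9],
    [4.8, 3.1, 0.0, 4.0],
    [9.2, 6.9, 4.0, 0.0],
]
lineup = order_genes(dist)
print(":".join(names[g] for g in lineup))
print([round(d, 3) for d in map_distances(dist, lineup)])
```

With these hypothetical values the sketch gives the order W:X:Y:Z and successive distances of about 2.05, 2.95, and 4.05. Their sum, 9.05, is the fitted W to Z distance, against a measured value of 9.2, which shows how the averaging in step (9) turns mutually inconsistent measurements into additive ones.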

