The Arabidopsis thaliana genome is about 100 megabases in size, whereas the genomes of crop plants range from 400 megabases (rice) to more than 10,000 megabases (wheat). Arabidopsis is also relatively free of the repetitive sequences that make up most of the genomes of other flowering plants. Despite these differences, it probably has most of the genes found in other flowering plants. Moreover, this organism is a convenient system for genetics and molecular biology research. These factors make it an ideal model organism for plant genome research.
We have formed a consortium and joined a collaborative effort to sequence this genome. Our sequencing group is focusing on chromosomes IV and V in collaboration with European and Japanese efforts. Starting from existing YAC maps, we bin BACs, fingerprint them to develop minimal paths, and then sequence the BACs in each minimal path. We have also carried out genome-wide fingerprinting of BAC clones. Sequencing is done primarily by the shotgun approach on automated fluorescent sequencers. Since starting in the fall of 1996, we have finished over 1.5 megabases. Our steady-state sequencing rate is now about 150-200 kilobases per month.
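The idea of a minimal path can be illustrated with a small sketch. It assumes that each BAC has already been placed as an interval on a map coordinate (in kilobases), which is a simplification: the fingerprint comparison that establishes overlaps is not modeled, and the abstract does not say which selection procedure the group uses. The clone names, coordinates, and the function `minimal_tiling_path` are hypothetical. Under those assumptions, a standard greedy interval cover picks the fewest clones that span a region.

```python
from typing import List, NamedTuple


class Clone(NamedTuple):
    name: str   # hypothetical clone identifier
    start: int  # assumed map position of the left end, in kb
    end: int    # assumed map position of the right end, in kb


def minimal_tiling_path(clones: List[Clone], region_start: int, region_end: int) -> List[Clone]:
    """Greedy interval cover: at each step, among the clones that start at or
    before the position covered so far, take the one reaching furthest right."""
    ordered = sorted(clones, key=lambda c: c.start)
    path: List[Clone] = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Scan clones that overlap or abut the covered stretch.
        while i < len(ordered) and ordered[i].start <= covered_to:
            if best is None or ordered[i].end > best.end:
                best = ordered[i]
            i += 1
        if best is None or best.end <= covered_to:
            raise ValueError(f"no clone extends coverage past {covered_to} kb")
        path.append(best)
        covered_to = best.end
    return path


# Made-up example data, for illustration only.
example_clones = [
    Clone("A", 0, 100), Clone("B", 50, 180), Clone("C", 90, 200),
    Clone("D", 150, 300), Clone("E", 190, 310), Clone("F", 280, 400),
]
example_path = minimal_tiling_path(example_clones, 0, 400)
print([c.name for c in example_path])  # ['A', 'C', 'E', 'F']
```

In this toy example, four of the six clones suffice to cover the 400 kb region; clones B and D are redundant and would not need to be sequenced.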
As expected, the genome is extremely gene-rich. Perhaps most interestingly, there is some indication that gene order is conserved between Arabidopsis and rather distantly related plants. It seems clear that the sequence being generated by the various Arabidopsis sequencing groups will provide a rich storehouse of information on how plants grow, develop and reproduce.