The Em protein is a major late embryogenesis abundant protein that is present in most embryos. We isolated the two Arabidopsis genes encoding this protein, AtEm1 and AtEm6. The two proteins differ in a 20 amino acid motif that is present once in Em6 and repeated four times in Em1. Both types of gene have also been observed in cotton, maize, barley and soybean. The AtEm1 gene was mapped by RFLP to chromosome 3, and several YACs were identified in the CIC library. AtEm6 was mapped to chromosome 2. As part of the EEC ESSA (European Scientists Sequencing Arabidopsis) programme, we initiated sequencing of a 75 kbp region around AtEm1. So far, approximately 30 kbp have been completed, identifying at least eight genes. All of them are new in higher plants. All but one could be assigned a tentative function on the basis of sequence homology with a sequence from another species. Analysis of the sequence revealed a very high gene density in this region (on the order of one gene per 4 kbp in the portion completed so far), with rather short intergenic regions. Several cognate cDNAs were isolated and sequenced, allowing the exon/intron structure to be deduced. This information was used to improve the GENEFINDER programme. Several microsatellites were observed, as well as a very AT-rich region. This region is being used in comparative mapping with Brassica.