Rice is one of the most widely consumed cereals in the world and is the staple food of over half the world's population. In the April 5 Science, two independent groups report the first draft sequences of the rice genome. Stephen Goff and researchers at Syngenta's Torrey Mesa Research Institute (TMRI) in California carried out whole-genome shotgun sequencing of Oryza sativa L. ssp. japonica (Science 2002, 296:92-100), while scientists at the Beijing Genomics Institute (BGI) in China chose the widely cultivated Oryza sativa L. ssp. indica (Science 2002, 296:79-92).
The availability of data from two subspecies will provide ample material for comparative genomics. The first notable difference is in reported size: the BGI group report a genome that is about 10% bigger (at 466 Mb) than the TMRI sequence (420 Mb); both are relatively small for a grass genome. It remains to be determined whether this difference reflects genuine subspecies differences in repeat sequences or technical differences in the assembly and annotation strategies. The Syngenta draft sequence (referred to as Syd) covers 93% of the genome and has a GC content of 44%. The BGI sequence has similar coverage (92%) and a predicted misassembly error rate of 1.1%; the Chinese group also document a gradient of GC content in rice coding sequences. Repetitive DNA, including MITEs (miniature inverted-repeat transposable elements) and retrotransposons, accounts for 42-45% of the rice genomes.
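The size comparison above can be checked with a few lines of arithmetic. The sketch below uses only the two genome sizes quoted in the text; the function and variable names are illustrative and do not come from either paper.

```python
# Draft genome size estimates quoted in the text, in megabases (Mb).
BGI_INDICA_MB = 466
TMRI_JAPONICA_MB = 420


def percent_larger(larger: float, smaller: float) -> float:
    """Return how much bigger `larger` is than `smaller`, as a percentage."""
    return (larger - smaller) / smaller * 100


# Relative difference between the two draft assemblies.
difference = percent_larger(BGI_INDICA_MB, TMRI_JAPONICA_MB)
print(f"indica draft is {difference:.1f}% larger than the japonica draft")
```

The result is close to 11%, consistent with the rounded figure of about 10% given above.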
Both groups provide estimates of the number of rice genes (from 32,000 to 55,000), reiterating what we have already learnt about the discrepancy between apparent organismal complexity and total gene number (humans are estimated to have around 35,000 genes and Arabidopsis around 25,000); the mean rice gene size is 4.5 kb. These numbers may be over-estimates, reflecting the difficulty of gene prediction.
Both groups report extensive comparisons with the other published plant genome, that of the dicot weed Arabidopsis thaliana, which is 3.7 times smaller. Most (80-85%) Arabidopsis genes have a rice homolog, and about a third of these appear to be specific to plants; about half of the predicted rice genes have no Arabidopsis homolog. Over-represented plant gene classes include RING-finger proteins and F-box-domain proteins, which play roles in protein turnover and degradation.
Analysis of the synteny between the rice and Arabidopsis genomes provides insights into the evolution of monocots and dicots. Certain classes of disease resistance R genes appear to have evolved since the divergence of the two types of plant. Rice and Arabidopsis have homologous genes implicated in flowering and development, although the conservation of these genetic networks merits further investigation. A quarter of rice genes may be involved in metabolism, with much apparent gene redundancy.
The two draft rice genome sequences will serve as an important platform for cereal genomics. Completion of the sequence and improvements in annotation will lead to crop improvements that should help to feed the world.