Following the announcement earlier this year of the completion of the human genome project, a paper in the October 23 Nature reports the work by Andy Mungall, Stephan Beck, and 169 colleagues at the Wellcome Trust Sanger Institute in analyzing and mapping the complete sequence of human chromosome 6, the largest chromosome published so far (Nature 2003, 425:805-811).
"This represents the culmination of 10 years of research from my first grant in October 1993," Stephan Beck, head of Human Sequencing at the Sanger Institute, told us in an e-mail. "To stand here, a decade later, and see the full sequence and - more important - the full catalogue of genes is as exciting as it is humbling."
"It's rewarding to see the largest human chromosome sequenced so far finally finished," said Project Leader Andy Mungall in an e-mail to us. "The complete sequence of chromosome 6 will facilitate the identification of all other chromosome 6 genes contributing to disease, including genes involved in diseases with a complex, multifactorial basis such as schizophrenia, diabetes, cancer, and heart disease."
Mungall et al. assembled bacterial clone contigs to make a tiling path of 1797 clones. Nine contigs spanning the chromosome were sequenced with 99.99% accuracy, representing 99.5% of the euchromatin. Comparison with assembled genome sequences for mouse, rat, Tetraodon, Fugu, and zebrafish suggested that 95.6% of coding exons on chromosome 6 have been annotated.
The finished sequence is more than 166 Mb long and contains 1557 genes, of which only about 50% (772) were previously known. Two hundred twenty-three genes are thought to have arisen through local duplication, and repeat sequences were the most abundant feature, providing a detailed 'fossil record' of the chromosome. One hundred thirty genes on the chromosome are known to cause, predispose to, or protect from disease.
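As a rough check on the proportions quoted above, the gene counts can be expressed as shares of the 1557 annotated genes. This is only an illustrative sketch: the counts come from the text, the helper name `share` is made up for this example, and it assumes each category is a subset of the same 1557-gene total.

```python
# Gene counts quoted in the article for human chromosome 6.
TOTAL_GENES = 1557
counts = {
    "previously known": 772,
    "arisen through local duplication": 223,
    "known to affect disease": 130,
}


def share(count: int, total: int = TOTAL_GENES) -> float:
    """Return count as a percentage of total, rounded to one decimal place.

    Assumption: every category is counted against the same gene total.
    """
    return round(100 * count / total, 1)


for label, count in counts.items():
    print(f"{label}: {count} of {TOTAL_GENES} genes ({share(count)}%)")
```

On these figures, the previously known genes come to just under half of the total, which is what the rounded "50%" reflects.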
Chromosome 6 encodes the major histocompatibility complex (MHC), a region critical for immune response. Although the region has a low sex-averaged recombination rate overall, three hotspots were observed at high resolution. "The description of Mb regions as containing high or low recombination may be of limited value in understanding recombination rates," write Mungall et al. in the paper. Analysis of the distribution of 183,019 SNPs mapped onto the finished sequence revealed HLA-B to be the most polymorphic gene both on chromosome 6 and in the human genome, followed by other class I and class II genes.
Transfer RNA genes were identified in clusters, the most significant of which contained 157 genes that included almost all major species, located within the MHC class I region. "A cluster of tRNA genes colocalize with regions of high transcriptional activity (hot spots)," said Mungall, "perhaps unsurprising given the enormous requirement of tRNA in the cell. There are three such hot spots on chromosome 6."
"Adding the base pairs of chromosome 6 to those of the six others that have already been finished gives a total of 500 megabases of fully analysed human sequence. This is only 17% of the 2.8 billion base pairs of the total sequence," write Jane Grimwood and Jeremy Schmutz in an accompanying News and Views article (Nature 2003, 425:775-776). "Full analyses of the remaining 17 chromosomes should be well worth the wait."
"Complete sequence, carefully annotated by trained researchers, is the standard we seek to achieve for all the human genome," said Beck. "It's what biomedical research needs and deserves."
Powledge TM: Human genome project completed. Genome Biology, April 15, 2003.
Wellcome Trust Sanger Institute