Significance and context
One of the exciting aspects of mammalian genome evolution is the active generation and selection of distinct regulatory proteins from a constantly changing protein pool. Comparing human DNA sequence with that of other species has proved a valuable annotation strategy for identifying biologically functional elements against a background of non-conserved DNA. Human chromosome 19 (HSA19) is one of the smallest human chromosomes, at about 65 to 70 megabases (Mb), and contains about 1,100 genes. To evaluate the HSA19 annotation and to study its evolutionary significance, Dehal and colleagues sequenced the mouse DNA homologous to HSA19. The authors identified reference gene sets in the assembled HSA19 sequence and the homologous mouse sequence, and generated comprehensive sets of gene models for both species. The existence of HSA19 as a single, conserved linkage group in most primates and the significant conservation of HSA19 genes in zebrafish were also used in the comparative analysis of the human and mouse sequences. Dehal et al. consider that the differences between the human and mouse sequences could be due to complex chromosomal rearrangement and fission events. To gain a broad overview of the evolutionary forces that may have driven these rearrangements, the authors examined the mouse and human sequence surrounding the borders of all homology segments of HSA19.
Extensive sequence analysis along the length of HSA19 identified significant sequence matches to non-redundant database entries, expressed sequence tags (ESTs) from mouse bacterial artificial chromosome (BAC) libraries and predicted proteins from the sequenced genomes of Drosophila, nematode and yeast. Dehal et al. identified 12,611 sequence elements that are significantly conserved in the syntenically related regions of HSA19. By applying various sequence analysis tools, they identified a set of 34,733 distinct HSA19 'sequence feature blocks'. Of the 340 known and predicted genes on HSA19, 31% were found to be members of large, tandemly clustered gene families, and most are represented by syntenic homologous clusters in the mouse. The lineage-specific differences in the expression levels and patterns of different tandem HSA19 genes between human and mouse indicated that homologs of the human genes have been split by evolutionary rearrangement onto two mouse chromosomes. Analysis also revealed that HSA19 is rich in short interspersed nuclear element (SINE) repeats (27.6%) compared with the homologous mouse sequence. The differences in SINE repeat content, and the associated changes in interval length, represent the major difference between single-copy gene regions of HSA19 and the syntenic, homologous regions of mouse DNA. Comparative analysis of zinc-finger protein genes containing a Krüppel-associated box (KRAB) motif, odorant receptor genes and vomeronasal receptor genes revealed that these genes have been duplicated, lost and selected independently in the human and mouse genomes. The expression patterns of these genes in human and mouse suggest that mammalian lineages differ in the construction and fine-tuning of regulatory networks for gene expression. Dehal et al. also noted high concentrations of tandemly organized L1 repeats and retrovirus-associated long terminal repeat (LTR) sequences at sites of evolutionary breaks.
Supplementary data to Science 293:104-111 are available from the Human chromosome 19 and mouse comparison website.
Comparative analysis revealed that the major differences distinguishing HSA19 from related mouse DNA are due to chromosomal fission that appears to have occurred specifically in rodents. Extrapolating from the observations on HSA19, one can expect to find hundreds of newly arisen and lost lineage-specific genes as the human and mouse genomes are compared further. Unique genes may be duplicated through retrotransposition and other mechanisms at significant rates, and dispersed gene copies may also be selected to carry out lineage-specific functions.
The differences between human and mouse chromosomes in the syntenic regions were due mainly to insertions of SINEs (repeated 'junk' DNA elements). Further analysis of the evolutionary significance of these non-coding repeat elements may increase our knowledge of lineage-specific gene expression.