Examples of evidence for mechanisms that have caused domain gains. (a) An example of a domain gain mediated by retroposition. TreeFam family TF352220 contains genes with a transposase domain (PF01359). The primate transcripts in this family have been extended at their amino terminus with the pre-SET and SET domains. The representative transcript for this gain event is SETMAR-201 (ENST00000307483; left-hand side). Both gained domains have a significant hit in the gene SUV39H1 (ENSG00000101945; right-hand side); the SET domains of the donor and recipient proteins share 41% identity. It has previously been reported that the chimeric gene originated in primates by insertion of the transposase domain (PF01359, with a mutated active site and no transposase activity) into a gene that contained the pre-SET and SET domains. Here we propose that the evolution of this gene involved two crucial steps: retroposition of the sequence coding for the pre-SET and SET domains, and the already described insertion of the MAR transposase region. The SET domain has lost the introns present in the original sequence, and the pre-SET domain has an intron containing repeat elements in a position not present in the original domain, suggesting that this intron was inserted later. The likely evolutionary scenario includes duplication of the pre-SET and SET domains through retroposition, insertion of the transposase domain, and subsequent joining of these domains. The SETMAR gene lies within an intron of another gene (SUMF1), which is on the opposite strand, so SETMAR might use that gene's regulatory regions for its transcription. The top of the figure shows the genomic positions of the depicted genes. Arrowheads on the lines that represent chromosomal sequences indicate whether the transcripts are coded by the forward or reverse strand. Transcripts are always shown in the 5' to 3' orientation and proteins in the amino- to carboxy-terminal orientation.
Exon projections and intron phases are also shown at the protein level. Pfam domains are illustrated as colored boxes. Figure 4b and Additional file 8 use the same conventions. (b) An example of a domain gain by gene duplication followed by exon joining. TreeFam family TF314963 contains genes with a lactate/malate dehydrogenase domain, and one branch of vertebrate genes has gained an additional UEV domain. Homologues, both orthologues and paralogues, without the gained domain are present in a number of animal genomes. A representative transcript with the gained domain is UEVLD-205 (ENST00000396197; left-hand side). The UEV domain in that transcript is 56% identical to the UEV domain in the transcript TSG101-201 (ENST00000251968), which belongs to the neighboring gene TSG101, and the two transcripts also have introns with identical phases in the same positions. The likely scenario is that, after the gene coding for the TSG101-201 transcript was duplicated, the exons of the duplicate were joined with those of the UEVLD-205 ancestor, fusing the two genes.
Buljan et al. Genome Biology 2010 11:R74 doi:10.1186/gb-2010-11-7-r74