Complexity of transcribed pseudogenes. Screenshots of pseudogene annotation are taken from the Zmap annotation interface. Pseudogenes are represented as open green boxes and indicated by dark green arrowheads. Exons of associated transcript models are represented as filled red boxes, and the connections between them are shown as red lines. Coding exons of protein-coding models are represented as dark green boxes and their UTR exons as filled red boxes; protein-coding models are also indicated by red arrowheads. (a-c) Single pseudogene models intersecting with single transcript models. (a) The processed pseudogene High mobility group box 1 pseudogene (HMGB1P; HAVANA gene ID: OTTHUMG00000172132) and its associated unspliced (that is, single-exon) transcript. (b) The processed pseudogene Myotubularin related protein 12 pseudogene (MTMR12P; HAVANA gene ID: OTTHUMG00000167532) and a spliced transcript model with three exons. (c) The duplicated pseudogene PDZ domain containing 1 pseudogene 1 (PDZK1P1; HAVANA gene ID: OTTHUMG00000013746) and a spliced transcript model with nine exons. (d,e) Single pseudogene models intersecting with multiple transcripts. (d) The processed pseudogene Ribosomal protein, large, P0 pseudogene 1 (RPLP0P1; HAVANA gene ID: OTTHUMG00000158396) and five spliced transcripts. (e) The duplicated pseudogene Family with sequence similarity 86, member A pseudogene (FAM86AP; HAVANA gene ID: OTTHUMG00000159782) and four spliced transcripts. (f,g) Groups of multiple pseudogenes connected by overlapping transcripts. (f) Three pseudogenes with single connecting transcripts: 1 is the duplicated pseudogene von Willebrand factor pseudogene 1 (VWFP1; HAVANA gene ID: OTTHUMG00000143725); 2 is the duplicated pseudogene ankyrin repeat domain 62 pseudogene 1 (ANKRD62P1; HAVANA gene ID: OTTHUMG00000149993); 3 is the duplicated pseudogene poly (ADP-ribose) polymerase family, member 4 pseudogene 3 (PARP4P3; HAVANA gene ID: OTTHUMG00000142831). Pseudogenes 1 and 2 are connected by a seven-exon transcript, pseudogenes 2 and 3 are connected by a nine-exon transcript, and a third transcript shares two of its four exons with pseudogene 2. (g) Two pseudogenes with multiple connecting transcripts: 1 is the processed pseudogene vitamin K epoxide reductase complex, subunit 1-like 1 pseudogene (VKORC1L1P; HAVANA gene ID: OTTHUMG00000156633); 2 is the duplicated pseudogene chaperonin containing TCP1, subunit 6 (zeta) pseudogene 3 (CCT6P3; HAVANA gene ID: OTTHUMG00000156630). The two pseudogenes are connected by two transcripts that initiate at the upstream pseudogene and use a splice donor site within its single exon; this site is also a splice donor site in the pseudogene's parent locus. Interestingly, the downstream locus hosts two small nucleolar RNAs (snoRNAs) that are also present in the parent locus and in another paralog. (h) A very complex case in which multiple pseudogenes, connected by multiple transcripts, read through into an adjacent protein-coding locus: 1 is the duplicated pseudogene suppressor of G2 allele of SKP1 (S. cerevisiae) pseudogene (SGT1P; HAVANA gene ID: OTTHUMG00000020323); 2 is a novel duplicated pseudogene (OTTHUMG00000167000); and the protein-coding gene is C9orf174, chromosome 9 open reading frame 174 (OTTHUMG00000167001). (i) A similarly complex case in which multiple pseudogenes, connected by multiple transcripts, read through into an adjacent protein-coding locus: 1 is the duplicated pseudogene stromal antigen 3 pseudogene (STAGP3; HAVANA gene ID: OTTHUMG00000156884); 2 is the duplicated pseudogene poliovirus receptor related immunoglobulin domain containing pseudogene (PVRIGP; HAVANA gene ID: OTTHUMG00000156886); and the protein-coding gene is PILRB, paired immunoglobin-like type 2 receptor beta (OTTHUMG00000155363). sRNA, small RNA.
Pei et al. Genome Biology 2012 13:R51 doi:10.1186/gb-2012-13-9-r51