A flow chart of our computational pipeline for identifying pseudogenes. The pipeline contains two parallel procedures: the one on the left (routine P) is mainly for processed pseudogenes, and the one on the right (routine D) is for duplicated pseudogenes. The steps common to both are shown at the top and at the bottom. Both procedures search the ENCODE regions for DNA sequences similar to human genes annotated by ENSEMBL. The two routines differ in how the search is performed and how the search results are processed; the key differences are highlighted in blue in P and in orange in D. At the end, an alignment between a known gene and each pseudogene candidate is constructed with either TFASTY or GeneWise. Information from this alignment, together with the computational path by which the candidate was identified, is used to separate pseudogenes into three classes: duplicated, processed, and fragment.
Zheng and Gerstein, Genome Biology 2006, 7(Suppl 1):S13. doi:10.1186/gb-2006-7-s1-s13