Evolution and sequence features of Atg8 genes. (a) Atg8 genes in fungi, animals and intermediate-branching species. A schematic tree, based on a sequence-derived phylogenetic tree, showing the three animal Atg8 subfamilies and their presence in key lineages. The subfamilies appear only in animals. Atg8 proteins from the few known unicellular species that diverged after the emergence of fungi and before the emergence of multicellular animals branch between animals and fungi. Shown here are Atg8 proteins from the choanoflagellate Monosiga brevicollis (Mbe), the ichthyosporean Sphaeroforma arctica (Sar) and the amoeba Capsaspora owczarzaki (Cow). The scheme is based on a tree calculated from a multiple alignment of Atg8 proteins from representative species with complete or almost complete genomic data. The alignment included 117 conserved amino acid positions. The tree was calculated using the PhyML program version 2.4.4, with 100 bootstrap replicates, four substitution rate categories, the HKY nucleotide substitution model and program-estimated Ts/Tv ratios, gamma shape parameters and invariant proportions as previously described. The subfamily clusters are supported by bootstrap values ranging from 37/100 to 95/100 and also appeared with significant bootstrap values in other trees similarly calculated with different sets of Atg8 genes. The representative species for this scheme were: human, Danio rerio, Xenopus tropicalis, Branchiostoma floridae, Ciona savignyi, Oikopleura dioica, Strongylocentrotus purpuratus, Aplysia californica, Schistosoma mansoni, Schmidtea mediterranea, Drosophila melanogaster, Caenorhabditis elegans, Capitella teleta, Nematostella vectensis, Trichoplax adhaerens, Amphimedon queenslandica, Monosiga brevicollis, Sphaeroforma arctica, Capsaspora owczarzaki, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Allomyces macrogynus, and Tuber melanosporum. (b) Atg8 subfamily sequence features.
Sequence logos show the conservation (overall height) and residue prevalence at each position of the multiple alignments. The alignments include the core conserved sequence regions, excluding only short non-conserved distal regions of some sequences. Positions in each subfamily logo are numbered according to the coordinates of the human GATE-16, GABARAP and LC3 proteins. Plus signs indicate similar positions between the alignments of the GATE-16 and GABARAP subfamilies and between those of the GABARAP and LC3 subfamilies. Each of the three subfamilies is very well conserved across its entire length (apart from the few amino-terminal residues in LC3). The three subfamilies are also very similar to each other at most positions. The few positions that are conserved within each subfamily but differ between subfamilies may account for some of the functional differences between them. The alignments and logos were constructed as previously described, taking into account sequence redundancy and expected amino acid frequencies. Sequences for the alignments were taken from the CDD and PFAM database entries cd01611 and PF02991, respectively, together with similar sequences identified among protein sequences and translated genomic and EST sequences in public sequence databases.
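As a rough illustration of how logo heights relate to conservation, the sketch below computes, for each alignment column, the relative entropy (in bits) of the observed residue frequencies against expected amino acid frequencies, and splits that height among the residues by their prevalence. It is not the procedure used for this figure: the uniform background, the absence of sequence-redundancy weighting, the function names and the toy sequences are all assumptions made for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_information(column, background=None):
    """Return (total height in bits, per-residue heights) for one column.

    Height is the relative entropy of the observed residue frequencies
    against the background frequencies. A uniform background is assumed
    here unless one is supplied; gaps and unknown symbols are ignored.
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    residues = [r for r in column if r in background]
    if not residues:
        return 0.0, {}
    counts = Counter(residues)
    total = len(residues)
    freqs = {aa: n / total for aa, n in counts.items()}
    info = sum(p * math.log2(p / background[aa]) for aa, p in freqs.items())
    # Each residue's letter height is its share of the column height.
    heights = {aa: p * info for aa, p in freqs.items()}
    return info, heights


def logo_heights(alignment):
    """Compute column heights for a list of equal-length aligned sequences."""
    return [column_information(col) for col in zip(*alignment)]


if __name__ == "__main__":
    # Hypothetical toy alignment, not real Atg8 sequences.
    toy_alignment = ["MKF", "MRF", "MK-"]
    for position, (info, heights) in enumerate(logo_heights(toy_alignment), 1):
        print(position, round(info, 3), heights)
```

In this toy alignment the invariant first column reaches the maximum height for a uniform 20-letter background, while the mixed second column is shorter. Supplying real expected amino acid frequencies through `background`, and weighting sequences to reduce redundancy, would bring the sketch closer to the kind of logo described in the caption.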
Shpilka et al. Genome Biology 2011 12:226 doi:10.1186/gb-2011-12-7-226