  • Opinion

Genomic analysis of the eukaryotic protein kinase superfamily: a perspective

Abstract

Protein kinases with a conserved catalytic domain make up one of the largest 'superfamilies' of eukaryotic proteins and play many key roles in biology and disease. Efforts to identify and classify all the members of the eukaryotic protein kinase superfamily have recently culminated in the mining of essentially complete human genome data.

Phosphorylation by protein kinases is recognized as a major mechanism by which virtually every activity of eukaryotic cells is regulated, including proliferation, gene expression, metabolism, motility, membrane transport, and apoptosis. An ultimate goal of research into signal transduction is to reach a full understanding of the protein phosphorylation events that occur within individual cell types and how they eventually impact on cell behavior. A milestone en route to this ambitious goal is a determination of the number of protein kinases encoded by eukaryotic genomes and an assessment of their structures, functions, and evolutionary relationships. This article traces the progress made toward achieving these objectives in the pregenomic and genomic eras, which culminated recently with reports on the 'full complement' of human protein kinases.

The pregenomic era

About sixteen years ago, while working at the Salk Institute, my colleagues and I undertook a comparative analysis of all the available sequences of protein kinase catalytic domains [1]. This interest stemmed from my having identified several novel human protein kinases using a homology-based cDNA cloning strategy [2] and wanting to determine their relationships to other known protein kinases. In collaboration with the Salk's resident protein kinase guru Tony Hunter and biocomputing specialist Anne Marie Quinn, we aligned the homologous catalytic-domain amino-acid sequences of 65 distinct protein kinases from diverse eukaryotes (including 45 non-orthologous vertebrate enzymes) and constructed a phylogenetic tree to visualize their overall relationships [1]. The alignment (produced manually on a word processor) defined the boundaries of the eukaryotic protein kinase (ePK) catalytic domain, revealed conserved subdomains that were never interrupted by amino-acid insertions, and identified highly conserved individual amino acids and motifs (Figure 1).

Figure 1

The ePK catalytic domain. The 12 conserved subdomains are indicated by Roman numerals. The positions of amino-acid residues and motifs highly conserved throughout the ePK superfamily are indicated above the subdomains, using the single-letter amino-acid code with x as any amino acid. Crystal structures show that ePK domains adopt a common fold consisting of amino-terminal and carboxy-terminal lobes connected by a hinge region. Binding of Mg-ATP is largely the function of the amino-terminal lobe and hinge region, while peptide-substrate binding is mediated by the carboxy-terminal lobe. Particularly important for catalytic function are the invariant lysine in subdomain II and the invariant aspartate in subdomain VII that function to anchor and orient ATP, and the invariant aspartate in subdomain VIB which is the likely catalytic base in the phosphotransfer reaction. More detailed discussions of ePK subdomains and conserved residues in relation to crystal structures and catalytic function can be found in [3, 4, 12, 13].

The phylogenetic tree revealed major clusters including the tyrosine kinases (the TK group), cyclic nucleotide- and calcium-phospholipid-dependent kinases (the AGC group; including the PKA, PKG, and PKC families) and calmodulin-dependent kinases (the CAMK group). These groupings indicated that ePK domain phylogeny reflects substrate specificity and/or mode of regulation and could therefore serve as a useful classification tool. Over the next 7 years I continued to add new sequences to the alignment as they became available and to construct phylogenetic trees as a means of classifying the burgeoning ePK superfamily. By early 1994, the ePK domain alignment had grown to contain 390 sequences including 205 non-orthologous vertebrate ePKs, and a fourth major ePK group (CMGC, comprising the CDK, MAPK, GSK, and CLK families) had been added through phylogenetic analysis [3]. The 390 ePK domain alignment was made publicly available through the Protein Kinase Resource website [4].

The genomic era

By 1995, with the advent of genome-sequencing projects, the task of cataloging and classifying the members of the ePK superfamily had become too great a distraction from my funded research, and I discontinued my efforts in this area. Tony Hunter continued to work with bioinformaticians at SUGEN, Inc. (including Greg Plowman, Gerard Manning, and Sucha Sudarsanam) to characterize the full ePK complements of model eukaryotes from genomic sequence data [5, 6]. By the time of a recent report [7], their efforts had resulted in the identification and classification of 115 distinct ePKs from budding yeast (around 2% of all genes), 434 from Caenorhabditis elegans (about 2.5% of all genes), and 223 from Drosophila. In addition they described the complement of 'atypical protein kinases' (aPKs) from these species: 15 from yeast, 20 from C. elegans, and 16 from Drosophila. (The aPKs are a variety of proteins that lack strong sequence similarity to the classical ePK domain but have been shown experimentally to have protein kinase activity; well-known examples are found among the 'lipid kinases' of the phosphatidylinositol 3'-kinase (PI3K) family.)
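As a quick check on these figures, the ePK and aPK counts quoted above can be summed into a kinome total for each species. The following is a minimal Python sketch; the dictionary layout and the names `KINOMES` and `kinome_total` are illustrative assumptions, and only the counts themselves come from the text.

```python
# ePK and aPK counts as quoted in the text for the three model organisms.
# The data structure and names are illustrative, not from the cited reports.
KINOMES = {
    "budding yeast": {"ePK": 115, "aPK": 15},
    "C. elegans": {"ePK": 434, "aPK": 20},
    "Drosophila": {"ePK": 223, "aPK": 16},
}


def kinome_total(counts):
    """Return the combined number of ePKs and aPKs for one species."""
    return counts["ePK"] + counts["aPK"]


totals = {species: kinome_total(counts) for species, counts in KINOMES.items()}

for species, total in totals.items():
    print(f"{species}: {total} protein kinases (ePK + aPK)")
```

Under these counts the totals come to 130 for yeast, 454 for the worm, and 239 for the fly.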

As a result of their comprehensive analyses of 'kinomes', the SUGEN investigators were able to define three new major groups within the broad ePK classification scheme: first, the STE group, which includes ePKs that function in the MAPK kinase cascades that were first described through characterization of yeast sterile mutants; second, the CK1 group, including the casein kinase 1 family and related enzymes, which is greatly expanded in the worm; and third, the TKL ('tyrosine-kinase like') group that includes the STKR family of TGFbeta serine/threonine kinase receptors and is phylogenetically close to the tyrosine kinases (TKs). Many distinct kinase families within the AGC, CAMK, CMGC, STE, and CK1 groups have representatives from all three species, supporting the idea of an early evolutionary origin and critical function in basic cellular processes. Members of the TK and TKL groups are notably absent from yeast, consistent with the known functions of these ePKs in intercellular signaling events associated with metazoan complexity. More discussion of the evolutionary relationships among the ePKs identified through the SUGEN genome-mining efforts has been published elsewhere [7]. The SUGEN kinase.com website [8] includes links to all their published work on protein kinase analysis as well as 'KinBase', a very useful searchable database that holds information on all the protein kinase genes found in the yeast, worm, fly, and human (see below) genomes.

Human protein kinases

The completion of the first draft of the human genome sequence presented an opportunity to determine the full complement of human protein kinases. The first analysis came from a group led by Mitch Kostich at Schering-Plough Research Institute (SPRI) [9]. This group mined public GenBank records (available before December, 2001) for ePK sequences by performing BLAST searches using known ePK domains as queries. The resulting hits were consolidated, and efforts were made to remove non-human sequences, pseudogenes, and poor-quality sequences that could represent duplicate hits. The SPRI investigators chose to err on the side of inclusion rather than exclusion, however, and many cases of 'single hit' sequences were retained. Their effort culminated in a collection of 510 potentially unique human ePKs. A color-coded alignment that accompanied their article [9] nicely illustrates the ePK domain sequence conservation.

The SUGEN group, led by Gerard Manning and Sucha Sudarsanam, carried out a more comprehensive effort to describe and classify all human ePKs [10]. In addition to the public databases, their dataset included genomic reads from Celera that are not publicly available and non-public expressed sequence tags (ESTs) from Incyte and SUGEN. They searched these data using a hidden Markov model of the ePK domain that allowed detection of very divergent family members. The sequence data were further searched for members of the various known aPK families. Using stringent criteria to eliminate false positives (including verification of novel sequences by cDNA cloning) they compiled a list of 478 human members of the ePK superfamily and another 40 aPKs, bringing their human kinome total to 518 (approximately 1.7% of all predicted human genes). They also identified 106 ePK or aPK pseudogenes.

A comparison of the SPRI-510 and SUGEN-518 lists reveals 474 protein kinases in common (see the additional file). Of the 44 SUGEN-specific kinases, 32 are aPKs; the other 8 aPKs identified by SUGEN, from the ABC1 and RIO families, were included in the SPRI list as a result of their having weak ePK domain similarity. Of the remaining 12 SUGEN-specific ePKs, five (TAK1, MLKL, NEK5, SgK307, and TBCK) were not available in the public data used in the SPRI analysis; another five (SgK196, SgK223, SgK424, SgK493, and Slob) have rather divergent ePK domains that lack many of the highly conserved residues and are unlikely to have catalytic activity, so it is easy to see how these might have been excluded by visual inspection; and the final two are SgK110 and NEK10. SgK110 was actually detected by the SPRI search, but it was erroneously merged with a related sequence AC008735_EPK1 (SgK069) on the same genomic contig; and it is unclear why the SPRI group missed NEK10. Most, if not all, of the 36 SPRI-specific ePKs represent over-inclusion errors (Table 1): 14 correspond to sequences determined to be pseudogenes by the SUGEN group; 19 are based on single sequences that are (or appear to be) either poor-quality duplicates of other ePKs or interspecies contaminants; and the remaining three are duplicates arising by virtue of non-overlapping partial sequences.
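The bookkeeping in this comparison is easy to lose track of, so here is a small Python sketch that re-derives the list-specific counts from the totals and checks that the stated breakdowns add up. The kinase names and counts are those given in the paragraph above; the variable names and the grouping labels are illustrative assumptions.

```python
# Totals quoted in the text for the two human compilations.
SPRI_TOTAL = 510
SUGEN_TOTAL = 518
SHARED = 474

# SUGEN-specific ePKs, grouped by the explanation offered in the text.
# The group labels are paraphrases chosen for this sketch.
SUGEN_ONLY_EPKS = {
    "absent from the public data used by SPRI": {"TAK1", "MLKL", "NEK5", "SgK307", "TBCK"},
    "divergent ePK domain": {"SgK196", "SgK223", "SgK424", "SgK493", "Slob"},
    "merged with a neighbor or missed": {"SgK110", "NEK10"},
}
SUGEN_ONLY_APKS = 32

# SPRI-specific entries, by the over-inclusion category given in the text.
SPRI_ONLY_REASONS = {
    "pseudogenes according to SUGEN": 14,
    "single-sequence duplicates or contaminants": 19,
    "duplicates from non-overlapping partial sequences": 3,
}

sugen_only = SUGEN_TOTAL - SHARED  # kinases found only in the SUGEN list
spri_only = SPRI_TOTAL - SHARED    # kinases found only in the SPRI list
sugen_only_epks = set().union(*SUGEN_ONLY_EPKS.values())

print("SUGEN-specific:", sugen_only, "=", SUGEN_ONLY_APKS, "aPKs +", len(sugen_only_epks), "ePKs")
print("SPRI-specific:", spri_only, "=", sum(SPRI_ONLY_REASONS.values()), "over-inclusions")
```

Both breakdowns are internally consistent: 32 aPKs plus 12 ePKs account for the 44 SUGEN-specific kinases, and the three over-inclusion categories account for all 36 SPRI-specific entries.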

Table 1 Putative ePKs identified by SPRI but not SUGEN

Thus the SUGEN compilation of 478 human ePK superfamily genes represents the most accurate count based on current sequence data. If one subtracts those that lack key conserved residues, we are left with 428 human ePKs with known or likely kinase function (Table 2), 99% of which were included in the SPRI list; 365 of these fall within the seven major ePK groups: TK, 84 in total; CAMK, 66; AGC, 61; CMGC, 61; STE, 45; TKL, 37; and CK1, 11. The remaining 63 are in the 'Other' category, falling outside the main ePK group branches. Krupa and Srinivasan [11] have also recently searched the public human genome data with a focus on identifying functional protein kinases; their efforts resulted in a list of 448 distinct human ePK sequences, but around 90 of these appear to represent duplicate entries, and no novel protein kinases were identified that were not present in the SUGEN compilation.
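The group counts above can be tallied in a few lines. This is a minimal Python sketch using only the numbers stated in the paragraph; the names `GROUP_COUNTS`, `FUNCTIONAL_EPKS`, and `ALL_EPKS` are illustrative assumptions.

```python
# Per-group counts of human ePKs with known or likely kinase function,
# as quoted in the text.
GROUP_COUNTS = {
    "TK": 84,
    "CAMK": 66,
    "AGC": 61,
    "CMGC": 61,
    "STE": 45,
    "TKL": 37,
    "CK1": 11,
}
FUNCTIONAL_EPKS = 428  # ePKs with known or likely kinase function
ALL_EPKS = 478         # all human ePK superfamily genes in the SUGEN list

in_major_groups = sum(GROUP_COUNTS.values())
other = FUNCTIONAL_EPKS - in_major_groups           # the 'Other' category
lacking_key_residues = ALL_EPKS - FUNCTIONAL_EPKS   # ePKs set aside as likely inactive

for group, count in GROUP_COUNTS.items():
    share = 100 * count / FUNCTIONAL_EPKS
    print(f"{group:5s} {count:3d}  ({share:.1f}% of functional ePKs)")
print("Other", other)
print("Lacking key conserved residues:", lacking_key_residues)
```

The seven groups sum to 365, leaving 63 in 'Other', and the difference between 478 and 428 implies that 50 ePK superfamily members lack key conserved residues.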

Table 2 The 428 human ePKs with known or likely kinase catalytic function

Usefulness of the kinome data

Knowing the full complement of ePK family members and functional ePKs encoded by eukaryotic genomes will have great impact upon many areas of scientific investigation. As mentioned above, an obvious benefit relates to understanding of how signal transduction pathways evolved during the course of eukaryotic evolution. Both SUGEN [10] and Krupa and Srinivasan [11] extended their analyses to describe other domains present in the various human ePKs which are likely to function in directing the enzymes to relevant substrates or modulating kinase activities. Further analysis of the ePK domain sequences uniquely conserved within the major groups and families, together with comparisons of ePK domain crystal structures, should ultimately allow a full understanding of how different classes of peptide substrate are recognized. For example, Figure 2 shows consensus sequences for the catalytic loop region in subdomain VIB (which includes the invariant aspartate thought to function as the catalytic base) and the activation loop region in subdomain VIII (which includes the highly conserved glutamate in the 'APE' motif), two regions that have been recognized as being primarily involved in peptide-substrate recognition [12, 13]. A number of group-specific differences are apparent (highlighted in Figure 2) that correlate with unique peptide-recognition tendencies for the ePKs that fall within a given group [14]. Beyond sequence analysis, the kinome data will allow for the development of comprehensive tools (such as full-length cDNAs, microarrays, antibodies, and fusion protein and RNAi constructs) that will greatly aid laboratory investigations aimed at understanding cell signaling through analysis of kinase function. As an example of such proteomic approaches to the study of protein kinases, nearly all yeast protein kinases have been expressed in bacteria and analyzed for their ability to phosphorylate an array of protein or peptide substrates using protein-chip technology [15].

Finally, the human kinome data will have benefits in the understanding and treatment of human diseases. The ePK genes that map within disease loci are attractive etiological candidates, and knowledge of the full repertoire of human protein kinases will greatly aid in the development of drugs that target specific protein kinases or protein kinase families whose function contributes to disease-associated cellular defects.

Figure 2

Conserved residues implicated in peptide-substrate recognition. Consensus motifs for the catalytic loop region in subdomain VIB and activation loop region in subdomain VIII were determined for the members of each of the seven major ePK groups with known or likely kinase activity. Invariant residues at a given position are indicated by single upper-case letters. Two upper-case letters at a single position indicate that either of two residues is strictly conserved, with the most frequent shown in the top row. Positions in which more than two amino acids are present are indicated with lower-case letters: a single letter indicates that only one residue is highly conserved, two letters indicate that either of two residues is frequently conserved (most frequent on the top row), and 'x' indicates poor positional conservation. Residues highlighted in outline are notably conserved within an ePK group and are thought to function in the recognition of peptide substrates specifically targeted by the members of the group.
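The notation described in this legend can be expressed as a small rule-based procedure. The Python sketch below is one possible reading of those rules, not the method used to prepare Figure 2: the legend gives no numeric cut-offs for 'highly' or 'frequently' conserved, so the thresholds `high` and `pair` are assumptions, as are the function names. The input sequences are toy strings made up for illustration, not data from the figure.

```python
from collections import Counter


def consensus_symbol(column, high=0.75, pair=0.75):
    """Return (top_row, bottom_row) for one alignment column.

    high and pair are hypothetical cut-offs; the legend does not state any.
    """
    ranked = Counter(column).most_common()
    n = len(column)
    if len(ranked) == 1:
        # Invariant position: one upper-case letter.
        return ranked[0][0].upper(), ""
    if len(ranked) == 2:
        # Exactly two residues occur: both upper-case, most frequent on top.
        return ranked[0][0].upper(), ranked[1][0].upper()
    # More than two residues occur: lower-case notation.
    (first, n_first), (second, n_second) = ranked[0], ranked[1]
    if n_first / n >= high:
        return first.lower(), ""
    if (n_first + n_second) / n >= pair:
        return first.lower(), second.lower()
    return "x", ""


def consensus_rows(sequences):
    """Build the two-row consensus for equal-length aligned sequences."""
    symbols = [consensus_symbol("".join(col)) for col in zip(*sequences)]
    top = "".join(s[0] for s in symbols)
    bottom = "".join(s[1] or " " for s in symbols)
    return top, bottom


# Toy aligned segments, invented for this example only.
toy_alignment = ["HRDLK", "HRDLR", "HRDIK", "HRDLK"]
top_row, bottom_row = consensus_rows(toy_alignment)
print(top_row)
print(bottom_row)
```

With the toy input, the first three columns are invariant and the last two each contain exactly two residues, so the top row reads `HRDLK` and the bottom row carries `I` and `R` under the fourth and fifth positions.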

Additional data file

An additional data file with the 474 protein kinases in common between the SPRI-510 and the SUGEN-518 lists is available.

References

  1. Hanks SK, Quinn AM, Hunter T: The protein kinase family: conserved features and deduced phylogeny of the catalytic domains. Science. 1988, 241: 42-52.

  2. Hanks SK: Homology probing: identification of cDNA clones encoding members of the protein-serine kinase family. Proc Natl Acad Sci USA. 1987, 84: 388-392.

  3. Hanks SK, Hunter T: The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification. FASEB J. 1995, 9: 576-596.

  4. Protein Kinase Resource. [http://kinases.sdsc.edu/html/index.shtml]

  5. Hunter T, Plowman GD: The protein kinases of budding yeast: six score and more. Trends Biochem Sci. 1997, 22: 18-22. 10.1016/S0968-0004(96)10068-2.

  6. Plowman GD, Sudarsanam S, Bingham J, Whyte D, Hunter T: The protein kinases of Caenorhabditis elegans: a model for signal transduction in multicellular organisms. Proc Natl Acad Sci USA. 1999, 96: 13603-13610. 10.1073/pnas.96.24.13603.

  7. Manning G, Plowman GD, Hunter T, Sudarsanam S: Evolution of protein kinase signaling from yeast to man. Trends Biochem Sci. 2002, 27: 514-520. 10.1016/S0968-0004(02)02179-5.

  8. Kinase.com. [http://kinase.com/]

  9. Kostich M, English J, Madison V, Gheyas F, Wang L, Qiu P, Greene J, Laz TM: Human members of the eukaryotic protein kinase family. Genome Biol. 2002, 3: research0043.1-12. 10.1186/gb-2002-3-9-research0043.

  10. Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S: The protein kinase complement of the human genome. Science. 2002, 298: 1912-1934. 10.1126/science.1075762.

  11. Krupa A, Srinivasan N: The repertoire of protein kinases encoded in the draft version of the human genome: atypical variations and uncommon domain combinations. Genome Biol. 2002, 3: research0066.1-0066.14. 10.1186/gb-2002-3-12-research0066.

  12. Taylor SS, Radzio-Andzelm E, Hunter T: How do protein kinases discriminate between serine/threonine and tyrosine? Structural insights from the insulin receptor protein-tyrosine kinase. FASEB J. 1995, 9: 1255-1266.

  13. Johnson LN, Lowe ED, Noble MEM, Owen DJ: The structural basis for substrate recognition and control by protein kinases. FEBS Lett. 1998, 430: 1-11. 10.1016/S0014-5793(98)00606-1.

  14. Kreegipuu A, Blom N, Brunak S, Järv J: Statistical analysis of protein kinase specificity determinants. FEBS Lett. 1998, 430: 45-50. 10.1016/S0014-5793(98)00503-1.

  15. Zhu H, Klemic JF, Chang S, Bertone P, Casamayor A, Klemic KG, Smith D, Gerstein M, Reed MA, Snyder M: Analysis of yeast protein kinases using protein chips. Nat Genet. 2000, 26: 283-289. 10.1038/81576.

  16. Janji B, Melchior C, Vallar L, Kieffer NL: Cloning of an isoform of integrin-linked kinase (ILK) that is upregulated in HT-144 melanoma cells following TGF-beta1 stimulation. Oncogene. 2000, 19: 3069-3077. 10.1038/sj.onc.1203640.

Editors' note

  • The author has declared that he has no affiliation with SUGEN or Schering-Plough.


Acknowledgements

I am indebted to Gerard Manning of SUGEN, Mitch Kostich of Schering-Plough Research Institute, and N. Srinivasan of the Indian Institute of Science, for their contributions and comments regarding comparative analysis of their respective human protein kinase compilations.

Author information

Corresponding author

Correspondence to Steven K Hanks.

Cite this article

Hanks, S.K. Genomic analysis of the eukaryotic protein kinase superfamily: a perspective. Genome Biol 4, 111 (2003). https://doi.org/10.1186/gb-2003-4-5-111
