<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>gb-2003-5-1-r6</ui>
	<ji>GBJ</ji>
	<fm>
		<dochead>Method</dochead>
		<bibl>
			<title>
				<p>Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network</p>
			</title>
			<aug>
				<au id="A1">
					<snm>Brun</snm>
					<fnm>Christine</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A2" ce="yes">
					<snm>Chevenet</snm>
					<fnm>Fran&#231;ois</fnm>
					<insr iid="I2"/>
				</au>
				<au id="A3" ce="yes">
					<snm>Martin</snm>
					<fnm>David</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A4">
					<snm>Wojcik</snm>
					<fnm>J&#233;r&#244;me</fnm>
					<insr iid="I3"/>
				</au>
				<au id="A5">
					<snm>Gu&#233;noche</snm>
					<fnm>Alain</fnm>
					<insr iid="I4"/>
				</au>
				<au id="A6" ca="yes">
					<snm>Jacq</snm>
					<fnm>Bernard</fnm>
					<insr iid="I1"/>
					<email>jacq@lgpd.univ-mrs.fr</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Laboratoire de G&#233;n&#233;tique et Physiologie du D&#233;veloppement, CNRS UMR6545, Parc Scientifique et Technologique de Luminy, Case 907, 13288 Marseille Cedex 9, France</p>
				</ins>
				<ins id="I2">
					<p>Centre d'Etude sur le Polymorphisme des Micro-organismes, CNRS/IRD UMR 9926, 911 avenue Agropolis, BP 6450, 34394 Montpellier Cedex 5, France</p>
				</ins>
				<ins id="I3">
					<p>Hybrigenics SA, 3/5 impasse Reille, 75014 Paris, France</p>
				</ins>
				<ins id="I4">
					<p>Institut de Math&#233;matiques de Luminy, CNRS UPR9016, Parc Scientifique et Technologique de Luminy, Case 907, 13288 Marseille Cedex 9, France</p>
				</ins>
			</insg>
			<source>Genome Biology</source>
			<issn>1465-6906</issn>
			<pubdate>2003</pubdate>
			<volume>5</volume>
			<issue>1</issue>
			<fpage>R6</fpage>
			<url>http://genomebiology.com/2003/5/1/R6</url>
			<xrefbib>
				<pubid idtype="pmpid">14709178</pubid>
			</xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>25</day>
					<month>6</month>
					<year>2003</year>
				</date>
			</rec>
			<revrec>
				<date>
					<day>6</day>
					<month>10</month>
					<year>2003</year>
				</date>
			</revrec>
			<acc>
				<date>
					<day>14</day>
					<month>11</month>
					<year>2003</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>15</day>
					<month>12</month>
					<year>2003</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2003</year>
			<collab>Brun et al.; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.</collab>
		</cpyrt>
		<shorttitle>
			<p>Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network</p>
		</shorttitle>
		<shortabs>
			<p>PRODISTIN is a new computational method for the functional clustering of proteins on the basis of protein-protein interaction data. The method was used to classify 11% of the <it>Saccharomyces cerevisiae </it>proteome into several groups.</p>
		</shortabs>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<p>We here describe PRODISTIN, a new computational method allowing the functional clustering of proteins on the basis of protein-protein interaction data. This method, assessed biologically and statistically, enabled us to classify 11% of the <it>Saccharomyces cerevisiae </it>proteome into several groups, the majority of which contained proteins involved in the same biological process(es), and to predict a cellular function for many otherwise uncharacterized proteins.</p>
			</sec>
		</abs>
	</fm>
	<meta>
		<classifications>
			<classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010013">Methods</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
		</classifications>
	</meta>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>Complete genome sequencing makes available a large number of coding protein sequences for which we have little or no functional information. In fact, the function of 30-35% of encoded proteins per completely sequenced genome remains unknown <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. To decipher the functions of these proteins and, more broadly, to propose functional relationships among proteins, new computational methods relying upon genome organization have been developed. The Rosetta Stone method proposes that two proteins in a given proteome are functionally linked when they exist as a single fused polypeptide in another proteome <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. The chromosomal proximity method suggests that genes repeatedly found as neighbors on chromosomes in different organisms may encode functionally related proteins <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. Finally, the phylogenetic co-inheritance of proteins in several different proteomes may indicate their functional link <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Although these methods and combinations thereof <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> successfully predict the function of certain proteins, they suffer from several limitations: they are more informative when applied to completely sequenced genomes; they are generally more appropriate for prokaryotic genome organization; and the principles underlying some of them are only valid for a small number of proteins.</p>
			<p>Molecular interactions are essential actors for all biological processes. Large-scale studies of protein-protein interactions have been carried out in several organisms to establish interaction maps and to decipher protein function <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. These large intricate networks now need to be analyzed in detail to extract information related to protein function and to relationships linking cellular processes. Various methods of biological network analysis have been proposed so far. They may, for instance, allow identification of functional modules after network clustering <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, or the assignment of function to proteins of unknown function on the basis of the functional annotation of their neighbors <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Another way to analyze the interaction network is to compare proteins functionally at the cellular level. This approach would represent a useful complement to sequence-comparison methods, which address function at the molecular level. With this in mind, we propose a new bioinformatics method allowing a functional classification of the proteins according to the identity of their interacting partners.</p>
			<p>The method, named PRODISTIN for protein distance based on interactions, was applied to the yeast interactome and statistically evaluated for robustness using several independent criteria. Analysis of the results showed that proteins are grouped according to their cellular rather than molecular function; that proteins involved in the same molecular complex(es), pathway(s) or cellular process(es) are clustered; and that a sound prediction of cellular function for the uncharacterized proteins is possible. The biological relevance of the obtained predictions is discussed with respect to recent experimental results.</p>
		</sec>
		<sec>
			<st>
				<p>Results</p>
			</st>
			<sec>
				<st>
					<p>Principle of the PRODISTIN method and classification of the yeast proteome</p>
				</st>
				<p>We previously suggested that comparing the sets of interactors for different proteins should allow detection of functional similarity independently of the sequence information <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. We therefore developed the PRODISTIN method based on the principle that the more interactors two proteins share, the more likely they are to be functionally related. In practice, starting from a list of binary protein-protein interactions, the PRODISTIN method consists of three successive bioinformatic steps (Figure <figr fid="F1">1</figr>, see Materials and methods for details). First, a graph comprising all proteins connected by a specific relation is constructed and a functional distance is calculated between all possible pairs of proteins in the graph with regard to the number of interactors they share. Second, all distance values are clustered, leading to a classification tree. Third, the tree is visualized and subdivided into formal classes. We thus define a PRODISTIN class as the largest possible subtree composed of at least three proteins sharing the same functional annotation and representing at least 50% of the individual class members for which a functional annotation is available. Classes of proteins are then analyzed for their biological relevance and tested for their statistical robustness.</p>
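				<p>As an illustration of the first step, the sketch below computes a distance from shared interactors on a toy network. It assumes a Czekanowski-Dice-style formula (size of the symmetric difference of the two interactor sets divided by the sum of the sizes of their union and intersection) and assumes that each protein is counted among its own interactors; the distance actually used is the one defined in Materials and methods. The protein names and the helper names interactor_sets and functional_distance are hypothetical.</p>

```python
from collections import defaultdict


def interactor_sets(interactions):
    """Map each protein to the set of its interactors plus itself (assumption)."""
    partners = defaultdict(set)
    for a, b in interactions:
        partners[a].update((a, b))
        partners[b].update((a, b))
    return dict(partners)


def functional_distance(partners, x, y):
    """Czekanowski-Dice-style distance (assumed formula, see lead-in).

    Returns 0.0 for identical interactor sets and 1.0 for disjoint ones.
    """
    sx, sy = partners[x], partners[y]
    return len(sx ^ sy) / (len(sx | sy) + len(sx & sy))


# Hypothetical toy list of binary interactions.
toy = [("P1", "A"), ("P1", "B"), ("P2", "A"), ("P2", "B"), ("P3", "C")]
partners = interactor_sets(toy)

# P1 and P2 share both of their partners, so they end up close.
print(functional_distance(partners, "P1", "P2"))
# P1 and P3 share nothing, so they are maximally distant.
print(functional_distance(partners, "P1", "P3"))
```

				<p>Computing this value for every pair of proteins gives the kind of distance matrix that the second step clusters into a tree.</p>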
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>Flowchart of PRODISTIN</p>
					</caption>
					<text>
						<p>Flowchart of PRODISTIN. <b>(a) </b>A graph is constructed from a list of binary protein-protein interactions. <b>(b) </b>A functional distance based on the identity of the shared interactors is calculated among all proteins. <b>(c) </b>The distance matrix obtained is used to build a classification tree, on which functional classes are subsequently determined and analyzed by evaluating <b>(d) </b>their statistical robustness and <b>(e) </b>their biological relevance.</p>
					</text>
					<graphic file="gb-2003-5-1-r6-1"/>
				</fig>
				<p>In the first experiment, we analyzed 2,946 yeast protein-protein interactions involving 2,139 proteins, that is, 38% of the <it>Saccharomyces cerevisiae </it>proteome <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. The classification tree obtained contains 602 proteins (Figure <figr fid="F2">2</figr>).</p>
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>A functional classification tree for 602 yeast proteins computed with the PRODISTIN method</p>
					</caption>
					<text>
						<p>A functional classification tree for 602 yeast proteins computed with the PRODISTIN method. <b>(a) </b>The foundation for protein clustering. PRODISTIN classes are clustered according to the 'cellular role' of proteins only (pink), according to the 'functional category' of proteins only (blue), and according to both criteria (yellow). <b>(b) </b>Functional classification. PRODISTIN classes on the circular classification tree have been colored according to their corresponding 'cellular role'. Protein names have been omitted for clarity (see Additional data file 1 for details of the classes). Classes corresponding to two different 'cellular roles' are colored according to the first annotation used in Additional data file 1.</p>
					</text>
					<graphic file="gb-2003-5-1-r6-2"/>
				</fig>
			</sec>
			<sec>
				<st>
					<p>PRODISTIN clustering depends neither on sequence similarity nor on biochemical function</p>
				</st>
				<p>To understand the biological foundation of PRODISTIN clustering, we examined different possibilities that could explain protein segregation in the tree. First, we tested whether sequence similarity correlates with our clustering results, given the abundance of proteins involved in related functions that exhibit similarity in their sequences. Pairwise alignments between the sequences of the 602 yeast proteins classified by PRODISTIN were computed using a global and a local alignment algorithm. Given that the obtained distances (expressed as the percentage of similarity for global and the score for local alignments, respectively) do not fit with tree distances, the tree model is not appropriate to represent these huge alignments <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. We thus directly compared the distance values obtained with PRODISTIN, the global and the local alignments (as described above), by identifying for each distance matrix the nonredundant pairs of proteins (x, y) for which y is the closest neighbour of x or vice versa.</p>
				<p>Among the 611 closest pairs of proteins identified with PRODISTIN, the 546 obtained with the global and the 527 obtained with the local alignment, 112 are shared between both alignments (21.2%), 32 between PRODISTIN and the global alignment (5.8%) and 38 between PRODISTIN and the local alignment (7.2%). This result strongly suggests that sequence alignments do not cluster the same proteins that PRODISTIN does, leading to the conclusion that PRODISTIN clustering is not driven by sequence similarity.</p>
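				<p>The closest-pair comparison can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes that each matrix is stored as a nested dictionary of distances (smaller means closer, so a similarity score would first have to be converted) and that ties for the closest neighbour are all kept. The two toy matrices and the function name closest_pairs are hypothetical.</p>

```python
def closest_pairs(dist):
    """Nonredundant pairs {x, y} where y is the closest neighbour of x or vice versa.

    dist maps each protein to a dict of distances to the other proteins.
    """
    pairs = set()
    for x, row in dist.items():
        others = {y: d for y, d in row.items() if y != x}
        best = min(others.values())
        for y, d in others.items():
            if d == best:  # ties are all kept (assumption)
                pairs.add(frozenset((x, y)))
    return pairs


# Hypothetical symmetric distance matrices over the same four proteins.
prodistin = {
    "A": {"B": 0.1, "C": 0.9, "D": 0.8},
    "B": {"A": 0.1, "C": 0.7, "D": 0.9},
    "C": {"A": 0.9, "B": 0.7, "D": 0.2},
    "D": {"A": 0.8, "B": 0.9, "C": 0.2},
}
alignment = {
    "A": {"B": 0.9, "C": 0.5, "D": 0.8},
    "B": {"A": 0.9, "C": 0.7, "D": 0.6},
    "C": {"A": 0.5, "B": 0.7, "D": 0.1},
    "D": {"A": 0.8, "B": 0.6, "C": 0.1},
}

shared = closest_pairs(prodistin).intersection(closest_pairs(alignment))
print(len(closest_pairs(prodistin)), len(closest_pairs(alignment)), len(shared))
```

				<p>Storing each pair as a frozenset makes (x, y) and (y, x) identical, which is what keeps the pairs nonredundant before the two sets are intersected.</p>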
				<p>As sequence similarity is not a key determinant of PRODISTIN clustering, we then investigated the capacity of PRODISTIN to cluster proteins with identical or related functions. To do so, we separately analyzed PRODISTIN classes using two types of protein functional annotations described in the Yeast Proteome Database (YPD) <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>: the 'functional category' corresponding to the biochemical function(s) and the 'cellular role' describing the cellular function(s) (see <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B24">24</abbr></abbrgrp> for discussions about the notion of function). Both types of function are known for 420 proteins in the tree. For comparison, PRODISTIN classes were separately constructed as defined above according to either the cellular or the biochemical function of proteins, using the 420/602 proteins annotated for both types of function (Figure <figr fid="F2">2a</figr>). Among the total of 369 proteins belonging to PRODISTIN classes, 212 (57%) are clustered according to both types of function, and 157 (43%) according to only one type of function. Strikingly, 69% of the latter (108/157) are clustered according to the cellular function whereas the remaining 31% (49/157) are grouped according to the biochemical function. Therefore, the PRODISTIN method clusters proteins more efficiently by their cellular function than by their biochemical function. This result is further validated by the following observations. First, when the subcellular localization of the classified proteins is investigated, proteins belonging to the same subcellular compartment are found clustered in the tree, as would be expected from clustering based on cellular function (data not shown). Second, when the biochemical function of proteins is considered, proteins with functions such as 'protein kinase' or 'hydrolase' are found broadly scattered in the tree. 
Given that proteins with such biochemical functions are likely to be involved in a large number of different cellular processes, their scattering throughout the tree is to be expected from clustering on the basis of the cellular function. Third, sequence-similarity classification of proteins differs from PRODISTIN protein clustering, as described above. Consequently, from now on, we will only consider PRODISTIN classes based on the cellular function of proteins.</p>
			</sec>
			<sec>
				<st>
					<p>Classification of the <it>S. cerevisiae </it>proteome: integrated analysis of cellular processes and their cross-talk</p>
				</st>
				<p>Using the 509 yeast proteins of the tree annotated in YPD for 'cellular role', 64 different PRODISTIN classes were constructed, containing 3 to 36 members each. They contain two-thirds (408/602) of the tree proteins and cover 29 different 'cellular roles' out of 44 possible (Figure <figr fid="F2">2b</figr>; see also Additional data file 1). Whereas some 'cellular roles' are associated with only one class in the tree (such as 'meiosis', which is class 27 (Figure <figr fid="F2">2b</figr>, see also Additional data file 1)), several classes have the same cellular role. This generally corresponds to different aspects of a given cellular process: for instance, the six classes accounting for 'vesicular transport' (Figure <figr fid="F2">2b</figr>) are specifically devoted to autophagy (class 45), structural proteins related to actin (class 55), endoplasmic reticulum to Golgi transport (classes 56, 57), endocytosis (class 58) and exocytosis (class 59), respectively (see Additional data file 1).</p>
				<p>A detailed analysis of the PRODISTIN classes shows that several types of classes are encountered when class functional homogeneity is considered. In the simplest case, proteins are associated with the same molecular complex or involved in a particular cellular process. Nearly half of the classes fall into this category; for instance, class 23 (Figure <figr fid="F3">3a</figr>) consists solely of five members of the peroxisomal import complex <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, and class 22 'DNA synthesis' (Figure <figr fid="F3">3d</figr>; see also Additional data file 1) contains 9 out of 12 proteins involved in DNA replication (labelled with an asterisk on Figure <figr fid="F3">3d</figr>). The two other characterized proteins belonging to this class are implicated in related and/or overlapping processes such as 'cell cycle control' and 'chromatin and chromosome structure' (Cdc23 and Spt2, respectively).</p>
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>Examples of PRODISTIN classes</p>
					</caption>
					<text>
						<p>Examples of PRODISTIN classes. <b>(a) </b>Class 21 'lipid and fatty acid metabolism/protein translocation'. <b>(b) </b>Class 20 'DNA synthesis'. <b>(c) </b>Class 50 'RNA processing/modification'. Asterisks indicate founder proteins of the class (that is, annotated in YPD with the 'cellular role' given to the class). Computed class robustness indexes (CRIs) are shown in front of nodes.</p>
					</text>
					<graphic file="gb-2003-5-1-r6-3"/>
				</fig>
				<p>The second case corresponds to classes annotated with two different cellular roles. These classes either cluster multifunctional proteins that are doubly annotated (all the peroxisomal proteins forming class 23 are involved in 'lipid fatty acid and sterol metabolism' as well as in 'protein translocation' (Table <tblr tid="T1">1</tblr>; see also Additional data file 1)) or contain at least 50% of the proteins annotated for a cellular role, at least 50% annotated for another cellular function, and certain proteins annotated for dual functions (Table <tblr tid="T1">1</tblr>). For instance, three out of six proteins in class 17 'chromatin and chromosome structure/mitosis' (Figure <figr fid="F3">3b</figr>) are associated with the kinetochore (Dam1, Spc19 and Spc34, annotated 'chromatin and chromosome structure'), and five play a part in the maintenance of the spindle-pole body (Dam1, Spc19, Dad2, Dad1 and Duo1, annotated 'mitosis'), with two proteins involved in both processes (Dam1 and Spc19). Such situations illustrate cross-talk between superimposed or partially overlapping cellular processes, via the dual function of some proteins.</p>
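				<p>The class 17 example can be checked against the class criterion (at least three proteins sharing an annotation, representing at least 50% of the annotated members) with a short sketch. The annotations below are those quoted in the paragraph above; the function name role_supported and its parameters are hypothetical, and the sketch tests only the thresholds, not the requirement that the class be the largest possible subtree.</p>

```python
def role_supported(annotations, role, min_members=3, min_fraction=0.5):
    """Check the size and 50% thresholds for one cellular role within a subtree.

    annotations maps each protein to the set of its cellular roles;
    proteins with an empty set are treated as unannotated and ignored.
    """
    annotated = [roles for roles in annotations.values() if roles]
    if not annotated:
        return False
    carriers = sum(1 for roles in annotated if role in roles)
    return carriers >= min_members and carriers / len(annotated) >= min_fraction


CHROMATIN = "chromatin and chromosome structure"
MITOSIS = "mitosis"

# Class 17 members and roles as described in the text.
class_17 = {
    "Dam1": {CHROMATIN, MITOSIS},
    "Spc19": {CHROMATIN, MITOSIS},
    "Spc34": {CHROMATIN},
    "Dad2": {MITOSIS},
    "Dad1": {MITOSIS},
    "Duo1": {MITOSIS},
}

for role in (CHROMATIN, MITOSIS):
    print(role, role_supported(class_17, role))
```

				<p>Both roles pass the thresholds (three of six and five of six proteins, respectively), which is why the class carries a double annotation.</p>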
				<tbl id="T1" hint_layout="double">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>Cross-talk between cellular processes after PRODISTIN classification</p>
					</caption>
					<tblbdy cols="2">
						<r>
							<c ca="left">
								<p>Cellular processes</p>
							</c>
							<c ca="left">
								<p>PRODISTIN classes</p>
							</c>
						</r>
						<r>
							<c cspan="2">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<b>Superimposed cellular processes</b>
								</p>
							</c>
							<c ca="left">
								<p>PRODISTIN classes composed of doubly annotated proteins</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Cell stress <graphic file="gb-2003-5-1-r6-i1.gif"/> other metabolism</p>
							</c>
							<c ca="left">
								<p>10</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Cell structure <graphic file="gb-2003-5-1-r6-i1.gif"/> protein folding</p>
							</c>
							<c ca="left">
								<p>14</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Lipid fatty acid metabolism <graphic file="gb-2003-5-1-r6-i1.gif"/> protein translocation</p>
							</c>
							<c ca="left">
								<p>23</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>PolII transcription <graphic file="gb-2003-5-1-r6-i1.gif"/> protein degradation</p>
							</c>
							<c ca="left">
								<p>34</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>RNA processing and modification <graphic file="gb-2003-5-1-r6-i1.gif"/> RNA splicing</p>
							</c>
							<c ca="left">
								<p>50</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<b>Partially overlapping cellular processes</b>
								</p>
							</c>
							<c ca="left">
								<p>PRODISTIN classes composed of at least three proteins annotated for a cellular role, three proteins annotated for another one, with some doubly annotated</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Cell polarity <graphic file="gb-2003-5-1-r6-i1.gif"/> cell structure</p>
							</c>
							<c ca="left">
								<p>7</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Cell polarity <graphic file="gb-2003-5-1-r6-i1.gif"/> mating response</p>
							</c>
							<c ca="left">
								<p>9</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Cell structure <graphic file="gb-2003-5-1-r6-i1.gif"/> protein complex assembly</p>
							</c>
							<c ca="left">
								<p>13</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Chromosome and chromatin structure <graphic file="gb-2003-5-1-r6-i1.gif"/> mitosis</p>
							</c>
							<c ca="left">
								<p>17</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Mating response <graphic file="gb-2003-5-1-r6-i1.gif"/> differentiation</p>
							</c>
							<c ca="left">
								<p>24</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Protein degradation <graphic file="gb-2003-5-1-r6-i1.gif"/> vesicular transport</p>
							</c>
							<c ca="left">
								<p>45</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<b>Nested cellular processes</b>
								</p>
							</c>
							<c ca="left">
								<p>Nested PRODISTIN classes</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Aging &#8834; Signal transduction</p>
							</c>
							<c ca="left">
								<p>0 &#8834; 54</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Cell cycle control &#8834; Amino acid metabolism</p>
							</c>
							<c ca="left">
								<p>3 &#8834; 1</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Cytokinesis &#8834; Cell polarity</p>
							</c>
							<c ca="left">
								<p>20 &#8834; 8, 21 &#8834; 8</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Mating response &#8834; Cell polarity</p>
							</c>
							<c ca="left">
								<p>25 &#8834; 8, 26 &#8834; 8</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Cell polarity/Mating response &#8834; Signal transduction</p>
							</c>
							<c ca="left">
								<p>9 &#8834; 54</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Cell stress &#8834; Protein degradation/Vesicular transport</p>
							</c>
							<c ca="left">
								<p>11 &#8834; 45</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Cell stress &#8834; Signal transduction</p>
							</c>
							<c ca="left">
								<p>12 &#8834; 54</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Cell structure/Protein complex assembly &#8834; Mitosis</p>
							</c>
							<c ca="left">
								<p>13 &#8834; 28</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Chromatin/Chromosome structure &#8834; PolII transcription</p>
							</c>
							<c ca="left">
								<p>16 &#8834; 35</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Mating response/Differentiation &#8834; Signal transduction</p>
							</c>
							<c ca="left">
								<p>24 &#8834; 54</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>PolIII transcription &#8834; PolII transcription</p>
							</c>
							<c ca="left">
								<p>42 &#8834; 39</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>RNA processing and modification &#8834; Nucleus-cytoplasm transport</p>
							</c>
							<c ca="left">
								<p>51 &#8834; 31</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>RNA splicing &#8834; RNA processing/modification</p>
							</c>
							<c ca="left">
								<p>53 &#8834; 52</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Vesicular transport &#8834; Cell polarity/cell structure</p>
							</c>
							<c ca="left">
								<p>55 &#8834; 7</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Vesicular transport &#8834; Cell polarity</p>
							</c>
							<c ca="left">
								<p>59 &#8834; 8</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Unknown &#8834; Cell structure/protein folding</p>
							</c>
							<c ca="left">
								<p>60 &#8834; 14</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Unknown &#8834; Vesicular transport</p>
							</c>
							<c ca="left">
								<p>62 &#8834; 56</p>
							</c>
						</r>
					</tblbdy>
				</tbl>
				<p>Finally, a third case is encountered, in which small classes are nested within larger classes (Table <tblr tid="T1">1</tblr>), representing another example of cross-talk between cellular processes. The example given is for class 1 'amino acid metabolism' (Figure <figr fid="F3">3c</figr>; see also Additional data file 1). The metabolism of amino acids is related to cell-cycle control (class 3, Figure <figr fid="F3">3c</figr>) through the ubiquitin-dependent proteolysis pathway mediated by the ubiquitin protein ligase complex SCF (Skp1-Cdc53-F-box protein). This complex contains two core proteins - Skp1 and Cdc53 - and an F-box motif-containing protein required for the specific targeting of certain proteins to the degradation pathway <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. Consequently, a 'cell cycle control' class containing Skp1, Cdc53 and the F-box protein Cdc4, which targets Sic1 for degradation at the G1-S transition of the cell cycle, is nested within an 'amino acid metabolism' class enclosing the F-box protein Met30, which targets the transcription activator Met4 for degradation during methionine biosynthesis. It is interesting to note that these classes encompass the uncharacterized F-box-containing protein Flm1 which, on the basis of its position in the classification tree (Figure <figr fid="F3">3c</figr>), is a candidate to target Csm3, a protein needed for chromosome segregation at meiosis <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, towards the ubiquitin-dependent proteolysis pathway.</p>
				<p>The detailed analysis of the classes shows that the PRODISTIN method clusters proteins belonging to the same molecular complex, pathway or cellular process, and underlines cross-talk between functions. Therefore, the method enables the extraction of complex functional information from interaction networks by considerably reducing their complexity.</p>
			</sec>
			<sec>
				<st>
					<p>Functional predictions and their biological relevance</p>
				</st>
				<p>Among the 602 tree proteins, 93 had no defined 'cellular role' in YPD when we retrieved annotations (see Materials and methods). As 42 of them belong to a defined PRODISTIN class, a cellular function could consequently be proposed. Our predictions (Table <tblr tid="T2">2</tblr>; see also Additional data file 2) were compared with predictions obtained by others using several bioinformatics methods <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B18">18</abbr><abbr bid="B28">28</abbr></abbrgrp>, the association of the protein to a complex of known functions <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> and recent experimental results described in the literature and reported in the <it>Saccharomyces </it>Genome Database (SGD) <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
				<tbl id="T2" hint_layout="double">
					<title>
						<p>Table 2</p>
					</title>
					<caption>
						<p>Functional predictions and comparisons with predictions obtained by other means</p>
					</caption>
					<tblbdy cols="7">
						<r>
							<c ca="left">
								<p>Protein name</p>
							</c>
							<c ca="center">
								<p>Class</p>
							</c>
							<c ca="left">
								<p>Predicted function (this study)</p>
							</c>
							<c ca="center">
								<p>Prediction after <abbrgrp><abbr bid="B8">8</abbr></abbrgrp></p>
							</c>
							<c ca="center">
								<p>Prediction after <abbrgrp><abbr bid="B28">28</abbr></abbrgrp></p>
							</c>
							<c ca="center">
								<p>Prediction after <abbrgrp><abbr bid="B18">18</abbr></abbrgrp></p>
							</c>
							<c ca="left">
								<p>GO annotations, September 2003 <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> and predictions after <abbrgrp><abbr bid="B29">29</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c cspan="7">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>FLM1</p>
							</c>
							<c ca="center">
								<p>1, 3</p>
							</c>
							<c ca="left">
								<p>Amino acid metabolism, cell cycle control (0)</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8776; (0)</p>
							</c>
							<c ca="center">
								<p>&#8800; (0)</p>
							</c>
							<c ca="left">
								<p>Mitochondrion organization and biogenesis</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>VTS1</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="left">
								<p>Cell cycle control (0)</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8776; (0)</p>
							</c>
							<c ca="left">
								<p>Protein-vacuolar targeting</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YPR171W</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="left">
								<p>Cell polarity (1)</p>
							</c>
							<c ca="center">
								<p>&#8776; (1)</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Cell polarity and structure, actin cytoskeleton organization and biogenesis</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YBR108W</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="left">
								<p>Cell polarity</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8776;</p>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YGR268C</p>
							</c>
							<c ca="center">
								<p>7, 55</p>
							</c>
							<c ca="left">
								<p>Cell polarity, cell structure, vesicular transport</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8776;</p>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>DSE1</p>
							</c>
							<c ca="center">
								<p>8, 25</p>
							</c>
							<c ca="left">
								<p>Cell polarity, mating response (1)</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Cell wall organization and biogenesis</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YKL082C</p>
							</c>
							<c ca="center">
								<p>8</p>
							</c>
							<c ca="left">
								<p>Cell polarity</p>
							</c>
							<c ca="center">
								<p>&#8776;</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YMR322C</p>
							</c>
							<c ca="center">
								<p>10</p>
							</c>
							<c ca="left">
								<p>Cell stress, other metabolism</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8776;</p>
							</c>
							<c ca="center">
								<p>&#8776;</p>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>VPS64</p>
							</c>
							<c ca="center">
								<p>14, 60</p>
							</c>
							<c ca="left">
								<p>Cell structure, protein folding (1)</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8800; (1)</p>
							</c>
							<c ca="left">
								<p>Protein-vacuolar targeting, cell cycle arrest in response to pheromone</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YFR008W</p>
							</c>
							<c ca="center">
								<p>14, 60</p>
							</c>
							<c ca="left">
								<p>Cell structure, protein folding (0)</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8800; (1)</p>
							</c>
							<c ca="left">
								<p>Cell cycle arrest in response to pheromone</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YNL127W</p>
							</c>
							<c ca="center">
								<p>14, 60</p>
							</c>
							<c ca="left">
								<p>Cell structure, protein folding (0)</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8776; (1)</p>
							</c>
							<c ca="left">
								<p>Cell cycle arrest in response to pheromone</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YJL019W</p>
							</c>
							<c ca="center">
								<p>22</p>
							</c>
							<c ca="left">
								<p>DNA synthesis (1)</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8776; (1)</p>
							</c>
							<c ca="center">
								<p>&#8776; (1)</p>
							</c>
							<c ca="left">
								<p>Spindle pole duplication</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>PST2</p>
							</c>
							<c ca="center">
								<p>24, 54</p>
							</c>
							<c ca="left">
								<p>Mating response, differentiation, signal transduction</p>
							</c>
							<c ca="center">
								<p>&#8800;</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8776;</p>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YLL049W</p>
							</c>
							<c ca="center">
								<p>29</p>
							</c>
							<c ca="left">
								<p>Mitosis</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8800;</p>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YNR069C</p>
							</c>
							<c ca="center">
								<p>29</p>
							</c>
							<c ca="left">
								<p>Mitosis</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>NIS1</p>
							</c>
							<c ca="center">
								<p>30</p>
							</c>
							<c ca="left">
								<p>Nucleus-cytoplasm transport (0)</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8800; (0)</p>
							</c>
							<c ca="left">
								<p>Regulation of mitosis</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YKL061W</p>
							</c>
							<c ca="center">
								<p>30</p>
							</c>
							<c ca="left">
								<p>Nucleus-cytoplasm transport</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8800;</p>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YDR489W</p>
							</c>
							<c ca="center">
								<p>30</p>
							</c>
							<c ca="left">
								<p>Nucleus-cytoplasm transport (0)</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8800; (0)</p>
							</c>
							<c ca="left">
								<p>DNA-dependent DNA replication</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YHL018W</p>
							</c>
							<c ca="center">
								<p>33</p>
							</c>
							<c ca="left">
								<p>PolI transcription</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YDR179C</p>
							</c>
							<c ca="center">
								<p>35</p>
							</c>
							<c ca="left">
								<p>PolII transcription (1)</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8800; (0)</p>
							</c>
							<c ca="left">
								<p>Protein synthesis turnover, protein deneddylation</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YMR025W</p>
							</c>
							<c ca="center">
								<p>35</p>
							</c>
							<c ca="left">
								<p>PolII transcription (1)</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8800; (0)</p>
							</c>
							<c ca="left">
								<p>Protein synthesis turnover, protein deneddylation</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YJL058C</p>
							</c>
							<c ca="center">
								<p>36</p>
							</c>
							<c ca="left">
								<p>PolII transcription</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8800;</p>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>SOH1</p>
							</c>
							<c ca="center">
								<p>37</p>
							</c>
							<c ca="left">
								<p>PolII transcription (1)</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Transcription from polII promoter, DNA repair</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YJR083C</p>
							</c>
							<c ca="center">
								<p>37</p>
							</c>
							<c ca="left">
								<p>PolII transcription</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8776;</p>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YGL230C</p>
							</c>
							<c ca="center">
								<p>38</p>
							</c>
							<c ca="left">
								<p>PolII transcription</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8800;</p>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>VAC14</p>
							</c>
							<c ca="center">
								<p>43</p>
							</c>
							<c ca="left">
								<p>Protein degradation (0)</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8800; (0)</p>
							</c>
							<c ca="left">
								<p>Intermediate and energy metabolism, transcription, DNA maintenance, chromatin structure, phospholipid metabolism, vacuole inheritance</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>AKL1</p>
							</c>
							<c ca="center">
								<p>43</p>
							</c>
							<c ca="left">
								<p>Protein degradation</p>
							</c>
							<c ca="center">
								<p>&#8800;</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YHR115C</p>
							</c>
							<c ca="center">
								<p>43</p>
							</c>
							<c ca="left">
								<p>Protein degradation</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8800;</p>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YPL105C</p>
							</c>
							<c ca="center">
								<p>48</p>
							</c>
							<c ca="left">
								<p>Protein synthesis</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8800;</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YLR424W</p>
							</c>
							<c ca="center">
								<p>49</p>
							</c>
							<c ca="left">
								<p>RNA processing and modification</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8776;</p>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YKR022C</p>
							</c>
							<c ca="center">
								<p>49</p>
							</c>
							<c ca="left">
								<p>RNA processing and modification</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8776;</p>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>AIR2</p>
							</c>
							<c ca="center">
								<p>52</p>
							</c>
							<c ca="left">
								<p>RNA processing and modification (1)</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>RNA metabolism, mRNA nucleus export</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>DHH1</p>
							</c>
							<c ca="center">
								<p>52</p>
							</c>
							<c ca="left">
								<p>RNA processing and modification (1)</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Deadenylation-dependent decapping, NOT mRNA catabolism, nonsense mediated</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YEL015W</p>
							</c>
							<c ca="center">
								<p>52</p>
							</c>
							<c ca="left">
								<p>RNA processing and modification (1)</p>
							</c>
							<c ca="center">
								<p>&#8776; (1)</p>
							</c>
							<c ca="center">
								<p>= (1)</p>
							</c>
							<c ca="center">
								<p>&#8800; (0)</p>
							</c>
							<c ca="left">
								<p>RNA metabolism</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YOR285W</p>
							</c>
							<c ca="center">
								<p>54</p>
							</c>
							<c ca="left">
								<p>Signal transduction</p>
							</c>
							<c ca="center">
								<p>&#8800;</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YGL161C</p>
							</c>
							<c ca="center">
								<p>56</p>
							</c>
							<c ca="left">
								<p>Vesicular transport</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8776;</p>
							</c>
							<c ca="center">
								<p>&#8776;</p>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YDR100W</p>
							</c>
							<c ca="center">
								<p>56</p>
							</c>
							<c ca="left">
								<p>Vesicular transport</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8776;</p>
							</c>
							<c ca="center">
								<p>&#8776;</p>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YDR425W</p>
							</c>
							<c ca="center">
								<p>56</p>
							</c>
							<c ca="left">
								<p>Vesicular transport (1)</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Protein transport</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YDR084C</p>
							</c>
							<c ca="center">
								<p>56</p>
							</c>
							<c ca="left">
								<p>Vesicular transport</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8776;</p>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YGL198W</p>
							</c>
							<c ca="center">
								<p>56</p>
							</c>
							<c ca="left">
								<p>Vesicular transport</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8776;</p>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YPL246C</p>
							</c>
							<c ca="center">
								<p>56</p>
							</c>
							<c ca="left">
								<p>Vesicular transport</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8776;</p>
							</c>
							<c ca="left">
								<p>Unknown</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>YLR285W</p>
							</c>
							<c ca="center">
								<p>57</p>
							</c>
							<c ca="left">
								<p>Vesicular transport (0)</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&#8800; (0)</p>
							</c>
							<c ca="left">
								<p>Chromatin silencing at ribosomal DNA, nicotinamide metabolism</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>=, &#8776; and &#8800; indicate that the prediction from another bioinformatic method is the same as, almost the same as, or different from the PRODISTIN prediction, respectively. The number in parentheses indicates whether the prediction is in accordance with or related to (1), or different from (0), the functions demonstrated experimentally.</p>
					</tblfn>
				</tbl>
				<p>For two proteins (5%), no cellular function has ever been proposed by any other method. For 27 proteins (64%), our prediction is in accordance with, or related to, previously proposed functions or experimental results. For 13 proteins (30%), our predictions disagree (Table <tblr tid="T2">2</tblr>; see also Additional data file 2). When only the 19 experimentally determined functions are considered, PRODISTIN predictions are in accordance with 11/19 (58%) of them. Notably, when the functional predictions obtained by the global optimization method (GOM <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>) for the same proteins are considered, only 4/13 (31%) predictions are in accordance with the experimentally determined functions. Taken together, these observations strengthen the relevance of the PRODISTIN predictions for the uncharacterized proteins.</p>
				<p>Interestingly, the PRODISTIN method also reveals the existence of clusters containing only proteins of unknown function. In one case, a cellular function can now be proposed for the entire cluster: as class 62 (annotated 'unknown') is nested within class 56 (annotated 'vesicular transport'), all its members can be associated with 'vesicular transport'. Recent experimental results strengthen these predictions <it>a posteriori </it>(Table <tblr tid="T2">2</tblr>) <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>.</p>
				<p>Finally, PRODISTIN also suggests the putative involvement of proteins of already known function in new cellular processes. Class 52 (Figure <figr fid="F3">3e</figr>) contains proteins involved in RNA processing, including the members of the two LSM complexes, which play a part in mRNA decapping (Lsm1-7) and pre-mRNA splicing (Lsm2-8) <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. Given that the two small-subunit ribosomal proteins Rps28A and B were found to interact with Lsm2, Lsm4, and Lsm8 in the two-hybrid screen of Uetz <it>et al. </it><abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, these authors suggested either a possible involvement of Lsm proteins in translation/ribosomal biogenesis or an unforeseen role of the ribosomal proteins in RNA splicing. As both proteins share all their interactors with Dcp1 (mRNA-decapping enzyme), PRODISTIN instead suggests a novel involvement of Rps28A and B in mRNA decay.</p>
				<p>Altogether, these results lend further support to the ability of the PRODISTIN method to directly derive a cellular function for proteins from the information contained within the interaction network, without using any additional sequence or structure information.</p>
			</sec>
			<sec>
				<st>
					<p>Statistical evaluations of PRODISTIN clusters</p>
				</st>
				<p>To evaluate the quality of PRODISTIN classifications and predictions on a more statistical basis, four different types of control experiments have been performed in order to assess the influence of various parameters.</p>
				<p>First, given that annotations taken from databases may contain inconsistencies, our classification for the yeast proteome (originally established with YPD annotations) was further tested using the Gene Ontology (GO) annotations <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. We used the GO Term Finder tool from the SGD database to search for significant shared GO terms (or their parents) used to describe the genes of interest and to calculate a <it>p </it>value for the occurrence of common terms (for details see Help in <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>). The lists of genes constituting each PRODISTIN class were successively processed with the GO Term Finder for the 'biological process' ontology. On average, for 87.3% of the PRODISTIN classes, the best hit, that is, the common GO term with the lowest <it>p </it>value, is in accordance with the class annotation proposed using YPD annotations. These terms are highly statistically significant, as a <it>p </it>value &lt; 10<sup>-6</sup> is encountered for 83.63% of the classes. Moreover, these terms apply to 77% of the class members on average. As GO terms represent a source of functional annotation independent of YPD, these congruent results confirm that PRODISTIN efficiently clusters proteins having common or related cellular functions.</p>
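				<p>As an illustration of how such a <it>p </it>value for shared terms can be obtained, the following minimal sketch computes the upper tail of a hypergeometric distribution. It assumes that term enrichment is scored in this way and ignores any correction for multiple testing; the function name, its parameters and the numbers in the example are hypothetical and are not taken from the analysis above.</p>

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in the sample."""
    # Hypergeometric upper tail: draw `sample` genes without replacement
    # from `population` genes, of which `annotated` carry the GO term.
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(population, sample)


# Hypothetical class: 8 of 10 members share a term carried by 100 of 6,000 genes.
p = enrichment_p_value(population=6000, annotated=100, sample=10, hits=8)
print(f"{p:.3e}")
```

				<p>Exact integer binomials are used here so that very small probabilities are not lost to rounding before the final division.</p>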
				<p>In a second control experiment, the overall accuracy of our functional predictions was estimated on the basis of the ability of PRODISTIN to correctly predict the function of already known proteins. For this, we first supposed that members of a given PRODISTIN class all perform the function attributed to the class (independently of their actual function) and then compared these predictions to the known functions. We defined the prediction success rate as the ratio between the number of correctly predicted functions and the total number of predictions. In this test, PRODISTIN's performance was compared with that of a 'majority rule' algorithm (MRA <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>), which assigns to a given protein the function most frequently found among its neighbors in the original protein-protein network. As shown in Table <tblr tid="T3">3</tblr>, the highest success rate for function predictions is attained with PRODISTIN: 67% of the predictions made with PRODISTIN are correct, compared with only 43% of those proposed by the MRA.</p>
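				<p>The majority rule and the success rate defined above can be sketched as follows. This is a simplified illustration rather than the implementation used in the comparison: it assumes a single function per protein, an alphabetical tie-break and an adjacency-dictionary representation of the network, and all names and toy data are hypothetical.</p>

```python
from collections import Counter


def majority_rule(network, functions, protein):
    """Most frequent function among the annotated neighbours, or None."""
    counts = Counter(
        functions[n] for n in network.get(protein, ()) if n in functions
    )
    if not counts:
        return None  # no annotated neighbour, so no prediction is possible
    # Sorting first makes ties resolve alphabetically (an arbitrary choice).
    return max(sorted(counts), key=counts.get)


def success_rate(predictions, known):
    """Correct predictions divided by the number of predictions that can be scored."""
    scored = [p for p in predictions if p in known]
    if not scored:
        return 0.0
    correct = sum(1 for p in scored if predictions[p] == known[p])
    return correct / len(scored)


# Toy network: protein A has three annotated neighbours.
network = {"A": {"B", "C", "D"}}
functions = {"B": "vesicular transport", "C": "vesicular transport", "D": "mitosis"}
print(majority_rule(network, functions, "A"))
```

				<p>The same <it>success_rate </it>function can score class-based predictions, provided that each member of a class is assigned the function attributed to that class.</p>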
				<tbl id="T3" hint_layout="single">
					<title>
						<p>Table 3</p>
					</title>
					<caption>
						<p>Success rates for PRODISTIN vs majority rule</p>
					</caption>
					<tblbdy cols="3">
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>MR</p>
							</c>
							<c ca="center">
								<p>PRODISTIN</p>
							</c>
						</r>
						<r>
							<c cspan="3">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Success rate</p>
							</c>
							<c ca="center">
								<p>0.43</p>
							</c>
							<c ca="center">
								<p>0.67</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Predictions</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Totally in accordance</p>
							</c>
							<c ca="center">
								<p>0.23</p>
							</c>
							<c ca="center">
								<p>0.35</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Partially in accordance</p>
							</c>
							<c ca="center">
								<p>0.69</p>
							</c>
							<c ca="center">
								<p>0.76</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>In disagreement</p>
							</c>
							<c ca="center">
								<p>0.31</p>
							</c>
							<c ca="center">
								<p>0.24</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Number of proteins on which a prediction is possible</p>
							</c>
							<c ca="center">
								<p>520</p>
							</c>
							<c ca="center">
								<p>346</p>
							</c>
						</r>
					</tblbdy>
				</tbl>
				<p>Third, we tested the robustness of PRODISTIN to both spurious and missing interactions in the dataset because, although the dataset was carefully assembled (see Materials and methods), its actual accuracy is difficult to estimate. We therefore tested PRODISTIN's reliability when the topology of the network is disturbed by false or missing edges. For this, we rewired the network by randomly removing edges and reinserting them between pairs of proteins not already connected. PRODISTIN and the MRA were applied to these rewired networks, and the change in the prediction rate was monitored as the percentage of modified edges was gradually increased from 0 to 50%. Interestingly, the rate of correct predictions stays remarkably even (between 64 and 67%) (Figure <figr fid="F4">4</figr>). The number of proteins for which a prediction is possible (because they belong to a PRODISTIN class of known function) also remains quite stable (from 389 for the initial network to 471 on average for 50% rewired networks), although the actual number of proteins in the tree increases from 601 to 1,493 on average for 50% rewired networks. Comparison with the MRA clearly shows that, although this algorithm is able to offer a prediction for a larger number of proteins in the network, its success rate is always two to three times lower than that of PRODISTIN. In addition, the MRA is very sensitive to the introduction of false interactions, as its success rate drops dramatically from 43% for the initial network to 20% on average with 50% rewired networks. In summary, clustering proteins within classes according to their cellular functions has a positive buffering effect on the prediction rate, and PRODISTIN is thus very robust against the presence of false interactions in the dataset.</p>
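				<p>The rewiring procedure can be sketched as below. This is an assumption-laden illustration, not the code used for Figure 4: edges are treated as undirected pairs without self-loops, a moved edge is never placed on a pair that was connected in the original network, and the network is assumed to be sparse enough for free pairs to exist. The function name and the toy edge list are hypothetical.</p>

```python
import random


def rewire(edges, fraction, seed=None):
    """Move `fraction` of the edges to random pairs that were not connected."""
    rng = random.Random(seed)
    original = {frozenset(e) for e in edges}
    nodes = sorted({n for e in original for n in e})
    n_moves = round(fraction * len(original))

    # Remove a random subset of edges (sorted first for reproducibility).
    removed = rng.sample(sorted(original, key=sorted), n_moves)
    current = original - set(removed)

    # Put the same number of edges back between unconnected pairs.
    added = 0
    while added != n_moves:
        candidate = frozenset(rng.sample(nodes, 2))
        if candidate in current or candidate in original:
            continue
        current.add(candidate)
        added += 1
    return current


toy = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]
rewired = rewire(toy, 0.4, seed=1)
print(len(rewired))
```

				<p>Because every removed edge is replaced by exactly one new edge, the total number of interactions is unchanged; only their placement is disturbed.</p>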
				<fig id="F4">
					<title>
						<p>Figure 4</p>
					</title>
					<caption>
						<p>Robustness of PRODISTIN towards false interactions</p>
					</caption>
					<text>
						<p>Robustness of PRODISTIN towards false interactions. The prediction rate (number of correct predictions divided by number of predictions) was measured for PRODISTIN (yellow curve) and for the majority rule algorithm (green curve) on networks in which a given percentage of interactions was randomly 'rewired' (from 10 to 50%) (see text). The number of proteins for which a prediction is possible is also reported as a histogram (dark red, PRODISTIN; blue, majority rule). The values correspond to an average of 50 experiments for each percentage of false interactions introduced into the dataset.</p>
					</text>
					<graphic file="gb-2003-5-1-r6-4"/>
				</fig>
				<p>We then tested PRODISTIN's performance on random networks of identical topology in order to assess whether PRODISTIN clustering could have occurred by chance. For this, all protein names were reshuffled and randomly assigned to nodes in the network. The PRODISTIN analysis of such networks yields only a small number of classes (15 on average, instead of 63) and, consequently, a very low number of proteins for which a prediction is possible (51 on average, instead of 389 in the current study). In addition, the prediction rate drops to 60%. This clearly indicates that random interaction networks never lead to both a high number of PRODISTIN classes and a high rate of correct predictions, as true networks do.</p>
				<p>A final statistical assessment of PRODISTIN was performed by measuring the robustness of the protein clusters with another criterion based on tree topology (see Materials and methods for details). For this, we applied PRODISTIN to the protein-protein interaction network of the bacterium <it>Helicobacter pylori </it><abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, for which information on putative true/false positives is available. Using the PBS<sup>&#174; </sup>algorithm, these interactions had been ranked in five experimental categories of decreasing biological confidence (from A to E) <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. A recent assessment has further confirmed the existence of a positive correlation between this reliability score and the true-positive rate <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. Classification trees were built from five datasets corresponding to the interactions of categories A, A+B, A+B+C, A+B+C+D, and A+B+C+D+E; their subtrees were tested for robustness, and the average robustness value was calculated for each tree (see Materials and methods for details). As expected, this value decreases as more interactions of lower biological significance are included in the dataset (Figure <figr fid="F5">5</figr>). This correlation between PBS categories and the average statistical robustness of the trees provides a fourth, independent line of support for the reliability of the PRODISTIN approach. In addition, the fact that the average robustness value of the yeast tree is almost equivalent to that of the <it>H. pylori </it>A tree reinforces the conclusion that the <it>Saccharomyces </it>tree is biologically meaningful.</p>
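				<p>The name-reshuffling control can be illustrated by the short sketch below, which permutes protein names over the nodes of a fixed edge list. It is only an illustration under the assumption that the network is stored as a list of name pairs; the function name and the toy data are hypothetical.</p>

```python
import random
from collections import Counter


def shuffle_labels(edges, seed=None):
    """Randomly reassign protein names to nodes, keeping the wiring intact."""
    rng = random.Random(seed)
    nodes = sorted({n for e in edges for n in e})
    shuffled = nodes[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(nodes, shuffled))  # one-to-one renaming of the nodes
    return [(mapping[a], mapping[b]) for a, b in edges]


def degree_profile(edges):
    """Sorted list of node degrees, used to check that topology is preserved."""
    return sorted(Counter(n for e in edges for n in e).values())


toy = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
randomized = shuffle_labels(toy, seed=0)
print(degree_profile(toy) == degree_profile(randomized))
```

				<p>Since the annotations remain attached to the protein names, this permutation breaks the link between function and position in the network while leaving the degree distribution untouched.</p>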
				<fig id="F5">
					<title>
						<p>Figure 5</p>
					</title>
					<caption>
						<p>Evaluation of PRODISTIN robustness by analysis of the <it>H. pylori </it>interactome</p>
					</caption>
					<text>
						<p>Evaluation of PRODISTIN robustness by analysis of the <it>H. pylori </it>interactome. Average class robustness index (CRI) value for the five <it>H. pylori </it>trees obtained with interactions of decreasing PBS (blue histograms) and for the yeast tree (orange histogram).</p>
					</text>
					<graphic file="gb-2003-5-1-r6-5"/>
				</fig>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Discussion</p>
			</st>
			<sec>
				<st>
					<p>Protein-protein interactions as good indicators of protein cellular function</p>
				</st>
				<p>We present here a new bioinformatics method that is able to compute a functional clustering of proteins on the basis of protein-protein interaction data. When applied to the yeast interactome, our method classified 602 proteins, representing a significant part of the proteome (11%), into 64 classes of functionally related proteins.</p>
				<p>Our method was based on the assumption that a distance formula (the Czekanovski-Dice distance) that uses information on shared interactors could potentially mirror a functional distance between proteins. The demonstration that the classification and the protein clustering resulting from PRODISTIN are essentially driven by the cellular function of proteins gives strong support to our initial assumption. This may also be explained by the fact that the chosen distance formula takes into account the functional information carried not only by the nearest neighbors in the protein-protein network but also by proteins two edges away. Therefore, the obtained distance values, once clustered, are able to highlight subgraphs in the network, such as those formed by proteins involved in the same pathway(s) or cellular process(es).</p>
				<p>As we also showed that the PRODISTIN functional distance clusters proteins independently of their sequence similarities and their actual biochemical function, we now have the opportunity to quantify functional relationships between proteins in the same way that sequence alignments make it possible to quantify protein-sequence similarity. PRODISTIN thus represents a useful complement to sequence-comparison methods, which instead point towards proteins that have the same molecular function. It is interesting to note that the majority of proteins with the same biochemical function are not clustered in the tree despite their sequence similarity. This moderate dependence of cellular function on sequence similarity clearly means that many functional similarities are at present missed by sequence-based methods, emphasizing the importance of using types of data other than sequence and structure as a basis for function assessment.</p>
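				<p>To make the idea of a distance based on shared interactors concrete, the sketch below assumes the common set-based form of the Czekanovski-Dice distance: the size of the symmetric difference of the two interactor lists divided by the sum of the sizes of their union and intersection, with each protein added to its own list. The exact definition is the one given in Materials and methods; the function name, the adjacency-dictionary representation and the toy network are hypothetical.</p>

```python
def czekanowski_dice(network, i, j):
    """Distance in [0, 1] based on the interactors that i and j do not share."""
    # Each protein is added to its own interactor list (assumed convention),
    # which brings directly interacting proteins closer together.
    int_i = set(network.get(i, ())).union({i})
    int_j = set(network.get(j, ())).union({j})
    differing = int_i.symmetric_difference(int_j)
    denominator = len(int_i.union(int_j)) + len(int_i.intersection(int_j))
    return len(differing) / denominator


# Toy network: A and B do not interact but share both of their partners.
network = {
    "A": {"C", "D"}, "B": {"C", "D"},
    "C": {"A", "B"}, "D": {"A", "B"},
    "E": {"F"}, "F": {"E"},
}
print(czekanowski_dice(network, "A", "B"))
print(czekanowski_dice(network, "A", "E"))
```

				<p>In this toy case, A and B are close although they are two edges apart, which is the behavior described above, whereas A and E share nothing and lie at the maximal distance.</p>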
				<p>Two major advantages result from the fact that PRODISTIN processes all the interactions that make up an interaction network at once. First, it produces a large functional tree, allowing direct comparison in terms of cellular function for any pair or group of proteins. Second, it makes it possible to visualize a large number of cellular processes and their main actors in a single integrated view, thus offering the possibility of examining the links between cellular functions, and more broadly, the organization of cellular functions within the interaction network. In doing so, PRODISTIN functional trees can capture the essential part of the functional information buried in complex interaction networks, something which is at present impossible to deduce from the intricate graphical representations. Consequently, PRODISTIN can be considered to be one of the first cellular bioinformatics tools available that allows not only comparison of the functions of individual proteins but also the study of cell function more globally. For instance, the dissection of given cellular functions into sub-functions visible at network level or the study of the functional relationships between known cellular functions can be investigated. As discussed in Results, PRODISTIN has shown that the 'vesicular transport' general function can be separated into distinct subfunctions. An analytic approach of this kind could be systematically undertaken for all known yeast cellular functions, as they are statistically represented in the tree, and later on for those of other organisms. As far as the second question of the relationships between functions is concerned, PRODISTIN could represent a valuable functional data-mining tool. It is, for instance, interesting to note that, although there exist 44 different YPD 'cellular roles' to describe the complete yeast proteome, of which 42 are represented by more than one protein in the tree, our PRODISTIN classes at present cover only 29 of them. Despite the existence of biases in the interaction dataset generally, due to a deeper investigation of certain proteins and to methodological flaws, this observation could suggest a predominant role for these 29 cellular functions in the organization of the network.</p>
			</sec>
			<sec>
				<st>
					<p>Comparison of the PRODISTIN method with recent functional prediction methods</p>
				</st>
				<p>Comparison of the results of PRODISTIN with those of other computational methods for assessing and comparing protein functions is not straightforward. Because of the lack of common interaction sets, functional annotations, common evaluation tools and sometimes insufficient description of the algorithms used, no simple comparative benchmarking analyses are yet possible. However, in an attempt to evaluate the relative advantages and disadvantages of the different methods, we compared their results when available. For this purpose, we evaluated PRODISTIN against the MRA <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> and two network-based methods, the GOM <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> and the Rives and Galitski method (RGM <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>). We measured their relative behavior in terms of success rate in the prediction of the function of already known proteins (PRODISTIN vs MRA vs GOM), functional assignment of unclassified proteins (PRODISTIN vs GOM), and ability to cope with false-positive and false-negative interactions in the dataset (PRODISTIN vs GOM vs MRA).</p>
				<p>Our results (see Table <tblr tid="T3">3</tblr>) and those of the GOM (Table <tblr tid="T1">1</tblr> in <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>) both agree that the MRA has a lower success rate than PRODISTIN or GOM in predicting the function of known proteins. When the ability of GOM and PRODISTIN to predict a function for 42 otherwise uncharacterized proteins is compared to recently published experimental results as a reference, PRODISTIN performs better (Table <tblr tid="T2">2</tblr>). We found that 58% of PRODISTIN predictions are in accordance with the literature, whereas only 31% of the predictions made by the GOM are.</p>
				<p>Finally, when robustness towards the presence of false-positive and false-negative interactions is assayed by changing the topology of the network, the MRA again performs less efficiently than PRODISTIN (Figure <figr fid="F4">4</figr>). In addition, on random networks of identical topology, both PRODISTIN and the RGM (Table <tblr tid="T1">1</tblr> in <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>) show that clustering of proteins in true networks is always higher than clustering observed in random networks.</p>
				<p>Unlike GOM, PRODISTIN and RGM produce functional trees as an output. But PRODISTIN goes one step further, by finding functional classes on the tree according to two parameters (the minimal number of annotated proteins for the same function in the class and their minimal representation in the class - 3 and 50%, respectively, in this study). This considerably facilitates the process of function assessment, as it minimizes the ambiguity inherent in tree representation. This class construction also has a positive buffering effect that limits the influence of false interactions on the classification and makes it possible to maintain high prediction rates, as already discussed. One may argue that constructing classes limits the number of proteins for which a prediction is possible. It is therefore important to note that PRODISTIN settings may be changed easily at different levels. Depending on the goal of the user (favoring class coverage of the tree, for instance), the number of proteins per class can be increased by adjusting the two parameters defining the PRODISTIN classes, but at the unavoidable price of a slight decrease in the overall accuracy of the predictions. Switching from the YPD annotation system to the GO system using GO slim categories also increases the number of classified proteins in the tree and, consequently, the number of possible predictions (D.M., B.J. and C.B., unpublished data).</p>
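				<p>The two class-defining parameters can be illustrated with a minimal Python sketch. It is not the PRODISTIN implementation: the function name class_functions is hypothetical, each protein is assumed to carry at most one annotation, and the 50% representation is assumed to be measured against the annotated members of the subtree.</p>
				<p>
```python
from collections import Counter

def class_functions(annotations, min_count=3, min_fraction=0.5):
    """Return the functions that would qualify a subtree as a class.

    annotations: one label per leaf of the subtree, None for unknown function.
    Hypothetical helper; thresholds default to the values used in this study.
    """
    # Assumption: representation is computed over annotated members only.
    known = [a for a in annotations if a is not None]
    counts = Counter(known)
    return sorted(
        f for f, n in counts.items()
        if n >= min_count and n >= min_fraction * len(known)
    )

# Toy subtree: three proteins share a function, one is unknown, one differs.
toy_subtree = ["transport", "transport", "transport", None, "cell cycle"]
qualifying = class_functions(toy_subtree)
```
				</p>
				<p>Raising min_count or min_fraction in such a sketch yields fewer, more homogeneous classes, whereas lowering them increases coverage, which mirrors the trade-off between coverage and accuracy described above.</p>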
			</sec>
		</sec>
		<sec>
			<st>
				<p>Conclusions</p>
			</st>
			<p>As more interactions become available, the coverage of the proteome and the mean number of interactions per protein will increase, therefore improving the relevance of the protein clusters found by the PRODISTIN method. Notably, it can be anticipated that using interactions recently described in the literature as well as new interactions produced by large-scale approaches could rapidly lead to the classification of the majority of the yeast proteome. As far as the PRODISTIN method is concerned, work presently in progress in our laboratory will soon fully automate the tedious task of manually constructing PRODISTIN classes on the tree.</p>
			<p>Finally, PRODISTIN can be applied not only to the proteomes of unicellular organisms (this study) but also to those of metazoans. The classification trees recently obtained on the <it>Drosophila </it>and human proteomes (C.B., S. Siret, P. Mouren and B.J., unpublished data) show protein clusters having a true biological significance. Furthermore, other types of interaction networks such as genetic interaction networks (A. Baudot, B.J., C.B., unpublished data) and transcriptional networks can also benefit from the application of our general method. These new developments will allow PRODISTIN to be applied to a large variety of biological questions, such as the evolutionary fate of duplicated genes, the functional aspects of horizontal transfer of genes from one species to another, the integration of signaling pathways and the evolutionary comparison of gene networks.</p>
		</sec>
		<sec>
			<st>
				<p>Materials and methods</p>
			</st>
			<sec>
				<st>
					<p>Protein-protein interaction data sets</p>
				</st>
				<p>Yeast protein-protein interactions were extracted from the MIPS database <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. Only direct binary interactions were selected, based on the method used for their identification (two-hybrid, excluding high-throughput experiments, <it>in vitro </it>binding, far western, gel retardation and biochemical experiments). For high-throughput two-hybrid experiments, 948 interactions were taken from Uetz <it>et al. </it><abbrgrp><abbr bid="B15">15</abbr></abbrgrp> and 839 from Ito's core data <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. This yielded a total of 2,946 interactions involving 2,139 proteins (average connectivity 2.6 interactions per protein). The 1,517 protein-protein interactions involving 730 proteins from <it>Helicobacter pylori </it>and their corresponding PBS categories were taken from Rain <it>et al. </it><abbrgrp><abbr bid="B14">14</abbr></abbrgrp>.</p>
			</sec>
			<sec>
				<st>
					<p>Classification method</p>
				</st>
				<p>Only proteins involved in at least three binary interactions were selected for further classification. Taking into account that false-positive and false-negative interactions weigh more heavily on poorly connected proteins, and that the estimated number of interactions per protein is close to five <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>, we chose to rule out proteins for which the contribution of such false interactions may blur the analysis. Proteins in our dataset have 2.6 interactors on average. We therefore set the connectivity threshold for classification to 3, which means that proteins implicated in one or two interactions were not classified but were still taken into account in the computation. First, a relation between two proteins to be classified is defined to exist if they interact with each other and/or share at least one common interactor. A graph in which vertices are proteins and edges correspond to this relation was then computed. Its connected components were determined, and the main one, containing almost all of the proteins, was selected. Second, the Czekanovski-Dice distance between all pairs of proteins in this component was calculated. This classical distance on graphs corresponds to the formula</p>
				<p>D(i,j) = #(Int(i) &#916; Int(j))/ [#(Int(i) &#8746; Int(j)) + #(Int(i) &#8745; Int(j))]</p>
				<p>in which i and j denote two proteins, Int(i) and Int(j) are the lists of their interactors plus themselves (to decrease the distance between proteins interacting with each other) and &#916; is the symmetric difference between the two sets. This distance was chosen because it gives more weight to the similarities (the shared interactors) than to the differences. It is also very close to an ultrametric distance because the vast majority of distance values between protein pairs are at the maximum: for two proteins that do not share any interactor, the distance value is 1, the highest value, whereas for two proteins interacting with each other and sharing exactly the same interactors, the distance value is 0, the lowest value. Consequently, the advantage of choosing this distance is that it permits the use of a tree representation. With such distance values, only one tree structure fits the initial distance values, independently of the chosen clustering algorithm. We used the BioNJ algorithm <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> to build a tree from our distance matrices. This is an improvement of the neighbor-joining algorithm <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>, which takes into account the variance of the distance between proteins to evaluate the length of the branches in the tree. A circular classification tree was then drawn using the TreeDyn package <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>.</p>
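				<p>The distance formula can be made concrete with a short Python sketch. This is an illustration under stated assumptions rather than the code used in this study: the function name czekanovski_dice and the toy edge list are hypothetical, and interactions are assumed to be given as undirected pairs of protein names.</p>
				<p>
```python
def czekanovski_dice(interactions, i, j):
    """D(i, j) as defined by the formula above, for an undirected edge list.

    Hypothetical helper for illustration; not the PRODISTIN code itself.
    """
    def int_set(p):
        # Int(p): every interactor of p, plus p itself.
        partners = {b if a == p else a for a, b in interactions if p in (a, b)}
        return partners | {p}

    si, sj = int_set(i), int_set(j)
    sym_diff = si ^ sj  # symmetric difference of the two sets
    return len(sym_diff) / (len(si | sj) + len(si.intersection(sj)))

# Toy network (invented for the example): a triangle A-B-C and a separate pair D-E.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E")]
d_ab = czekanovski_dice(toy_edges, "A", "B")  # interact and share all interactors
d_ad = czekanovski_dice(toy_edges, "A", "D")  # no shared interactor
```
				</p>
				<p>On this toy network, d_ab evaluates to 0 and d_ad to 1, matching the two extreme cases described above. Counting the intersection once more in the denominator is what gives shared interactors more weight than differences.</p>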
			</sec>
			<sec>
				<st>
					<p>Sequence alignments and analysis</p>
				</st>
				<p>Pairwise sequence alignments were performed on the set of 602 protein sequences classified with the PRODISTIN method. Both the Needleman-Wunsch (global alignment) and Smith-Waterman (local alignment) algorithms were applied. The programs used for the two algorithms are available at <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> and <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>, respectively. The chosen alignment matrix was BLOSUM50, and the gap-opening and gap-extension penalties were set to 12 and 2, respectively. The resulting 363,004 alignments were processed to calculate, for each protein pair, a distance corresponding to the percentage of similarity in the global alignment and to the score in the local alignment.</p>
			</sec>
			<sec>
				<st>
					<p>Subtree robustness measurement</p>
				</st>
				<p>The robustness of each subtree was computed by measuring its homogeneity using a criterion based on topology. Considering triples made of two elements within a given subtree and one outside the subtree (possibly restricted to the sibling subtree), we evaluated the percentage of these triples for which the two elements belonging to the same subtree are separated by the smallest distance value. This allowed us to calculate a class robustness index (CRI) for each inner branch, which was computed by the Qualitree program <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> as a measurement of robustness/quality of the downward class. CRI may be considered as functionally equivalent to the bootstrap index usually used to assess the quality of phylogenetic subtrees. CRI values for PRODISTIN classes are available in Additional data file 1. The average CRI per tree corresponds to the sum of all triples for which the two elements belonging to the same subtree are separated by the smallest distance value divided by the sum of possible triples.</p>
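				<p>The triple-based criterion can be sketched in Python as follows. This is a simplified reading of the description above, not the Qualitree program: the function name class_robustness_index and the toy distances are hypothetical, and a triple is assumed to count only when the within-subtree distance is strictly the smallest of the three.</p>
				<p>
```python
from itertools import combinations

def class_robustness_index(dist, inside, outside):
    """Percentage of triples (x, y inside the subtree; z outside) for which
    d(x, y) is the smallest of the three pairwise distances.

    Hypothetical helper; ties are assumed not to count as support.
    """
    total = 0
    supported = 0
    for x, y in combinations(inside, 2):
        for z in outside:
            total += 1
            if min(dist(x, z), dist(y, z)) > dist(x, y):
                supported += 1
    if total == 0:
        raise ValueError("no triples: need two leaves inside and one outside")
    return 100.0 * supported / total

# Toy distances (invented): A, B, C form the subtree, D lies outside it.
toy = {
    frozenset("AB"): 0.1, frozenset("AC"): 0.2, frozenset("BC"): 0.9,
    frozenset("AD"): 0.5, frozenset("BD"): 0.5, frozenset("CD"): 0.5,
}
toy_cri = class_robustness_index(lambda p, q: toy[frozenset((p, q))], "ABC", "D")
```
				</p>
				<p>In this toy case two of the three triples support the subtree, so toy_cri is about 66.7. Passing only the leaves of the sibling subtree as outside corresponds to the restricted variant mentioned above.</p>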
			</sec>
			<sec>
				<st>
					<p>Annotation sources and functional tree visualization</p>
				</st>
				<p>We downloaded the 'cellular role', 'functional categories' and 'sub-cellular localization' annotation files for yeast proteins from YPD <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> on 28 May 2002. The category labels were then loaded into TreeDyn <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> for a direct class visualization on the trees as displayed in Figure <figr fid="F2">2b</figr>.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Additional data files</p>
			</st>
			<p>The following additional data are available: details of all the proteins and protein classes included in this analysis (Additional data file <supplr sid="s1">1</supplr>), and details of the functional predictions and comparisons with predictions obtained by other means (Additional data file <supplr sid="s2">2</supplr>).</p>
			<suppl id="s1">
				<title>
					<p>Additional data file 1</p>
				</title>
				<caption>
					<p>Details of all the proteins and protein classes included in this analysis</p>
				</caption>
				<text>
					<p>Composition of the 63 PRODISTIN classes. Numbers in the column "'Cellular Role' Annotation" indicate founder proteins for each class (see Fig. 3 legend). When two 'Cellular Roles' are assigned to the same class, 1 and 2 indicate proteins annotated with the first and/or the second role, respectively. A question mark (?) marks proteins of unknown function. The class robustness index is indicated for each class.</p>
				</text>
				<file name="gb-2003-5-1-r6-s1.xls">
					<p>Click here for additional data file</p>
				</file>
			</suppl>
			<suppl id="s2">
				<title>
					<p>Additional data file 2</p>
				</title>
				<caption>
					<p>Details of the functional predictions and comparisons with predictions obtained by other means</p>
				</caption>
				<text>
					<p>Details of the functional predictions and comparisons with predictions obtained by other means</p>
				</text>
				<file name="gb-2003-5-1-r6-s2.xls">
					<p>Click here for additional data file</p>
				</file>
			</suppl>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>We thank J.-C. Rain for providing the <it>H. pylori </it>data, A. Baudot, L. Fasano, S. Gangloff, A. Kissenpfennig, D. Nesic, E. Remy, L. R&#246;der, J. Smith and D. Thieffry for carefully reading the manuscript and helpful discussions, and Pierre Mouren for technical assistance. This project is supported by three Action Bioinformatique inter-EPST grants to A.G., F.C. and B.J. respectively. C.B. thanks Valigen SA and the Fondation pour la Recherche M&#233;dicale for financial support.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Who's your neighbor? New computational approaches for functional genomics.</p>
				</title>
				<aug>
					<au>
						<snm>Galperin</snm>
						<fnm>MY</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Nat Biotechnol</source>
				<pubdate>2000</pubdate>
				<volume>18</volume>
				<fpage>609</fpage>
				<lpage>613</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/76443</pubid>
						<pubid idtype="pmpid" link="fulltext">10835597</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Protein interaction maps for complete genomes based on gene fusion events.</p>
				</title>
				<aug>
					<au>
						<snm>Enright</snm>
						<fnm>AJ</fnm>
					</au>
					<au>
						<snm>Iliopoulos</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Kyrpides</snm>
						<fnm>NC</fnm>
					</au>
					<au>
						<snm>Ouzounis</snm>
						<fnm>CA</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>1999</pubdate>
				<volume>402</volume>
				<fpage>86</fpage>
				<lpage>90</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10573422</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>Detecting protein function and protein-protein interactions from genome sequences.</p>
				</title>
				<aug>
					<au>
						<snm>Marcotte</snm>
						<fnm>EM</fnm>
					</au>
					<au>
						<snm>Pellegrini</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Ng</snm>
						<fnm>HL</fnm>
					</au>
					<au>
						<snm>Rice</snm>
						<fnm>DW</fnm>
					</au>
					<au>
						<snm>Yeates</snm>
						<fnm>TO</fnm>
					</au>
					<au>
						<snm>Eisenberg</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>1999</pubdate>
				<volume>285</volume>
				<fpage>751</fpage>
				<lpage>753</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.285.5428.751</pubid>
						<pubid idtype="pmpid" link="fulltext">10427000</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Conserved clusters of functionally related genes in two bacterial genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Tamames</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Casari</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Ouzounis</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Valencia</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>J Mol Evol</source>
				<pubdate>1997</pubdate>
				<volume>44</volume>
				<fpage>66</fpage>
				<lpage>73</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9010137</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Conservation of gene order: a fingerprint of proteins that physically interact.</p>
				</title>
				<aug>
					<au>
						<snm>Dandekar</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Snel</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Huynen</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Bork</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Trends Biochem Sci</source>
				<pubdate>1998</pubdate>
				<volume>23</volume>
				<fpage>324</fpage>
				<lpage>328</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0968-0004(98)01274-2</pubid>
						<pubid idtype="pmpid" link="fulltext">9787636</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>The use of gene clusters to infer functional coupling.</p>
				</title>
				<aug>
					<au>
						<snm>Overbeek</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Fonstein</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>D'Souza</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Pusch</snm>
						<fnm>GD</fnm>
					</au>
					<au>
						<snm>Maltsev</snm>
						<fnm>N</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1999</pubdate>
				<volume>96</volume>
				<fpage>2896</fpage>
				<lpage>2901</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">15866</pubid>
						<pubid idtype="pmpid" link="fulltext">10077608</pubid>
						<pubid idtype="doi">10.1073/pnas.96.6.2896</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Assigning protein functions by comparative genome analysis: protein phylogenetic profiles.</p>
				</title>
				<aug>
					<au>
						<snm>Pellegrini</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Marcotte</snm>
						<fnm>EM</fnm>
					</au>
					<au>
						<snm>Thompson</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Eisenberg</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Yeates</snm>
						<fnm>TO</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1999</pubdate>
				<volume>96</volume>
				<fpage>4285</fpage>
				<lpage>4288</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">16324</pubid>
						<pubid idtype="pmpid" link="fulltext">10200254</pubid>
						<pubid idtype="doi">10.1073/pnas.96.8.4285</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>A combined algorithm for genome-wide prediction of protein function.</p>
				</title>
				<aug>
					<au>
						<snm>Marcotte</snm>
						<fnm>EM</fnm>
					</au>
					<au>
						<snm>Pellegrini</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Thompson</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Yeates</snm>
						<fnm>TO</fnm>
					</au>
					<au>
						<snm>Eisenberg</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>1999</pubdate>
				<volume>402</volume>
				<fpage>83</fpage>
				<lpage>86</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/47048</pubid>
						<pubid idtype="pmpid" link="fulltext">10573421</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>A protein linkage map of <it>Escherichia coli </it>bacteriophage T7.</p>
				</title>
				<aug>
					<au>
						<snm>Bartel</snm>
						<fnm>PL</fnm>
					</au>
					<au>
						<snm>Roecklein</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>SenGupta</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Fields</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Nat Genet</source>
				<pubdate>1996</pubdate>
				<volume>12</volume>
				<fpage>72</fpage>
				<lpage>77</lpage>
				<xrefbib>
					<pubid idtype="pmpid">8528255</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>A genomic approach of the hepatitis C virus generates a protein interaction map.</p>
				</title>
				<aug>
					<au>
						<snm>Flajolet</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Rotondo</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Daviet</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Bergametti</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Inchauspe</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Tiollais</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Transy</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Legrain</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Gene</source>
				<pubdate>2000</pubdate>
				<volume>242</volume>
				<fpage>369</fpage>
				<lpage>379</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0378-1119(99)00511-9</pubid>
						<pubid idtype="pmpid" link="fulltext">10721731</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Genome-wide protein interaction screens reveal functional networks involving Sm-like proteins.</p>
				</title>
				<aug>
					<au>
						<snm>Fromont-Racine</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Mayes</snm>
						<fnm>AE</fnm>
					</au>
					<au>
						<snm>Brunet-Simon</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Rain</snm>
						<fnm>JC</fnm>
					</au>
					<au>
						<snm>Colley</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Dix</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Decourty</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Joly</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Ricard</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Beggs</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Legrain</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Yeast</source>
				<pubdate>2000</pubdate>
				<volume>17</volume>
				<fpage>95</fpage>
				<lpage>110</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/1097-0061(20000630)17:2&lt;95::AID-YEA16&gt;3.0.CO;2-H</pubid>
						<pubid idtype="pmpid" link="fulltext">10900456</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>A comprehensive two-hybrid analysis to explore the yeast protein interactome.</p>
				</title>
				<aug>
					<au>
						<snm>Ito</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Chiba</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Ozawa</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Yoshida</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Hattori</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Sakaki</snm>
						<fnm>Y</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2001</pubdate>
				<volume>98</volume>
				<fpage>4569</fpage>
				<lpage>4574</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">31875</pubid>
						<pubid idtype="pmpid" link="fulltext">11283351</pubid>
						<pubid idtype="doi">10.1073/pnas.061034498</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>Genome-wide analysis of vaccinia virus protein-protein interactions.</p>
				</title>
				<aug>
					<au>
						<snm>McCraith</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Holtzman</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Moss</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Fields</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2000</pubdate>
				<volume>97</volume>
				<fpage>4879</fpage>
				<lpage>4884</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">18326</pubid>
						<pubid idtype="pmpid" link="fulltext">10781095</pubid>
						<pubid idtype="doi">10.1073/pnas.080078197</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>The protein-protein interaction map of <it>Helicobacter pylori</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Rain</snm>
						<fnm>JC</fnm>
					</au>
					<au>
						<snm>Selig</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>De Reuse</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Battaglia</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Reverdy</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Simon</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Lenzen</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Petel</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Wojcik</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Schachter</snm>
						<fnm>V</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nature</source>
				<pubdate>2001</pubdate>
				<volume>409</volume>
				<fpage>211</fpage>
				<lpage>215</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35051615</pubid>
						<pubid idtype="pmpid" link="fulltext">11196647</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>A comprehensive analysis of protein-protein interactions in <it>Saccharomyces cerevisiae</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Uetz</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Giot</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Cagney</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Mansfield</snm>
						<fnm>TA</fnm>
					</au>
					<au>
						<snm>Judson</snm>
						<fnm>RS</fnm>
					</au>
					<au>
						<snm>Knight</snm>
						<fnm>JR</fnm>
					</au>
					<au>
						<snm>Lockshon</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Narayan</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Srinivasan</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Pochart</snm>
						<fnm>P</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nature</source>
				<pubdate>2000</pubdate>
				<volume>403</volume>
				<fpage>623</fpage>
				<lpage>627</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35001009</pubid>
						<pubid idtype="pmpid" link="fulltext">10688190</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>Protein interaction mapping in <it>C. elegans </it>using proteins involved in vulval development.</p>
				</title>
				<aug>
					<au>
						<snm>Walhout</snm>
						<fnm>AJ</fnm>
					</au>
					<au>
						<snm>Sordella</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Lu</snm>
						<fnm>X</fnm>
					</au>
					<au>
						<snm>Hartley</snm>
						<fnm>JL</fnm>
					</au>
					<au>
						<snm>Temple</snm>
						<fnm>GF</fnm>
					</au>
					<au>
						<snm>Brasch</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Thierry-Mieg</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Vidal</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>2000</pubdate>
				<volume>287</volume>
				<fpage>116</fpage>
				<lpage>122</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.287.5450.116</pubid>
						<pubid idtype="pmpid" link="fulltext">10615043</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>Modular organization of cellular networks.</p>
				</title>
				<aug>
					<au>
						<snm>Rives</snm>
						<fnm>AW</fnm>
					</au>
					<au>
						<snm>Galitski</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2003</pubdate>
				<volume>100</volume>
				<fpage>1128</fpage>
				<lpage>1133</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1073/pnas.0237338100</pubid>
						<pubid idtype="pmpid" link="fulltext">12538875</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>Global protein function prediction from protein-protein interaction networks.</p>
				</title>
				<aug>
					<au>
						<snm>Vazquez</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Flammini</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Maritan</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Vespignani</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Nat Biotechnol</source>
				<pubdate>2003</pubdate>
				<volume>21</volume>
				<fpage>697</fpage>
				<lpage>700</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nbt825</pubid>
						<pubid idtype="pmpid" link="fulltext">12740586</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>Protein function from the perspective of molecular interactions and genetic networks.</p>
				</title>
				<aug>
					<au>
						<snm>Jacq</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Brief Bioinform</source>
				<pubdate>2001</pubdate>
				<volume>2</volume>
				<fpage>38</fpage>
				<lpage>50</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11465061</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>A re-annotation of the <it>Saccharomyces cerevisiae </it>genome.</p>
				</title>
				<aug>
					<au>
						<snm>Wood</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Rutherford</snm>
						<fnm>KM</fnm>
					</au>
					<au>
						<snm>Ivens</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Rajandream</snm>
						<fnm>M-A</fnm>
					</au>
					<au>
						<snm>Barrell</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Comp Funct Genomics</source>
				<pubdate>2001</pubdate>
				<volume>2</volume>
				<fpage>143</fpage>
				<lpage>154</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1002/cfg.86</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>Genomic exploration of the hemiascomycetous yeasts: 19. Ascomycetes-specific genes.</p>
				</title>
				<aug>
					<au>
						<snm>Malpertuy</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Tekaia</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Casaregola</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Aigle</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Artiguenave</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Blandin</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Bolotin-Fukuhara</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Bon</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Brottier</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>de Montigny</snm>
						<fnm>J</fnm>
					</au>
					<etal/>
				</aug>
				<source>FEBS Lett</source>
				<pubdate>2000</pubdate>
				<volume>487</volume>
				<fpage>113</fpage>
				<lpage>121</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0014-5793(00)02290-0</pubid>
						<pubid idtype="pmpid" link="fulltext">11152894</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Can we have confidence in a tree representation?</p>
				</title>
				<aug>
					<au>
						<snm>Gu&#233;noche</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Garreta</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Comput Biol</source>
				<pubdate>2001</pubdate>
				<volume>2066</volume>
				<fpage>45</fpage>
				<lpage>56</lpage>
			</bibl>
			<bibl id="B23">
				<title>
					<p>YPD, PombePD and WormPD: model organism volumes of the BioKnowledge library, an integrated resource for protein information.</p>
				</title>
				<aug>
					<au>
						<snm>Costanzo</snm>
						<fnm>MC</fnm>
					</au>
					<au>
						<snm>Crawford</snm>
						<fnm>ME</fnm>
					</au>
					<au>
						<snm>Hirschman</snm>
						<fnm>JE</fnm>
					</au>
					<au>
						<snm>Kranz</snm>
						<fnm>JE</fnm>
					</au>
					<au>
						<snm>Olsen</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Robertson</snm>
						<fnm>LS</fnm>
					</au>
					<au>
						<snm>Skrzypek</snm>
						<fnm>MS</fnm>
					</au>
					<au>
						<snm>Braun</snm>
						<fnm>BR</fnm>
					</au>
					<au>
						<snm>Hopkins</snm>
						<fnm>KL</fnm>
					</au>
					<au>
						<snm>Kondu</snm>
						<fnm>P</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2001</pubdate>
				<volume>29</volume>
				<fpage>75</fpage>
				<lpage>79</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">29810</pubid>
						<pubid idtype="pmpid" link="fulltext">11125054</pubid>
						<pubid idtype="doi">10.1093/nar/29.1.75</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>The use of protein-protein interaction networks for genome wide protein function comparisons and predictions.</p>
				</title>
				<aug>
					<au>
						<snm>Brun</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Baudot</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Gu&#233;noche</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Jacq</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>In Methods in Proteome and Protein Analysis</source>
				<publisher>Berlin Heidelberg: Springer-Verlag</publisher>
				<editor>Kamp RM, Calvete JJ, Choli-Papadopoulou T</editor>
				<pubdate>2004</pubdate>
				<fpage>103</fpage>
				<lpage>124</lpage>
			</bibl>
			<bibl id="B25">
				<title>
					<p>Pex17p of <it>Saccharomyces cerevisiae </it>is a novel peroxin and component of the peroxisomal protein translocation machinery.</p>
				</title>
				<aug>
					<au>
						<snm>Huhse</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Rehling</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Albertini</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Blank</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Meller</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Kunau</snm>
						<fnm>WH</fnm>
					</au>
				</aug>
				<source>J Cell Biol</source>
				<pubdate>1998</pubdate>
				<volume>140</volume>
				<fpage>49</fpage>
				<lpage>60</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1083/jcb.140.1.49</pubid>
						<pubid idtype="pmpid" link="fulltext">9425153</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Combinatorial control in ubiquitin-dependent proteolysis: don't Skp the F-box hypothesis.</p>
				</title>
				<aug>
					<au>
						<snm>Patton</snm>
						<fnm>EE</fnm>
					</au>
					<au>
						<snm>Willems</snm>
						<fnm>AR</fnm>
					</au>
					<au>
						<snm>Tyers</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>1998</pubdate>
				<volume>14</volume>
				<fpage>236</fpage>
				<lpage>243</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0168-9525(98)01473-5</pubid>
						<pubid idtype="pmpid" link="fulltext">9635407</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B27">
				<title>
					<p>A screen for genes required for meiosis and spore formation based on whole-genome expression.</p>
				</title>
				<aug>
					<au>
						<snm>Rabitsch</snm>
						<fnm>KP</fnm>
					</au>
					<au>
						<snm>Toth</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Galova</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Schleiffer</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Schaffner</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Aigner</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Rupp</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Penkner</snm>
						<fnm>AM</fnm>
					</au>
					<au>
						<snm>Moreno-Borchart</snm>
						<fnm>AC</fnm>
					</au>
					<au>
						<snm>Primig</snm>
						<fnm>M</fnm>
					</au>
					<etal/>
				</aug>
				<source>Curr Biol</source>
				<pubdate>2001</pubdate>
				<volume>11</volume>
				<fpage>1001</fpage>
				<lpage>1009</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0960-9822(01)00274-3</pubid>
						<pubid idtype="pmpid" link="fulltext">11470404</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>A network of protein-protein interactions in yeast.</p>
				</title>
				<aug>
					<au>
						<snm>Schwikowski</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Uetz</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Fields</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Nat Biotechnol</source>
				<pubdate>2000</pubdate>
				<volume>18</volume>
				<fpage>1257</fpage>
				<lpage>1261</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/82360</pubid>
						<pubid idtype="pmpid" link="fulltext">11101803</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B29">
				<title>
					<p>Functional organization of the yeast proteome by systematic analysis of protein complexes.</p>
				</title>
				<aug>
					<au>
						<snm>Gavin</snm>
						<fnm>AC</fnm>
					</au>
					<au>
						<snm>Bosche</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Krause</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Grandi</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Marzioch</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Bauer</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Schultz</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Rick</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Michon</snm>
						<fnm>AM</fnm>
					</au>
					<au>
						<snm>Cruciat</snm>
						<fnm>CM</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nature</source>
				<pubdate>2002</pubdate>
				<volume>415</volume>
				<fpage>141</fpage>
				<lpage>147</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/415141a</pubid>
						<pubid idtype="pmpid" link="fulltext">11805826</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p><it>Saccharomyces </it>Genome Database</p>
				</title>
				<url>http://genome-www.stanford.edu/Saccharomyces</url>
			</bibl>
			<bibl id="B31">
				<title>
					<p>Identification of the novel proteins Yip4p and Yip5p as Rab GTPase interacting factors.</p>
				</title>
				<aug>
					<au>
						<snm>Calero</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Winand</snm>
						<fnm>NJ</fnm>
					</au>
					<au>
						<snm>Collins</snm>
						<fnm>RN</fnm>
					</au>
				</aug>
				<source>FEBS Lett</source>
				<pubdate>2002</pubdate>
				<volume>515</volume>
				<fpage>89</fpage>
				<lpage>98</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0014-5793(02)02442-0</pubid>
						<pubid idtype="pmpid" link="fulltext">11943201</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B32">
				<title>
					<p>Retromer and the sorting nexins Snx4/41/42 mediate distinct retrieval pathways from yeast endosomes.</p>
				</title>
				<aug>
					<au>
						<snm>Hettema</snm>
						<fnm>EH</fnm>
					</au>
					<au>
						<snm>Lewis</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Black</snm>
						<fnm>MW</fnm>
					</au>
					<au>
						<snm>Pelham</snm>
						<fnm>HR</fnm>
					</au>
				</aug>
				<source>EMBO J</source>
				<pubdate>2003</pubdate>
				<volume>22</volume>
				<fpage>548</fpage>
				<lpage>557</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">140746</pubid>
						<pubid idtype="pmpid" link="fulltext">12554655</pubid>
						<pubid idtype="doi">10.1093/emboj/cdg062</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B33">
				<title>
					<p>Functions of Lsm proteins in mRNA degradation and splicing.</p>
				</title>
				<aug>
					<au>
						<snm>He</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Parker</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Curr Opin Cell Biol</source>
				<pubdate>2000</pubdate>
				<volume>12</volume>
				<fpage>346</fpage>
				<lpage>350</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0955-0674(00)00098-3</pubid>
						<pubid idtype="pmpid" link="fulltext">10801455</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B34">
				<title>
					<p>Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.</p>
				</title>
				<aug>
					<au>
						<snm>Ashburner</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Ball</snm>
						<fnm>CA</fnm>
					</au>
					<au>
						<snm>Blake</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>Botstein</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Butler</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Cherry</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Davis</snm>
						<fnm>AP</fnm>
					</au>
					<au>
						<snm>Dolinski</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Dwight</snm>
						<fnm>SS</fnm>
					</au>
					<au>
						<snm>Eppig</snm>
						<fnm>JT</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nat Genet</source>
				<pubdate>2000</pubdate>
				<volume>25</volume>
				<fpage>25</fpage>
				<lpage>29</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/75556</pubid>
						<pubid idtype="pmpid" link="fulltext">10802651</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B35">
				<title>
					<p>SGD Gene Ontology Term Finder</p>
				</title>
				<url>http://genome-www4.stanford.edu/cgi-bin/SGD/GO/goTermFinder</url>
			</bibl>
			<bibl id="B36">
				<title>
					<p>Prediction, assessment and validation of protein interaction maps in bacteria.</p>
				</title>
				<aug>
					<au>
						<snm>Wojcik</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Boneca</snm>
						<fnm>IG</fnm>
					</au>
					<au>
						<snm>Legrain</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>2002</pubdate>
				<volume>323</volume>
				<fpage>763</fpage>
				<lpage>770</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0022-2836(02)01009-4</pubid>
						<pubid idtype="pmpid" link="fulltext">12419263</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B37">
				<title>
					<p>MIPS: a database for genomes and protein sequences.</p>
				</title>
				<aug>
					<au>
						<snm>Mewes</snm>
						<fnm>HW</fnm>
					</au>
					<au>
						<snm>Frishman</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Guldener</snm>
						<fnm>U</fnm>
					</au>
					<au>
						<snm>Mannhaupt</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Mayer</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Mokrejs</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Morgenstern</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Munsterkotter</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Rudd</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Weil</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2002</pubdate>
				<volume>30</volume>
				<fpage>31</fpage>
				<lpage>34</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">99165</pubid>
						<pubid idtype="pmpid" link="fulltext">11752246</pubid>
						<pubid idtype="doi">10.1093/nar/30.1.31</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B38">
				<title>
					<p>Protein-protein interaction maps: a lead towards cellular functions.</p>
				</title>
				<aug>
					<au>
						<snm>Legrain</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Wojcik</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Gauthier</snm>
						<fnm>JM</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>2001</pubdate>
				<volume>17</volume>
				<fpage>346</fpage>
				<lpage>352</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0168-9525(01)02323-X</pubid>
						<pubid idtype="pmpid" link="fulltext">11377797</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B39">
				<title>
					<p>On the number of protein-protein interactions in the yeast proteome.</p>
				</title>
				<aug>
					<au>
						<snm>Grigoriev</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2003</pubdate>
				<volume>31</volume>
				<fpage>4157</fpage>
				<lpage>4161</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">165980</pubid>
						<pubid idtype="pmpid" link="fulltext">12853633</pubid>
						<pubid idtype="doi">10.1093/nar/gkg466</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B40">
				<title>
					<p>BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data.</p>
				</title>
				<aug>
					<au>
						<snm>Gascuel</snm>
						<fnm>O</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>1997</pubdate>
				<volume>14</volume>
				<fpage>685</fpage>
				<lpage>695</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9254330</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B41">
				<title>
					<p>The neighbor-joining method: a new method for reconstructing phylogenetic trees.</p>
				</title>
				<aug>
					<au>
						<snm>Saitou</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Nei</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>1987</pubdate>
				<volume>4</volume>
				<fpage>406</fpage>
				<lpage>425</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">3447015</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B42">
				<title>
					<p>TreeDyn</p>
				</title>
				<url>http://viradium.mpl.ird.fr/treedyn</url>
			</bibl>
			<bibl id="B43">
				<title>
					<p>Bioinformatics web site of Dr. Andrew C.R. Martin</p>
				</title>
				<url>http://www.bioinf.org.uk/software</url>
			</bibl>
			<bibl id="B44">
				<title>
					<p>The European Molecular Biology Open Software Suite</p>
				</title>
				<url>http://www.emboss.org</url>
			</bibl>
		</refgrp>
	</bm>
</art>

