At a March 7 meeting, the National Human Genome Research Institute (NHGRI) will formally launch a consortium project to develop methods for identifying and locating functional elements of the human genome. Called the Encyclopedia of DNA Elements (ENCODE), the pilot project will involve $12 million in initial funds.
"This is where the information really is," said ENCODE program director Peter Good. "This is where you are going to learn how things work and how things cause disease."
Consortium members will develop and compare high-throughput techniques using a defined set of target sequences comprising about one percent of the human genome. The project's goal is to arrive at a combination of effective, inexpensive methodologies, which will then be applied to the remaining 99 percent of the genome.
NHGRI recently released two Requests for Applications (RFAs). Program director Elise Feingold expects that between five and fifteen research groups will be funded initially, depending on who applies and how large the groups are. The consortium will be open to both ENCODE-funded and non-ENCODE-funded researchers.
"When the Human Genome Project ends, everything begins," said Eric Green, director of NHGRI's intramural research division. Green has served on planning committees for ENCODE, which is one of several anticipated post-Human Genome Projects. "We are so profoundly ignorant of the human genome sequence. You cannot point to even 10 kilobases of the human genome, certainly not 100 kilobases, where any scientist can stand up and say 'I know everything that is functionally important in those 10 kilobases.'"
Scientists' best guess is that only about five percent of the genome is functional. Although some of that five percent consists of genes, most of it is made up of various other types of biologically significant sequences. And most of it is still a complete mystery, according to Good. "Where are the promoters? Where are the enhancer elements?" There is also a lot of what Good called "dark matter": sequences that look as though they might be genes.
ENCODE will be a learning-by-doing project, involving both computational and experimental biologists. "We really expect this to be an iterative process," said Feingold, "whereby one type of approach will inform another."
Green expects consortium members will "overkill" the initial one percent of the genome with over-sampling and debate. But they need to get a firm footing and establish a "gold standard" before they can tackle the genome as a whole, he said.
What will happen after the initial three-year pilot program is unclear. The consortium may not be ready to take the work to a genome-wide level at that time, said Feingold, but the hope is that its members will at least be able to scale it up.