This article is part of a special issue on exome sequencing.

Research

The functional spectrum of low-frequency coding variation

Gabor T Marth1*, Fuli Yu2, Amit R Indap1, Kiran Garimella3, Simon Gravel4, Wen Fung Leong1, Chris Tyler-Smith5, Matthew Bainbridge2, Tom Blackwell6, Xiangqun Zheng-Bradley7, Yuan Chen5, Danny Challis2, Laura Clarke7, Edward V Ball8, Kristian Cibulskis3, David N Cooper8, Bob Fulton9, Chris Hartl3, Dan Koboldt9, Donna Muzny4, Richard Smith7, Carrie Sougnez3, Chip Stewart1, Alistair Ward1, Jin Yu2, Yali Xue5, David Altshuler3, Carlos D Bustamante4, Andrew G Clark10, Mark Daly3, Mark DePristo3, Paul Flicek7, Stacey Gabriel3, Elaine Mardis9, Aarno Palotie5, Richard Gibbs2 and the 1000 Genomes Project

Author Affiliations

1 Department of Biology, Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA

2 Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA

3 Population Genomics Program, Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, USA

4 Department of Genetics, Stanford University, 300 Pasteur Drive, Stanford, CA 94305, USA

5 The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

6 School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109, USA

7 European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

8 Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK

9 The Genome Institute, Washington University School of Medicine, 4444 Forest Park Avenue, St Louis, MO 63108, USA

10 Department of Molecular Biology and Genetics, Cornell University, 107 Biotechnology Building, Ithaca, NY 14853, USA

Genome Biology 2011, 12:R84  doi:10.1186/gb-2011-12-9-r84

Published: 14 September 2011



Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequencies (2 to 5%), but because sample sizes have been insufficient, it is not clear whether the same trend holds for rare variants below 1% allele frequency.


The 1000 Genomes Exon Pilot Project has collected deep-coverage exon-capture data in roughly 1,000 human genes for nearly 700 samples. Although medical whole-exome projects are currently under way, this is still the deepest reported sampling of a large number of human genes with next-generation sequencing technologies. In accordance with the goals of the 1000 Genomes Project, we created effective informatics pipelines to process and analyze the data, and discovered 12,758 exonic SNPs, 70% of them novel and 74% below 1% allele frequency in the seven population samples we examined. Our analysis confirms that coding variants below 1% allele frequency show increased population specificity and are enriched for functional variants.


This study represents a large step toward detecting and interpreting low-frequency coding variation, clearly lays out the technical steps for effective analysis of DNA capture data, and articulates the functional and population properties of this important class of genetic variation.