Method (Open Access)

A standard variation file format for human genome sequences

Martin G Reese1*, Barry Moore2, Colin Batchelor3, Fidel Salas1, Fiona Cunningham4, Gabor T Marth5, Lincoln Stein6, Paul Flicek4, Mark Yandell2 and Karen Eilbeck7*

Author affiliations

1 Omicia, 2200 Powell Street, Suite 525, Emeryville, CA 94608, USA

2 Department of Human Genetics and Eccles Institute of Human Genetics, 15 North 2030 East, University of Utah, Salt Lake City, UT 84108, USA

3 Royal Society of Chemistry, Thomas Graham House, Cambridge, CB4 0WF, UK

4 EMBL Outstation - Hinxton, European Bioinformatics Institute, Wellcome Trust, Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

5 Department of Biology, Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA

6 Ontario Institute for Cancer Research, 101 College St, Suite 800, Toronto, ON M5G0A3, Canada

7 Department of Biomedical Informatics, Health Sciences Education Building, Suite 5700, 26 South 2000 East, University of Utah, Salt Lake City, UT 84112, USA

Citation and License

Genome Biology 2010, 11:R88  doi:10.1186/gb-2010-11-8-r88

Published: 26 August 2010


Here we describe the Genome Variation Format (GVF) and the 10Gen dataset. GVF, an extension of Generic Feature Format version 3 (GFF3), is a simple tab-delimited format for DNA variant files that uses the Sequence Ontology to describe genome variation data. The 10Gen dataset, ten human genomes in GVF format, is freely available for community analysis from the Sequence Ontology website and from an Amazon Elastic Block Store (EBS) snapshot for use in Amazon's EC2 cloud computing environment.
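
Because GVF builds on GFF3, a variant record can be read with the usual nine tab-separated GFF3 columns, the last of which holds `key=value` attributes separated by semicolons. The following is a minimal sketch of such a reader, not the authors' reference parser. The helper names (`Feature`, `parse_gvf_line`, `read_gvf`) and the sample record (its coordinates, source name, and ID) are hypothetical and chosen only for illustration; the attribute tags shown are assumed to follow the GVF specification.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple
from urllib.parse import unquote

# GFF3 (and therefore GVF) feature lines have exactly nine tab-separated columns.
GFF3_COLUMN_COUNT = 9


class Feature(NamedTuple):
    """One feature line; the name and layout are illustrative, not from the paper."""
    seqid: str
    source: str
    type: str          # a Sequence Ontology term, e.g. "SNV"
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: Dict[str, List[str]]


def parse_gvf_line(line: str) -> Feature:
    """Split one feature line into its nine columns and decode the attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != GFF3_COLUMN_COUNT:
        raise ValueError(f"expected {GFF3_COLUMN_COUNT} columns, got {len(fields)}")

    attributes: Dict[str, List[str]] = {}
    for pair in fields[8].split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # Multiple values for one tag are comma-separated; GFF3 percent-encodes
        # reserved characters inside values, so decode each value after splitting.
        attributes[key] = [unquote(v) for v in value.split(",")]

    return Feature(fields[0], fields[1], fields[2], int(fields[3]),
                   int(fields[4]), fields[5], fields[6], fields[7], attributes)


def read_gvf(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield features, skipping blank lines and lines starting with '#'."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        yield parse_gvf_line(line)


# Hypothetical record for illustration only; not taken from the 10Gen dataset.
SAMPLE = ("chr16\texample_caller\tSNV\t49291141\t49291141\t.\t+\t.\t"
          "ID=example_1;Variant_seq=A,G;Reference_seq=G")

if __name__ == "__main__":
    feature = parse_gvf_line(SAMPLE)
    print(feature.type, feature.start, feature.attributes["Variant_seq"])
```

Keeping attribute values as lists reflects the fact that a single tag may carry several comma-separated values, for example two variant alleles at a heterozygous site.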