Research (Open Access)

Towards a comprehensive structural variation map of an individual human genome

Andy W Pang (1, 2), Jeffrey R MacDonald (2), Dalila Pinto (2), John Wei (2), Muhammad A Rafiq (2), Donald F Conrad (3), Hansoo Park (4), Matthew E Hurles (3), Charles Lee (4), J Craig Venter (5), Ewen F Kirkness (5), Samuel Levy (5), Lars Feuk (2, 6)* and Stephen W Scherer (1, 2)*

Author Affiliations

1 Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, Ontario M5S 1A8, Canada

2 The Centre for Applied Genomics, The Hospital for Sick Children, 101 College Street, Toronto, Ontario M5G 1L7, Canada

3 Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

4 Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, 221 Longwood Avenue, Boston, Massachusetts 02115, USA

5 J Craig Venter Institute, 9740 Medical Center Drive, Rockville, Maryland 20850, USA

6 Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University, Uppsala 75185, Sweden


Genome Biology 2010, 11:R52  doi:10.1186/gb-2010-11-5-r52

Published: 19 May 2010



Several genomes have now been sequenced, with millions of genetic variants annotated. While significant progress has been made in mapping single nucleotide polymorphisms (SNPs) and small (<10 bp) insertions/deletions (indels), the annotation of larger structural variants has been less comprehensive. It is still unclear to what extent a typical genome differs from the reference assembly, and analyses of the genomes sequenced to date have produced varying results for copy number variation (CNV) and inversions.


We combined computational re-analysis of existing whole genome sequence data with novel microarray-based analysis, and detected 12,178 structural variants covering 40.6 Mb that were not reported in the initial sequencing of the first published personal genome. We estimate a total non-SNP variation content of 48.8 Mb in a single genome. Our results indicate that this genome differs from the consensus reference sequence by approximately 1.2% in indels/CNVs, 0.1% in SNPs and approximately 0.3% in inversions. The structural variants affect 4,867 genes, and more than 24% of them could not be imputed from SNP associations.
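Figures such as "structural variants covering 40.6 Mb" and "differs from the reference by approximately 1.2%" are coverage quantities: the number of reference bases spanned by variant calls, optionally divided by a genome length. The abstract does not state the exact counting rules or denominator used, so the following is only a minimal sketch under stated assumptions. It takes the union of variant intervals so that overlapping calls (for example, the same event detected by both sequencing and microarrays) are not double-counted. The interval coordinates, the function names `merge_intervals`, `covered_bases` and `percent_of_genome`, and the toy genome length are all hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: (chromosome, start, end), 0-based half-open coordinates.
Interval = Tuple[str, int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or abutting intervals that lie on the same chromosome."""
    merged: List[Interval] = []
    # Sorting tuples orders by chromosome, then start, then end.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous interval: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_bases(intervals: Iterable[Interval]) -> int:
    """Count distinct reference bases spanned by at least one variant."""
    return sum(end - start for _, start, end in merge_intervals(intervals))


def percent_of_genome(intervals: Iterable[Interval], genome_length: int) -> float:
    """Express covered bases as a percentage of an assumed genome length."""
    return 100.0 * covered_bases(intervals) / genome_length


if __name__ == "__main__":
    # Toy calls: the two chr1 intervals overlap and should be counted once.
    variants = [("chr1", 100, 1100), ("chr1", 600, 2100), ("chr2", 0, 500)]
    print(covered_bases(variants))              # 2500
    print(percent_of_genome(variants, 10_000))  # 25.0
```

Taking the union before summing is the design choice that matters here: summing raw call lengths would inflate coverage whenever callers or platforms report overlapping events.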


Our results indicate that a large number of structural variants have gone unreported in the individual genomes published to date. The considerable extent and complexity of structural variation, together with growing recognition of its medical relevance, mean that structural variants must be actively studied in health-related analyses of personal genomes. The new catalogue of structural variants generated for this genome provides a crucial resource for future comparative studies.