Sequencing and analysis of an Irish human genome
1 Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland
2 MRC Human Genetics Unit, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
3 Colon Cancer Genetics Group and Academic Coloproctology, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK
4 Department of Clinical Neurological Sciences, Royal College of Surgeons in Ireland, Dublin 2, Ireland
5 School of Mathematical Sciences, University College Dublin, Belfield, Dublin 4, Ireland
6 Smurfit Institute of Genetics, Trinity College Dublin, Dublin 2, Ireland
7 Department of Neurology, Beaumont Hospital and Trinity College Dublin, Beaumont Road, Dublin 9, Ireland
8 School of Agriculture, Food Science and Veterinary Medicine, University College Dublin, Belfield, Dublin 4, Ireland
9 Centre for Population Health Sciences, University of Edinburgh, Teviot Place, Edinburgh, EH8 9AG, UK
Genome Biology 2010, 11:R91 doi:10.1186/gb-2010-11-9-r91
Published: 7 September 2010
Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest because of its relative geographical isolation and genetic impact on other populations, we extend these studies by generating 11-fold coverage of the first Irish human genome sequence.
Using sequence data from a branch of the European ancestral tree as yet unsequenced, we identify variants that may be specific to this population. Through comparisons with HapMap and previous genetic association studies, we identify novel disease-associated variants, including a nonsense variant putatively associated with inflammatory bowel disease. We describe a novel method for improving SNP calling accuracy at low genome coverage using haplotype information. This analysis has implications for future re-sequencing studies and validates the imputation of Irish haplotypes using data from the current Human Genome Diversity Cell Line Panel (HGDP-CEPH). Finally, we identify gene duplication events as constituting significant targets of recent positive selection in the human lineage.
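To make the idea of haplotype-informed SNP calling at low coverage concrete, the following is a minimal sketch, not the method used in this study, whose details are not given here. It assumes a biallelic site, a simple binomial model of read counts, and a genotype prior that has hypothetically been derived from haplotypes in a reference panel; the function names, the error rate and the example prior are all illustrative assumptions.

```python
from math import comb

def genotype_likelihoods(alt_reads, total_reads, error_rate=0.01):
    """Binomial likelihood of the read counts under each genotype.

    Genotypes are indexed by the number of alternate alleles (0, 1, 2).
    The per-read error rate is an assumed, illustrative value.
    """
    p_alt = [error_rate, 0.5, 1.0 - error_rate]
    ref_reads = total_reads - alt_reads
    return [
        comb(total_reads, alt_reads) * p ** alt_reads * (1.0 - p) ** ref_reads
        for p in p_alt
    ]

def posterior(likelihoods, prior):
    """Combine likelihoods with a genotype prior and normalise."""
    joint = [l * p for l, p in zip(likelihoods, prior)]
    total = sum(joint)
    return [j / total for j in joint]

def call_genotype(alt_reads, total_reads, prior, error_rate=0.01):
    """Return the most probable genotype and the full posterior."""
    post = posterior(genotype_likelihoods(alt_reads, total_reads, error_rate), prior)
    best = max(range(3), key=post.__getitem__)
    return best, post

# Three reads cover the site and one carries the alternate allele.
flat_prior = [1 / 3, 1 / 3, 1 / 3]
# Hypothetical prior: surrounding haplotypes in a reference panel
# almost always carry the reference allele at this site.
haplotype_prior = [0.98, 0.019, 0.001]

flat_call, _ = call_genotype(1, 3, flat_prior)
hap_call, _ = call_genotype(1, 3, haplotype_prior)
```

With so few reads, the read data alone favour a heterozygous call, whereas the assumed haplotype prior shifts the call to homozygous reference. This shows how haplotype context can change a genotype call when coverage is low.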
Our findings show that there remains utility in generating whole genome sequences, both to illustrate general principles and to reveal specific instances of human biology. With increasing access to low-cost sequencing, we predict that a number of similar initiatives geared towards answering specific biological questions will emerge, even from teams with only the resources of a small research group.