Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies

David B Neale1*, Jill L Wegrzyn1, Kristian A Stevens2, Aleksey V Zimin3, Daniela Puiu4, Marc W Crepeau2, Charis Cardeno2, Maxim Koriabine5, Ann E Holtz-Morris5, John D Liechty1, Pedro J Martínez-García1, Hans A Vasquez-Gross1, Brian Y Lin1, Jacob J Zieve1, William M Dougherty2, Sara Fuentes-Soriano6, Le-Shin Wu7, Don Gilbert6, Guillaume Marçais3, Michael Roberts3, Carson Holt8, Mark Yandell8, John M Davis9, Katherine E Smith10, Jeffrey FD Dean11, W Walter Lorenz11, Ross W Whetten12, Ronald Sederoff12, Nicholas Wheeler1, Patrick E McGuire1, Doreen Main13, Carol A Loopstra14, Keithanne Mockaitis6, Pieter J deJong5, James A Yorke3, Steven L Salzberg4 and Charles H Langley2

Author Affiliations

1 Department of Plant Sciences, University of California, Davis CA, USA

2 Department of Evolution and Ecology, University of California, Davis CA, USA

3 Institute for Physical Sciences and Technology (IPST), University of Maryland, College Park MD, USA

4 Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore MD, USA

5 Children’s Hospital Oakland Research Institute, Oakland CA, USA

6 Department of Biology, Indiana University, Bloomington IN, USA

7 National Center for Genome Analysis Support, Indiana University, Bloomington IN, USA

8 Eccles Institute of Human Genetics, University of Utah, Salt Lake City UT, USA

9 School of Forest Resources and Conservation, Genetics Institute, University of Florida, Gainesville FL, USA

10 Southern Institute of Forest Genetics, USDA Forest Service, Southern Research Station, Saucier MS, USA

11 Warnell School of Forestry and Natural Resources, University of Georgia, Athens GA, USA

12 Department of Forestry and Environmental Resources, North Carolina State University, Raleigh NC, USA

13 Department of Horticulture, Washington State University, Pullman WA, USA

14 Department of Ecosystem Science and Management, Texas A&M University, College Station TX, USA

Genome Biology 2014, 15:R59  doi:10.1186/gb-2014-15-3-r59

Published: 20 March 2014



The size and complexity of conifer genomes have, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination.


We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole-genome shotgun approach relying primarily on next-generation sequence data generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly were used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp of sequence, with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome.
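For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed, assuming the usual definition: the length L such that scaffolds of length L or longer together contain at least half of the total assembled bases. The function name `n50` and the toy scaffold lengths are illustrative assumptions, not values or code from the loblolly pine assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Assumes the common definition: the largest length L such that
    scaffolds of length >= L contain at least half of the total bases.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")

    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

In this toy set the total is 300, so the running sum first reaches half of it (150) at the second-longest scaffold, and the N50 is 70. Some tools instead normalize by an estimated genome size rather than the assembly total; that variant is usually reported as NG50.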


In addition to their value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrate a novel approach to sequencing the large and complex genomes of this important group of plants, an approach that can now be widely applied.