Location: Genetics and Animal Breeding
Title: Chromosome-length haplotigs for yak and cattle from trio binning assembly of an F1 hybrid
|RICE, EDWARD - University Of Nebraska|
|KOREN, SERGEY - National Human Genome Research Institute|
|RHIE, ARANG - National Human Genome Research Institute|
|Heaton, Michael - Mike|
|KALBFLEISCH, THEODORE - University Of Kentucky|
|HARDY, TIMOTHY - Usyaks|
|HACKETT, PETER - Usyaks|
|Rosen, Benjamin - Ben|
|VANDER LEY, BRIAN - University Of Nebraska|
|MAURER, NICHOLAS - University Of California Santa Cruz|
|GREEN, RICHARD - University Of California Santa Cruz|
|PHILLIPPY, ADAM - National Human Genome Research Institute|
|PETERSEN, JESSICA - University Of Nebraska|
|Smith, Timothy - Tim|
Submitted to: bioRxiv
Publication Type: Research Notes
Publication Acceptance Date: 8/15/2019
Publication Date: 8/15/2019
Citation: Rice, E.S., Koren, S., Rhie, A., Heaton, M.P., Kalbfleisch, T.S., Hardy, T., Hackett, P.H., Bickhart, D.M., Rosen, B.D., Vander Ley, B., Maurer, N.W., Green, R.E., Phillippy, A.M., Petersen, J.L., Smith, T.P.L. 2019. Chromosome-length haplotigs for yak and cattle from trio binning assembly of an F1 hybrid. bioRxiv. 737171. https://doi.org/10.1101/737171.
Interpretive Summary: We have developed a new approach to genome assembly that uses the differences between maternally and paternally inherited chromosomes to produce two haploid genome assemblies from a single individual. The process involves sequencing both parents of an animal and identifying the unique sequences that distinguish them from one another. We then use the list of these differences to sort the sequence reads from the offspring into “bins” according to whether they come from maternally or paternally inherited chromosomes. In this way, we can create two higher-quality, higher-accuracy genome assemblies from a single sample. We applied this approach to the offspring of a mating between a Scottish Highland beef bull and a yak cow, an interspecies hybrid, to create both Highland cattle and yak genome assemblies. These are two of the highest-quality assemblies of any mammal to date, rivaling the human reference genome in accuracy and continuity.
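The sorting step described above can be illustrated with a toy sketch. It is not the authors' pipeline: it assumes exact k-mer sets built directly from a handful of error-free parental reads and a simple majority vote per offspring read. Real trio-binning tools work from k-mer counts over whole short-read datasets, typically discard low-count k-mers that likely reflect sequencing errors, and use much larger k; the function names here are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets.

    Assumption: reads are error-free, so no count-based filtering is done.
    """
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    return mat - pat, pat - mat


def bin_read(read, mat_only, pat_only, k):
    """Assign an offspring read to the parent whose specific k-mers it contains more often."""
    read_kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    mat_hits = sum(km in mat_only for km in read_kmers)
    pat_hits = sum(km in pat_only for km in read_kmers)
    if mat_hits > pat_hits:
        return "maternal"
    if pat_hits > mat_hits:
        return "paternal"
    # No distinguishing k-mers (or a tie): the read cannot be binned.
    return "unassigned"


# Toy parental reads that differ only in their last three bases (hypothetical data).
mat_only, pat_only = parent_specific_kmers(["ACGTACGTTT"], ["ACGTACGAAA"], k=4)
for read in ["TACGTTT", "GTACGAAA", "ACGTAC"]:
    print(read, bin_read(read, mat_only, pat_only, k=4))
```

Reads that fall entirely in sequence shared by both parents carry no parent-specific k-mers and stay unassigned, which is why the fraction of binnable reads depends on how different the parental genomes are.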
Technical Abstract: Background: Assemblies of diploid genomes are generally unphased, pseudo-haploid representations that do not correctly reconstruct the two parental haplotypes present in the individual sequenced. Instead, the assembly alternates between parental haplotypes and may contain duplications in regions where the parental haplotypes are sufficiently different. Trio binning is an approach to genome assembly that uses short reads from both parents to classify long reads from the offspring according to maternal or paternal haplotype origin, and is thus helped rather than impeded by heterozygosity. Using this approach, it is possible to derive two assemblies from an individual, accurately representing both parental contributions in their entirety with higher continuity and accuracy than is possible with other methods. Results: We used trio binning to assemble reference genomes for two species from a single individual using an interspecies cross of yak (Bos grunniens) and cattle (Bos taurus). The high heterozygosity inherent to interspecies hybrids allowed us to confidently assign >99% of long reads from the F1 offspring to parental bins using unique k-mers from parental short reads. Both the maternal (yak) and paternal (cattle) assemblies contain over one third of the acrocentric chromosomes, including the two largest chromosomes, in single haplotigs. Conclusions: These haplotigs are the first vertebrate chromosome arms to be assembled gap-free and fully phased, and the first time assemblies for two species have been created from a single individual. Both assemblies are the most continuous currently available for non-model vertebrates.
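Why heterozygosity helps rather than hinders can be made concrete with a back-of-the-envelope estimate. The sketch below assumes that variant sites occur independently with a uniform per-base divergence d between the parental haplotypes, and ignores sequencing errors and repeats. Under that assumption a k-mer overlaps at least one variant with probability 1 - (1 - d)^k, and a long read of length L contains L - k + 1 k-mers. The divergence values and read length used are hypothetical, not figures from this study, and a k-mer overlapping a variant is only a candidate parent-specific k-mer, so the result is an optimistic upper estimate.

```python
def frac_kmers_spanning_variant(d: float, k: int) -> float:
    """Probability that a k-mer overlaps at least one variant site,
    assuming independent sites that each differ with probability d."""
    return 1.0 - (1.0 - d) ** k


def expected_distinguishing_kmers(read_len: int, d: float, k: int) -> float:
    """Expected number of k-mers in one read that overlap a variant
    (candidate parent-specific k-mers under the stated assumptions)."""
    n_kmers = max(read_len - k + 1, 0)
    return n_kmers * frac_kmers_spanning_variant(d, k)


# Hypothetical divergence levels: within-species-like vs. interspecies-like.
for d in (0.001, 0.01):
    print(d, round(expected_distinguishing_kmers(20_000, d, k=21)))
```

Raising d tenfold raises the expected number of informative k-mers per read roughly ninefold in this model, so reads from a highly divergent F1 hybrid are far less likely to end up unassigned.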