HELMKAMPF, MARTIN - University Of Hawaii
BELLINGER, RENEE - University Of Hawaii
TAKABAYASHI, MISAKI - University Of Hawaii
Submitted to: Genome Biology and Evolution
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 6/24/2019
Publication Date: 6/27/2019
Citation: Helmkampf, M., Bellinger, R., Geib, S.M., Sim, S.B., Takabayashi, M. 2019. Draft genome of the rice coral Montipora capitata obtained from linked-read sequencing. Genome Biology and Evolution. 11(7):2045-2054. https://doi.org/10.1093/gbe/evz135.
Interpretive Summary: The coral species Montipora capitata, commonly known as the rice coral, is a keystone species in reef architecture in the Hawaiian Islands. Despite its important role in marine ecology, few genomic resources have been developed for this species. In this study, we used an emerging genomics technique known as linked-read sequencing to sequence the M. capitata genome and produce a de novo draft assembly. Although high heterozygosity and repetitive elements resulted in a more fragmented assembly than expected, the assembly size was close to the genome size estimated from unique sequence abundance, and gene content was nearly complete, judging by the high proportion of recovered genes that are present in most eukaryotes. This study shows that linked-read sequencing is a cost-effective and appropriate method for generating a complete assembly in this coral species and likely in other metazoans.
Technical Abstract: The rice coral, Montipora capitata, is widely distributed throughout the Indo-Pacific and is one of the most important reef-building species in the Hawaiian Islands. Here we describe a de novo assembly of its genome based on a linked-read sequencing approach developed by 10x Genomics. The final draft assembly consisted of 21,422 scaffolds with a scaffold N50 of 144 kb, and contained a fairly complete set (81%) of metazoan benchmarking (BUSCO) genes. The assembly size was 680 Mb, and the genome was estimated to encompass 700 Mb based on k-mer abundance, although it may be larger due to its high fraction of repetitive sequence. Repeat analysis indicated that at least 42% of the assembly consisted of interspersed, unclassified repeats, and almost 3% of tandem repeats. We also identified 41,863 protein-coding genes, which likely represents an overestimate due to the splitting of some gene models across short scaffolds. Although the assembly was handicapped by the high repeat content and the haplotype distribution in the source DNA, resulting in higher-than-expected fragmentation at the scaffold level, it was comparable in quality to other coral genomes obtained by traditional short-read sequencing and assembly approaches. Provided high molecular weight DNA is available, linked-read technology thus represents an accurate, easy-to-use, and cost-effective new method capable of providing high-quality genome assemblies of non-model organisms.
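
For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the total assembly size. The function name `n50` and the toy lengths are illustrative assumptions, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy scaffold lengths in kb (not taken from the assembly).
toy_scaffolds = [400, 300, 200, 100, 50, 50]
print(n50(toy_scaffolds))  # prints 300
```

In the toy example the total is 1,100 kb, so the halfway point (550 kb) is reached within the second-longest scaffold, giving an N50 of 300 kb. A higher N50 for the same assembly size indicates a less fragmented assembly.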