Location: Plant, Soil and Nutrition Research
Title: Highly accurate HiFi long read sequencing data for five complex genome samples
|HON, TING - Pacific Biosciences Inc|
|MARS, KRISTIN - Pacific Biosciences Inc|
|YOUNG, GREG - Pacific Biosciences Inc|
|TSAI, YU-CHIH - Pacific Biosciences Inc|
|KAURALIS, JOSEPH - Pacific Biosciences Inc|
|LANDOLIN, JANE - Ravel Biotechnology|
|MAURER, NICHOLAS - University Of California Santa Cruz|
|KUDRNA, DAVID - Arizona Genomics Institute|
|HARDIGAN, MICHAEL - University Of California, Davis|
|STEINER, CYNTHIA - Beckman Research Institute|
|KNAPP, STEVE - University Of California, Davis|
|SHAPIRO, BETH - University Of California Santa Cruz|
|PELUSO, PAUL - Pacific Biosciences Inc|
|RANK, DAVID - Pacific Biosciences Inc|
Submitted to: Scientific Data - Nature
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/27/2020
Publication Date: 10/27/2020
Citation: Hon, T., Mars, K., Young, G., Tsai, Y., Kauralis, J., Landolin, J.M., Maurer, N., Kudrna, D., Hardigan, M.A., Steiner, C.C., Knapp, S., Ware, D., Shapiro, B., Peluso, P., Rank, D.R. 2020. Highly accurate HiFi long read sequencing data for five complex genome samples. Scientific Data - Nature. 7. Article e399. https://doi.org/10.1038/s41597-020-00743-4.
Interpretive Summary: There is a need for benchmarking data sets to validate and support improved genome assembly algorithms. In this paper we present deep-coverage PacBio HiFi sequencing reads for mouse, frog, corn, and strawberry genomes. The reads average 10-25 kb in length and have greater than 99.5% accuracy. We also include a mock microbial community metagenome data set. These data sets can be used without restriction to develop new algorithms that support assembly and analysis of complex genome structure and evolution.
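"Deep coverage" is conventionally expressed as fold coverage: the total number of sequenced bases divided by the genome size. The summary does not state coverage figures, so the sketch below only shows the arithmetic. The function names are hypothetical, and the toy read lengths and genome size are illustrative values, not numbers from these datasets.

```python
from statistics import mean
from typing import Sequence


def fold_coverage(read_lengths: Sequence[int], genome_size_bp: int) -> float:
    """Average fold coverage: total sequenced bases divided by genome size (hypothetical helper)."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return sum(read_lengths) / genome_size_bp


def mean_read_length(read_lengths: Sequence[int]) -> float:
    """Arithmetic mean read length in bp (hypothetical helper)."""
    if not read_lengths:
        raise ValueError("no reads supplied")
    return mean(read_lengths)


if __name__ == "__main__":
    # Toy example: three reads within the 10-25 kb range against a 30 kb toy "genome".
    reads = [15_000, 20_000, 25_000]
    print(mean_read_length(reads))      # → 20000
    print(fold_coverage(reads, 30_000))  # → 2.0
```

For a real genome, the same formula applies with read lengths taken from the sequence file and an estimated genome size. Coverage figures computed this way are averages and say nothing about how evenly reads are spread across the genome.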
Technical Abstract: The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets whose reads average 10-25 kb with accuracies greater than 99.5%. These accurate long reads improve results in complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and metagenome assembly. Currently, there is a need for sample data sets both to evaluate the benefits of these long, accurate reads and to develop bioinformatic tools, including genome assemblers, variant callers, and haplotyping algorithms. We present deep-coverage HiFi datasets for five complex samples. These include two inbred model genomes, Mus musculus and Zea mays, and two outbred complex genomes, the octoploid Fragaria × ananassa and the anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II instrument.
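Read accuracy is commonly reported on the Phred scale, where Q = -10·log10(error rate). Under that convention, the 99.5% accuracy quoted above corresponds to roughly Q23. The sketch below converts between the two scales. It also estimates an expected per-read accuracy from a FASTQ quality string, assuming standard Phred+33 encoding, as one minus the mean per-base error probability. This is a generic illustration, and the function names are hypothetical. It is not the method PacBio software uses to assign HiFi read quality.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality value to accuracy (1 - error probability)."""
    return 1.0 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy: float) -> float:
    """Convert an accuracy in (0, 1) to a Phred quality value."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10 * math.log10(1.0 - accuracy)


def read_accuracy(qual: str, offset: int = 33) -> float:
    """Expected accuracy of one read from its FASTQ quality string.

    Assumes Phred+33 encoding by default. Averages per-base error
    probabilities (not Q values), then returns 1 - mean error.
    """
    if not qual:
        raise ValueError("empty quality string")
    errors = [10 ** (-(ord(ch) - offset) / 10) for ch in qual]
    return 1.0 - sum(errors) / len(errors)


if __name__ == "__main__":
    print(round(accuracy_to_phred(0.995), 2))  # → 23.01
    print(round(phred_to_accuracy(20), 4))     # → 0.99
    # '?' encodes Q30 under Phred+33, i.e. an expected accuracy of 0.999.
    print(round(read_accuracy("????"), 4))     # → 0.999
```

Averaging error probabilities rather than raw Q values is deliberate. A few low-quality bases dominate the expected error count, and a simple mean of Q values would hide them.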