Author
DOROGHAZI, JAMES - University of Illinois
ALBRIGHT, JESSICA - Northwestern University
GOERING, ANTHONY - Northwestern University
JU, KOU-SAN - University of Illinois
HAINES, ROBERT - University of Illinois
TCHALUKOV, KONSTANTIN - University of Illinois
Labeda, David
KELLAHER, NEIL - University of Illinois
METCALF, WILLIAM - University of Illinois
Submitted to: Nature Chemical Biology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 9/4/2014
Publication Date: 9/28/2014
Citation: Doroghazi, J.R., Albright, J.C., Goering, A.W., Ju, K.-S., Haines, R.R., Tchalukov, K.A., Labeda, D.P., Kellaher, N.L., Metcalf, W.W. 2014. A roadmap for natural product discovery based on large-scale genomics and metabolomics. Nature Chemical Biology. 10(11):963-968.

Interpretive Summary: Bacterial strains continue to emerge that resist the antibiotics used in human and veterinary medicine. This is undermining our ability to treat infections with currently available drugs. Meanwhile, the methods the pharmaceutical industry has recently used to discover new antibiotics have become exceedingly difficult and expensive to apply. The present study exploits the steadily falling cost of microbial genome sequencing. It demonstrates that genome sequence data from actinobacterial strains can be mined to discover novel biosynthetic capabilities, which could lead to the production of new natural products such as antibiotics. Extrapolating from these results predicts that actinobacteria can continue to supply a large, but not unlimited, number of novel compounds for evaluation as new medically important drugs.

Technical Abstract: Actinobacteria encode a wealth of natural product biosynthetic gene clusters, whose systematic study is complicated by numerous repetitive motifs. By combining several metrics, we developed a method for global classification of these gene clusters into families (GCFs). We then used it to analyze the biosynthetic capacity of Actinobacteria in 830 genome sequences, including 344 obtained for this project. The GCF framework was validated in hundreds of strains by correlating confident detection of known small molecules with the presence or absence of their established biosynthetic gene clusters. The method also linked previously unassigned GCFs to known natural products. This approach will enable de novo, bioassay-free discovery of novel natural products using large data sets. Extrapolation from the 830-genome dataset reveals that Actinobacteria encode a large, but finite, supply of future drug leads. The strong correlation between phylogeny and GCFs frames a roadmap to access them efficiently.