Integrative Data Science for Drug Discovery & Healthcare
Using expertise in data science, linked data, cheminformatics, AI & machine learning, we are pioneering new ways to accelerate drug discovery, including making better decisions earlier, avoiding expensive failures in clinical trials, and finding the right treatments for the right patients. We developed the first large-scale semantic linked data repository for drug discovery, along with novel link prediction and data mining algorithms for finding hidden insights in large heterogeneous data networks, and in our 2012 Drug Discovery Today paper we laid out a strategy for using data science to avoid some of the perils of single-target drug discovery. Current projects include using knowledge networks to encode computable networks for multi-mechanism complex diseases, integrating patient medical records with molecular data for targeted therapies, drug repurposing using massive linked data networks, and accelerating drug discovery with automated synthesis and machine learning. We are thankful to NIH NCATS, Indiana CTSI, the OpenPHACTS foundation, Eli Lilly, and Pfizer for funding this work. Some of this work is being applied commercially in our spin-off company Data2Discovery Inc.
EXAMPLES OF SUCCESSFUL PROJECTS
The following tools and resources are some of the most widely used from prior IDSL projects. Additional tools and resources include the T2DM-NET knowledge network for diabetes, the NCATS Phenotypic Drug Discovery Resource, and the ChemBioSpace drug/gene/disease/side-effect association-finding tool.
SEMANTIC LINK ASSOCIATION PREDICTION (SLAP)
SLAP demonstrates how drugs can be profiled against targets (and vice versa) using semantically linked networks of information on compounds, genes, pathways, and related entities. It can be used in numerous ways, including predicting on- and off-target interactions, drug repurposing, and identifying mechanisms of action. For more information, see our PLoS Computational Biology paper. A related tool called SEMAP is available for commercial use from Data2Discovery Inc.
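To give a feel for what profiling a drug against targets over a linked network can look like, here is a minimal sketch in Python. It is not the SLAP algorithm. It assumes a tiny, entirely hypothetical network (`drug_A`, `gene_X`, `pathway_P`, and so on are made up) and a deliberately simple score in which each connecting path contributes the reciprocal of its length, so that shorter chains of evidence count for more.

```python
# Hypothetical toy network: every node and edge below is invented for illustration.
EDGES = [
    ("drug_A", "binds", "gene_X"),
    ("drug_B", "binds", "gene_X"),
    ("drug_B", "binds", "gene_Y"),
    ("gene_Y", "participates_in", "pathway_P"),
    ("gene_Z", "participates_in", "pathway_P"),
]


def build_adjacency(edges):
    """Treat every (subject, predicate, object) edge as undirected."""
    adj = {}
    for subject, _predicate, obj in edges:
        adj.setdefault(subject, set()).add(obj)
        adj.setdefault(obj, set()).add(subject)
    return adj


def simple_paths(adj, start, goal, max_len):
    """Enumerate simple paths from start to goal with at most max_len edges."""
    paths = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal and len(path) > 1:
            paths.append(path)
            continue
        if len(path) - 1 >= max_len:
            continue  # path already uses the maximum number of edges
        for nxt in sorted(adj.get(node, ())):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                stack.append((nxt, path + [nxt]))
    return paths


def association_score(adj, drug, target, max_len=4):
    """Assumed scoring rule: each path contributes 1 / (number of edges)."""
    return sum(1.0 / (len(p) - 1) for p in simple_paths(adj, drug, target, max_len))


adj = build_adjacency(EDGES)
for target in ("gene_X", "gene_Y", "gene_Z"):
    print(target, round(association_score(adj, "drug_A", target), 3))
```

In this toy network `drug_A` is linked directly to `gene_X`, linked to `gene_Y` only through a shared target with `drug_B`, and has no path to `gene_Z` within the length limit, so the three scores fall in that order. A real system would weight paths by edge type and assess statistical significance rather than use a bare reciprocal.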
CHEM2BIO2RDF
Chem2Bio2RDF is a demonstration of the power of semantically linked data. It shows how numerous publicly available biomedical datasets can be linked together and used to answer important biomedical questions that would otherwise be very difficult to answer, such as identifying multiple pathway inhibitors and associating drugs with particular side effects. You can read more in our BMC Bioinformatics paper.
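The kind of question mentioned above, finding compounds that inhibit more than one pathway, can be illustrated with a small sketch. This is not the Chem2Bio2RDF schema or query interface. It assumes a handful of made-up subject/predicate/object triples held in memory, with hypothetical predicate names (`inhibits`, `member_of`, `has_side_effect`), and joins them in plain Python.

```python
# Hypothetical triples standing in for facts merged from separate datasets.
# None of these identifiers or relationships are real data.
TRIPLES = {
    ("compound_1", "inhibits", "protein_A"),
    ("compound_1", "inhibits", "protein_B"),
    ("compound_2", "inhibits", "protein_A"),
    ("protein_A", "member_of", "pathway_P"),
    ("protein_B", "member_of", "pathway_Q"),
    ("compound_1", "has_side_effect", "effect_S"),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def pathways_inhibited(triples):
    """Join compound -> protein -> pathway across the linked triples."""
    result = {}
    for compound, _p, protein in match(triples, p="inhibits"):
        for _s, _p2, pathway in match(triples, s=protein, p="member_of"):
            result.setdefault(compound, set()).add(pathway)
    return result


def multi_pathway_inhibitors(triples, min_pathways=2):
    """Compounds whose inhibited proteins span at least min_pathways pathways."""
    by_compound = pathways_inhibited(triples)
    return sorted(c for c, ps in by_compound.items() if len(ps) >= min_pathways)


print(multi_pathway_inhibitors(TRIPLES))
print(match(TRIPLES, p="has_side_effect"))
```

The point of the sketch is that neither the inhibition facts nor the pathway memberships answer the question alone; the answer appears only once the two are linked through shared protein identifiers. Against an RDF store the same join would normally be written as a SPARQL query.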