Indiana University Integrative Data Science Lab (IDSL)

Welcome to the Integrative Data Science Lab (IDSL) in Indiana University's School of Informatics, Computing, and Engineering. Integrative Data Science brings together diverse data sets, technologies and expertise with linked data, machine learning and data science approaches to solve real-world problems. We are pioneering both infrastructures for integrative data science and domain applications of these infrastructures. The IDSL is directed by David Wild. Here are some of the areas where we are applying integrative data science:


Drug Discovery

Accelerating drug discovery and reducing clinical trial failures

We are pioneering new ways to accelerate drug discovery, including making better decisions earlier, avoiding expensive failures in clinical trials, and finding the right treatments for the right patients. We developed Chem2Bio2RDF, the first large-scale semantic linked data repository for preclinical drug discovery, along with novel link prediction and data mining algorithms for finding hidden insights in large heterogeneous data networks. In our 2012 Drug Discovery Today paper, we laid out a strategy for using semantic technologies and data science to avoid some of the perils of single-target drug discovery. Current projects include researching knowledge networks that encode computable networks for multi-mechanism complex diseases, integrating patient medical records with molecular data to help identify potential targeted therapies, and accelerating drug discovery with automated synthesis and machine learning. We are thankful to NIH NCATS, Indiana CTSI, the OpenPHACTS foundation, Eli Lilly, and Pfizer for funding this work. Applications in this area are being commercialized in our company Data2Discovery Inc.

Emergency Response & Management

Improving outcomes in disasters and emergency response by smart integration of unconventional datasets, expertise and technologies

We live in an era of profound risk and uncertainty, with climate change, healthcare challenges, pervasive technology, infrastructure vulnerabilities, and cybersecurity all creating new and enhanced threats at local, regional and national levels. We are researching highly creative ways that data from unconventional sources and low-cost technologies can be used together to help emergency managers, emergency responders and citizens better understand, prepare for, mitigate and respond to this new landscape. We are re-imagining situational awareness, emergency protocols and resource planning in a world awash with data and technology. We are doing this through strong partnerships at the local, regional and national level with city government, fire, police and EMS departments and data providers.


Regional Economic Development

Identifying precision policy changes through data science

Through a $1.4m grant from the Economic Development Administration (EDA) and in collaboration with the Center for Complex Networks and Systems Research and the Indiana Business Research Center at IU, we are researching how Data Science, Complex Adaptive Systems and Social Sciences can come together to identify precision policy changes to promote regional economic development. We are doing this through a linked data ecosystem that maps together non-traditional and unconventional datasets, and advanced data gathering and data mining techniques.

Interested in learning Cheminformatics?

We no longer offer formal degrees, but some of our learning materials and a PDF/Kindle eBook are available at

Recent Publications

For more publications, see David Wild's Google Scholar page.

PIBAS FedSPARQL: a web-based platform for integration and exploration of bioinformatics datasets.
  Djokic-Petrovic, M., Cvjetkovic, V., Yang, J., Wild, D.J. Journal of Biomedical Semantics 8(1), 42, 2017.
City-wide Analysis of Drug-Drug-Interactions.
  Correia, R., Pereira de Araújo, L., Mattos, M.M., Wild, D.J., Rocha, L.M. Translational Bioinformatics Conference, 2017, Accepted.
An activity canyon characterization of the pharmacological topography
  VS Kulkarni, DJ Wild. Journal of Cheminformatics 8 (1), 41, 2016
Netpredictor: R and Shiny package to perform Drug-Target Bipartite network analysis and prediction of missing links.
  A Seal, DJ Wild. bioRxiv, 080036, 2016
Data Science and Online Education
  G Fox, S Maini, H Rosenbaum, D Wild. 2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CLOUDCOM), 2015
Optimizing drug–target interaction prediction based on random walk on heterogeneous networks
  A Seal, YY Ahn, DJ Wild. Journal of Cheminformatics 7 (1), 1, 2015
Novel phenotypic outcomes identified for a public collection of approved drugs from a publicly accessible panel of assays
  JA Lee, P Shinn, S Jaken, S Oliver, FS Willard, S Heidler, RB Peery, ... PLoS ONE 10 (7), e0130796, 2015
Semantic Breakthrough in Drug Discovery
  B Chen, H Wang, Y Ding, D Wild. Synthesis Lectures on the Semantic Web: Theory and Technology 4 (2), 1-142, 2015
Applications of the YarcData Urika in Drug Discovery and Healthcare
  R Henschel, A Seal, JJ Yang, DJ Wild, Y Ding, A Thota, S Michael, ... Proceedings of the Cray User Group, 2015.
A possible gut microbiota basis for weight gain side effects of antipsychotic drugs
  H Joshi, A Parihar, D Jiao, S Murali, DJ Wild. arXiv:1401.2389, 2014
Practice and Challenges of Building a Semantic Framework for Chemogenomics Research
  B Chen, DJ Wild. Molecular Informatics 32 (11‐12), 1000-1008, 2014



Selected Tools and Resources from the IDSL

The following tools and resources are some of the most widely used from prior IDSL projects. Additional tools and resources include the T2DM-NET knowledge network for diabetes, the NCATS Phenotypic Drug Discovery Resource, and the ChemBioSpace drug/gene/disease/side-effect association finding tool.



SLAP is a tool that profiles drugs against targets (and vice versa) using semantically linked networks of information on compounds, genes, pathways, and related information. It can be used in numerous ways, including predicting on- and off-target interactions, drug repurposing, and identification of mechanisms of action. For more information, see our PLoS Computational Biology paper. You can also try the tool by clicking the link to the left. A related tool called SEMAP is available for commercial use from Data2Discovery Inc.



Chem2Bio2RDF is a demonstration of the power of semantically linked data. It shows how numerous publicly available biomedical datasets can be linked together and used to answer important biomedical questions that would otherwise be very difficult to answer, such as identifying multiple pathway inhibitors and associating drugs with particular side effects. You can read more in our BMC Bioinformatics paper, or access the resource by clicking the link to the left. Note that this was a proof-of-concept project and the data is not updated. However, Chem2Bio2RDF was drawn into the OpenPHACTS project, which maintains frequently updated data.



NetPredictor is an open-source R package for prediction of missing links in any given bipartite network. The package provides utilities to compute missing links in bipartite and unipartite networks using Random Walk with Restart and a network inference algorithm. The package also allows computation of bipartite network properties, visualization of communities for two different sets of nodes, and calculation of significant interactions between two sets of nodes using permutation-based testing. It includes an example application written in R-Shiny for prediction of drug-target associations. You can read more in our bioRxiv paper, or access source code and documentation in the GitHub repository by clicking the image to the left.
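
For readers unfamiliar with Random Walk with Restart, the general idea can be sketched in a few lines of Python. This is only an illustration of the technique on a toy network, not NetPredictor's implementation or API (NetPredictor is an R package). The function names, the restart probability of 0.3, and the drug and target indices are all assumptions made for this example.

```python
def bipartite_adjacency(n_left, n_right, edges):
    """Build a symmetric adjacency matrix; left nodes come first, then right nodes."""
    n = n_left + n_right
    adj = [[0.0] * n for _ in range(n)]
    for left, right in edges:
        adj[left][n_left + right] = 1.0
        adj[n_left + right][left] = 1.0
    return adj


def random_walk_with_restart(adjacency, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker restarting at `seed`."""
    n = len(adjacency)
    # Column-normalise so that column j holds the step probabilities out of node j.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    start = [0.0] * n
    start[seed] = 1.0
    p = start[:]
    for _ in range(max_iter):
        # One step: follow an edge with probability (1 - restart), else jump to the seed.
        nxt = [
            (1 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy network (hypothetical): 3 drugs (indices 0-2) and 4 targets (indices 0-3).
N_DRUGS, N_TARGETS = 3, 4
edges = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 3)]
adj = bipartite_adjacency(N_DRUGS, N_TARGETS, edges)

# Walk from drug 0, then rank the targets it is not yet linked to.
scores = random_walk_with_restart(adj, seed=0)
known = {t for d, t in edges if d == 0}
candidates = sorted(
    ((scores[N_DRUGS + t], t) for t in range(N_TARGETS) if t not in known),
    reverse=True,
)
print(candidates)
```

In this toy network, target 2 ranks above target 3 for drug 0 because it is reachable through a shared target and a neighbouring drug, whereas target 3 sits in a disconnected part of the network and receives a score of zero.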