Indiana University Integrative Data Science Lab (IDSL)

Welcome to the Integrative Data Science Lab (IDSL) in the School of Informatics and Computing at Indiana University. Integrative Data Science brings together heterogeneous data sets and expertise from different disciplines, along with advanced AI, machine learning and data science tools and technologies, to solve real-world problems. The IDSL is directed by David Wild. Here are the ways we are making a difference using integrative data science:


Using deep expertise in data science, linked data, cheminformatics, AI & machine learning, we are pioneering new ways to accelerate drug discovery, including making better decisions earlier, avoiding expensive failures in clinical trials, and finding the right treatments for the right patients. We developed the first large-scale semantic linked data repository for drug discovery and novel link prediction and data mining algorithms for finding hidden insights in large heterogeneous data networks, and in our 2012 Drug Discovery Today paper we laid out a strategy for using data science to avoid some of the perils of single-target drug discovery. Current projects include using knowledge networks to encode computable networks for multi-mechanism complex diseases, integrating patient medical records with molecular data for targeted therapies, drug repurposing using massive linked data networks, and accelerating drug discovery with automated synthesis and machine learning. We are thankful to NIH NCATS, Indiana CTSI, the OpenPHACTS foundation, Eli Lilly, and Pfizer for funding this work. Some of this work is being applied commercially in our spin-off company Data2Discovery Inc.


We live in an era of profound risk and uncertainty, with climate change, pervasive technology, infrastructure vulnerabilities, and cybersecurity hazards all creating new threats at local, regional and national levels. We are researching highly creative ways that data science and technology can help emergency managers, emergency responders and citizens better understand, prepare for, mitigate and respond to this new landscape. We are applying this research at the local and regional levels through partnerships with University Information Technology Services at IU, Bloomington City Fire Department, and Bloomington City Police Department.


Selected Tools and Resources from the IDSL

The following tools and resources are some of the most widely used from prior IDSL projects. Additional tools and resources include the T2DM-NET knowledge network for diabetes, the NCATS Phenotypic Drug Discovery Resource, and the ChemBioSpace drug/gene/disease/side-effect association finding tool.



SLAP is a tool that profiles drugs against targets (and vice versa) using semantically linked networks of information on compounds, genes, pathways, and related information. It can be used in numerous ways, including predicting on- and off-target interactions, drug repurposing, and identification of mechanisms of action. For more information, see our PLoS Computational Biology paper. You can also try the tool by clicking the link to the left. A related tool called SEMAP is available for commercial use from Data2Discovery Inc.



Chem2Bio2RDF is a demonstration of the power of semantically linked data. It shows how numerous publicly available biomedical datasets can be linked together and used to answer important biomedical questions that would otherwise be very difficult to answer, such as identifying multiple pathway inhibitors and associating drugs with particular side effects. You can read more in our BMC Bioinformatics paper, or access the resource by clicking the link to the left. Note that this was a proof-of-concept project and the data is no longer updated. However, Chem2Bio2RDF was drawn into the OpenPHACTS project, which maintains frequently updated data.
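
To illustrate the idea of linking datasets to answer a question such as "which compounds inhibit targets in more than one pathway?", here is a minimal Python sketch. It is not Chem2Bio2RDF's actual schema, data, or query interface: the triples, the predicate names (`inhibits`, `participates_in`), and the function `multi_pathway_inhibitors` are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict

# Hypothetical placeholder triples standing in for two separately published datasets.
# None of these identifiers or predicates come from Chem2Bio2RDF itself.
binding_triples = [
    ("compound_1", "inhibits", "gene_A"),
    ("compound_1", "inhibits", "gene_B"),
    ("compound_2", "inhibits", "gene_A"),
]
pathway_triples = [
    ("gene_A", "participates_in", "pathway_X"),
    ("gene_B", "participates_in", "pathway_Y"),
]


def multi_pathway_inhibitors(binding, pathways, min_pathways=2):
    """Join the two triple sets on the shared gene identifier and return
    compounds whose inhibited genes span at least `min_pathways` pathways."""
    gene_to_pathways = defaultdict(set)
    for gene, predicate, pathway in pathways:
        if predicate == "participates_in":
            gene_to_pathways[gene].add(pathway)

    compound_to_pathways = defaultdict(set)
    for compound, predicate, gene in binding:
        if predicate == "inhibits":
            # The shared gene identifier is what links the two datasets.
            compound_to_pathways[compound] |= gene_to_pathways.get(gene, set())

    return {
        compound: sorted(found)
        for compound, found in compound_to_pathways.items()
        if len(found) >= min_pathways
    }


result = multi_pathway_inhibitors(binding_triples, pathway_triples)
print(result)
```

The point of the sketch is the join on a shared identifier: neither dataset alone can answer the question, but once both use the same gene identifiers the answer is a simple traversal.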



NetPredictor is an open source R package for prediction of missing links in any given bipartite network. The package provides utilities to compute missing links in bipartite and unipartite networks using Random Walk with Restart and a network inference algorithm. The package also allows computation of bipartite network properties, visualization of communities for two different sets of nodes, and calculation of significant interactions between two sets of nodes using permutation-based testing. It includes an example application written in R-Shiny for prediction of drug-target associations. You can read more in our bioRxiv paper, or access source code and documentation in the GitHub repository by clicking the image to the left.
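
For readers unfamiliar with Random Walk with Restart, the following is a minimal Python sketch of the general technique applied to a toy drug-target network. It is not NetPredictor's code or API (the package itself is written in R): the function names, the `restart` value of 0.3, and the edge list are assumptions made for illustration only.

```python
def build_bipartite(edges):
    """Turn (drug, target) pairs into an undirected neighbor map."""
    neighbors = {}
    for drug, target in edges:
        neighbors.setdefault(drug, set()).add(target)
        neighbors.setdefault(target, set()).add(drug)
    return neighbors


def random_walk_with_restart(neighbors, seed, restart=0.3, tol=1e-12, max_iter=10_000):
    """Iterate p <- (1 - restart) * W p + restart * e_seed until it converges.

    W moves each node's probability mass evenly to its neighbors, and
    e_seed puts all restart mass back on the seed node.
    """
    p = {node: 0.0 for node in neighbors}
    p[seed] = 1.0
    for _ in range(max_iter):
        nxt = {node: 0.0 for node in neighbors}
        for node, mass in p.items():
            nbrs = neighbors[node]
            if not nbrs:
                # Isolated node: send its walking mass back to the seed.
                nxt[seed] += (1 - restart) * mass
                continue
            share = (1 - restart) * mass / len(nbrs)
            for nb in nbrs:
                nxt[nb] += share
        nxt[seed] += restart  # total mass is 1, so the restart term is `restart`
        delta = sum(abs(nxt[node] - p[node]) for node in neighbors)
        p = nxt
        if delta < tol:
            break
    return p


def rank_missing_links(edges, drug, **kwargs):
    """Score targets not yet linked to `drug`, highest score first."""
    neighbors = build_bipartite(edges)
    scores = random_walk_with_restart(neighbors, drug, **kwargs)
    targets = {target for _, target in edges}
    candidates = targets - neighbors[drug]
    return sorted(((t, scores[t]) for t in candidates), key=lambda x: x[1], reverse=True)


# Hypothetical toy network: placeholder names, not real drug-target data.
edges = [
    ("drugA", "target1"), ("drugA", "target2"),
    ("drugB", "target2"), ("drugB", "target3"),
    ("drugC", "target3"), ("drugC", "target4"),
]
ranking = rank_missing_links(edges, "drugA")
for target, score in ranking:
    print(target, round(score, 4))
```

In this toy network, `target3` ranks above `target4` for `drugA` because it is reachable through `drugB`, which shares `target2` with `drugA`, whereas `target4` is only reachable through a longer path.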


Former members

Dr Rajarshi Guha (Vis. Asst. Prof), NIH NCATS
Dr Qian Zhu (Postdoc), University of Maryland
Dr Xiao Dong (PhD), University of Illinois, Chicago
Dr Huijun Wang (PhD), Merck
Dr Pulan Yu (PhD), Dow Agrosciences
Dr Bin Chen (PhD), University of California San Francisco
Dr Hari Machina (PhD), Roche
Dr Abhik Seal (PhD), Abbvie
Dr Jae Hong Shin (PhD), University of Texas
Dr Varsha Kulkarni (PhD), New England Complex Systems Inst.


Dr David Wild, Director
Samuel Bentum, Ph.D. Student
Alex Christou, Ph.D. Student
Natalie Franklin, Ph.D. Student
Stefan Furrer, Ph.D. Student
Chris Gessner, Ph.D. Student
Logan Paul, Ph.D. Student
Anurag Passi, (PhD Fulbright Fellow), AcSIR
Jeremy Yang, Ph.D. Student

LATEST Presentations & Publications


A semantic approach to repurposing using chemical and phenotypic data
    National Center for Advancing Translational Sciences, Jan. 2017
Big Data in Drug Discovery and Development
    Conference on Information and Knowledge Management (CIKM) 2016, Indianapolis, Indiana, Oct 2016.
Transforming Pharmaceutical and Healthcare Companies into Data Companies
    Indiana SOIC Research Horizons, Bloomington, Indiana, Sep. 2016. See also Presentation Recording.
Opportunities for semantic and network methods in Phenotypic Drug Discovery
    OpenPHACTS Phenotypic Screening Workshop, Santiago de Compostela, Spain, Feb. 2015.
Applying semantic and network methods in AOP knowledge discovery
    Adverse Outcome Pathways: From Research to Regulation. NIH NIEHS Workshop, Bethesda, Maryland, Sep. 2014
Opportunities in toxicology for large scale semantic linked data and prediction
    Society of Toxicology 53rd Annual Meeting, Phoenix, Arizona, March 2014
Social media in cheminformatics education
    American Chemical Society National Meeting, Indianapolis, Indiana, September 2013. 
Large scale cross dataset mining of chemical and biological datasets for drug discovery
    American Chemical Society CINF Webinar, May 2013 (Link to CINF site with recording)
New opportunities for biomedical science and drug discovery using semantic technologies
    Exploiting Big Data Semantics for Translational Medicine, Indiana University, Bloomington, March 2013.  
Semantic integration, search and prediction on drug discovery data at Indiana University
    Leiden University Medical Center, July 2012. 
Assessing drug target association using semantic linked data
    American Chemical Society National Meeting, San Diego, March 2012. See also Presentation Recording
Exploiting semantic networks of public data for systems chemical biology
    American Chemical Society National Meeting, San Diego, March 2012. See also Presentation Recording
New searching paradigms in drug discovery enabled by semantic integration of public data
    American Chemical Society National Meeting, San Diego, March 2012. See also Presentation Recording
Exploiting semantic networks of public data for drug-target prediction
    Open Source for Computer Aided Translational Medicine Conference, Chandigarh, India, February 2012.


For more publications, see David Wild's Google Scholar page.

An activity canyon characterization of the pharmacological topography
  VS Kulkarni, DJ Wild. Journal of Cheminformatics 8 (1), 41, 2016
Netpredictor: R and Shiny package to perform Drug-Target Bipartite network analysis and prediction of missing links.
  A Seal, DJ Wild. bioRxiv, 080036, 2016
Data Science and Online Education
  G Fox, S Maini, H Rosenbaum, D Wild. 2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CLOUDCOM), 2015
Optimizing drug–target interaction prediction based on random walk on heterogeneous networks
  A Seal, YY Ahn, DJ Wild. Journal of Cheminformatics 7 (1), 1, 2015
Novel phenotypic outcomes identified for a public collection of approved drugs from a publicly accessible panel of assays
  JA Lee, P Shinn, S Jaken, S Oliver, FS Willard, S Heidler, RB Peery, ... PLoS ONE 10 (7), e0130796, 2015
Semantic Breakthrough in Drug Discovery
  B Chen, H Wang, Y Ding, D Wild. Synthesis Lectures on the Semantic Web: Theory and Technology 4 (2), 1-142, 2015
Applications of the YarcData Urika in Drug Discovery and Healthcare
  R Henschel, A Seal, JJ Yang, DJ Wild, Y Ding, A Thota, S Michael, ... Proceedings of the Cray User Group, 2015.
A possible gut microbiota basis for weight gain side effects of antipsychotic drugs
  H Joshi, A Parihar, D Jiao, S Murali, DJ Wild. arXiv:1401.2389, 2014
Practice and Challenges of Building a Semantic Framework for Chemogenomics Research
  B Chen, DJ Wild. Molecular Informatics 32 (11‐12), 1000-1008, 2014