Indiana University Integrative Data Science Lab (IDSL)

Welcome to the Integrative Data Science Lab (IDSL) in the School of Informatics and Computing at Indiana University. Integrative Data Science brings together heterogeneous datasets and expertise from different disciplines along with novel data science tools and technologies to solve real-world problems. Current research focus areas include Precision Drug Intervention and Smart Communities, Health and Emergency Response. The IDSL is directed by David Wild.

Integrative Data Science for Precision Drug Intervention

A major focus of modern medicine is understanding the molecular bases of disease states and their relationship to particular patient groups and external conditions. Using deep expertise in cheminformatics, biomedical informatics, data linking and heterogeneous graph mining of molecular and patient-level data, we are building tools and methods that identify opportunities to use new chemical entities and to repurpose existing drugs as precision medical interventions. Current projects include identifying repurposing candidates for rare and neglected diseases, using new datasets including gene expression data for repurposing, and creation of targeted interventions by new chemical entities using automated synthesis laboratories. We are thankful to NIH NCATS, Indiana CTSI, the OpenPHACTS foundation, Eli Lilly, and Pfizer for funding of this work.

Integrative Data Science for Smart Communities, Health and Emergency Response

We are researching the intersection of data science, smart communities, human-computer interaction and heterogeneous data sources to bring tools and actionable data to healthcare, emergency responders, and citizens to enable efficient, informed and cost-effective decision making. Current projects include a partnership with Bloomington City Fire Department in Indiana for data-driven situational awareness tools and historical call data mining, and a citywide analysis of drug prescribing and adverse drug interactions in a large city in Brazil.


Selected Tools and Resources from the IDSL

The following tools and resources are some of the most widely used from prior IDSL projects. Additional tools and resources include the T2DM-NET knowledge network for diabetes, the NCATS Phenotypic Drug Discovery Resource, and the ChemBioSpace drug/gene/disease/side-effect association finding tool.



SLAP is a tool that profiles drugs against targets (and vice versa) using semantically linked networks of information on compounds, genes, pathways, and related information. It can be used in numerous ways including predicting on- and off-target interactions, drug repurposing, and identification of mechanisms of action. For more information, see our PLoS Computational Biology paper. You can also try the tool by clicking the link to the left. A related tool called SEMAP is available for commercial use from Data2Discovery Inc.



Chem2Bio2RDF is a demonstration of the power of semantically linked data. It shows how numerous publicly available biomedical datasets can be linked together and used to answer important biomedical questions that would otherwise be very difficult to answer, such as identifying multiple pathway inhibitors and associating drugs with particular side effects. You can read more in our BMC Bioinformatics paper, or access the resource by clicking the link to the left. Note that this was a proof-of-concept project and the data is not updated. However, Chem2Bio2RDF was drawn into the OpenPHACTS project, which maintains frequently updated data.



NetPredictor is an open-source R package for prediction of missing links in any given bipartite network. The package provides utilities to compute missing links in bipartite and unipartite networks using Random Walk with Restart and a network inference algorithm. The package also allows computation of bipartite network properties, visualization of communities for two different sets of nodes, and calculation of significant interactions between two sets of nodes using permutation-based testing. It includes an example application written in R-Shiny for prediction of drug-target associations. You can read more in our bioRxiv paper, or access source code and documentation in the GitHub repository by clicking the image to the left.
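To give a feel for how Random Walk with Restart can rank missing links in a bipartite network, here is a minimal sketch in Python. It is not NetPredictor's code or API (NetPredictor is an R package); the function names, the toy drug and target labels, and the restart probability of 0.3 are all assumptions made for illustration. The walk starts at one drug, moves to a random neighbour at each step, and jumps back to the starting drug with a fixed probability; targets that end up with high visiting probability but have no known link to the drug are candidate missing links.

```python
def build_adjacency(edges):
    """Turn (drug, target) pairs into an undirected adjacency map."""
    adjacency = {}
    for drug, target in edges:
        adjacency.setdefault(drug, set()).add(target)
        adjacency.setdefault(target, set()).add(drug)
    return adjacency


def random_walk_with_restart(adjacency, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of every node.

    Iterates p_next = restart * p0 + (1 - restart) * W p, where W spreads each
    node's probability evenly over its neighbours and p0 is 1 at the seed.
    """
    nodes = sorted(adjacency)
    p0 = {n: 1.0 if n == seed else 0.0 for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adjacency[n])
            for neighbour in adjacency[n]:
                nxt[neighbour] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


def rank_missing_links(edges, drug, restart=0.3):
    """Rank targets not yet linked to `drug`, highest score first."""
    adjacency = build_adjacency(edges)
    scores = random_walk_with_restart(adjacency, drug, restart)
    all_targets = {target for _, target in edges}
    candidates = all_targets - adjacency[drug]
    # Sort by descending score; break ties by name so the order is stable.
    return sorted(candidates, key=lambda t: (-scores[t], t))


# Hypothetical toy network: labels are placeholders, not real drugs or targets.
toy_edges = [
    ("drugA", "T1"), ("drugA", "T2"),
    ("drugB", "T2"), ("drugB", "T3"),
    ("drugC", "T3"), ("drugC", "T4"),
]
print(rank_missing_links(toy_edges, "drugA"))
```

In this toy network, T3 ranks above T4 for drugA because T3 is reached through drugB, which shares target T2 with drugA, while T4 is only reachable through a longer path. A real analysis would typically also weight edges by drug and target similarity, which this sketch omits.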


Former members

Dr Rajarshi Guha (Vis. Asst. Prof), NIH NCATS
Dr Qian Zhu (Postdoc), University of Maryland
Dr Xiao Dong (PhD), University of Illinois, Chicago
Dr Huijun Wang (PhD), Merck
Dr Pulan Yu (PhD), Dow Agrosciences
Dr Bin Chen (PhD), University of California San Francisco
Dr Hari Machina (PhD), Roche
Dr Abhik Seal (PhD), Abbvie
Dr Jae Hong Shin (PhD), University of Texas
Dr Varsha Kulkarni (PhD)


Current members

Dr David Wild, Director
Samuel Bentum, Ph.D. Student
Alex Christou, Ph.D. Student
Natalie Franklin, Ph.D. Student
Stefan Furrer, Ph.D. Student
Chris Gessner, Ph.D. Student
Anurag Passi (PhD Fulbright Fellow), AcSIR
Jeremy Yang, Ph.D. Student

Latest Presentations & Publications


Big Data in Drug Discovery and Development
    Conference on Information and Knowledge Management (CIKM) 2016, Indianapolis, Indiana, Oct 2016.
Transforming Pharmaceutical and Healthcare Companies into Data Companies
    Indiana SOIC Research Horizons, Bloomington, Indiana, Sep. 2016. See also Presentation Recording.
Opportunities for semantic and network methods in Phenotypic Drug Discovery
    OpenPHACTS Phenotypic Screening Workshop, Santiago de Compostela, Spain, Feb. 2015.
Applying semantic and network methods in AOP knowledge discovery
    Adverse Outcome Pathways: From Research to Regulation. NIH NIEHS Workshop, Bethesda, Maryland, Sep. 2014
Opportunities in toxicology for large scale semantic linked data and prediction
    Society of Toxicology 53rd Annual Meeting, Phoenix, Arizona, March 2014
Social media in cheminformatics education
    American Chemical Society National Meeting, Indianapolis, Indiana, September 2013. 
Large scale cross dataset mining of chemical and biological datasets for drug discovery
    American Chemical Society CINF Webinar, May 2013 (Link to CINF site with recording)
New opportunities for biomedical science and drug discovery using semantic technologies
    Exploiting Big Data Semantics for Translational Medicine, Indiana University, Bloomington, March 2013.  
Semantic integration, search and prediction on drug discovery data at Indiana University
    Leiden University Medical Center, July 2012. 
Assessing drug target association using semantic linked data
    American Chemical Society National Meeting, San Diego, March 2012. See also Presentation Recording
Exploiting semantic networks of public data for systems chemical biology
    American Chemical Society National Meeting, San Diego, March 2012. See also Presentation Recording
New searching paradigms in drug discovery enabled by semantic integration of public data
    American Chemical Society National Meeting, San Diego, March 2012. See also Presentation Recording
Exploiting semantic networks of public data for drug-target prediction
    Open Source for Computer Aided Translational Medicine Conference, Chandigarh, India, February 2012.


For more publications, see David Wild's Google Scholar page.

An activity canyon characterization of the pharmacological topography
  VS Kulkarni, DJ Wild. Journal of Cheminformatics 8 (1), 41, 2016
Netpredictor: R and Shiny package to perform Drug-Target Bipartite network analysis and prediction of missing links.
  A Seal, DJ Wild. bioRxiv, 080036, 2016
Data Science and Online Education
  G Fox, S Maini, H Rosenbaum, D Wild. 2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CLOUDCOM), 2015
Optimizing drug–target interaction prediction based on random walk on heterogeneous networks
  A Seal, YY Ahn, DJ Wild. Journal of Cheminformatics 7 (1), 1, 2015
Novel phenotypic outcomes identified for a public collection of approved drugs from a publicly accessible panel of assays
  JA Lee, P Shinn, S Jaken, S Oliver, FS Willard, S Heidler, RB Peery, ... PLoS ONE 10 (7), e0130796, 2015
Semantic Breakthrough in Drug Discovery
  B Chen, H Wang, Y Ding, D Wild. Synthesis Lectures on the Semantic Web: Theory and Technology 4 (2), 1-142, 2015
Applications of the YarcData Urika in Drug Discovery and Healthcare
  R Henschel, A Seal, JJ Yang, DJ Wild, Y Ding, A Thota, S Michael, ... Proceedings of the Cray User Group, 2015.
A possible gut microbiota basis for weight gain side effects of antipsychotic drugs
  H Joshi, A Parihar, D Jiao, S Murali, DJ Wild. arXiv:1401.2389, 2014
Practice and Challenges of Building a Semantic Framework for Chemogenomics Research
  B Chen, DJ Wild. Molecular Informatics 32 (11‐12), 1000-1008, 2014