Indiana University Integrative Data Science Lab (IDSL)

Welcome to the Integrative Data Science Lab (IDSL) in the School of Informatics and Computing at Indiana University. Integrative Data Science brings together heterogeneous datasets and expertise from different disciplines, along with novel data science tools and technologies, to solve real-world problems. Current research focuses on two areas: Precision Drug Intervention; and Smart Communities, Health and Emergency Response. The IDSL is directed by David Wild.

Integrative Data Science for Precision Drug Intervention

A major focus of modern medicine is understanding the molecular bases of disease states and their relationship to particular patient groups and external conditions. Using deep expertise in cheminformatics, biomedical informatics, data linking and heterogeneous graph mining of molecular and patient-level data, we are building tools and methods that identify opportunities to use new chemical entities and to repurpose existing drugs as precision medical interventions. Current projects include identifying repurposing candidates for rare and neglected diseases, using new datasets including gene expression data for repurposing, and creation of targeted interventions by new chemical entities using automated synthesis laboratories. We are thankful to NIH NCATS, Indiana CTSI, the OpenPHACTS foundation, Eli Lilly, and Pfizer for funding of this work.

Integrative Data Science for Smart Communities, Health and Emergency Response

We are researching the intersection of data science, smart communities, human-computer interaction and heterogeneous data sources to bring tools and actionable data to healthcare providers, emergency responders, and citizens, enabling efficient, informed and cost-effective decision making. Current projects include a partnership with Bloomington City Fire Department in Indiana for data-driven situational awareness tools and historical call data mining, and a citywide analysis of drug prescribing and adverse drug interactions in a large city in Brazil.


Selected Tools and Resources from the IDSL

The following tools and resources are some of the most widely used from prior IDSL projects. Additional tools and resources include the T2DM-NET knowledge network for diabetes, the NCATS Phenotypic Drug Discovery Resource, and the ChemBioSpace drug/gene/disease/side-effect association-finding tool.



SLAP is a tool that profiles drugs against targets (and vice versa) using semantically linked networks of information on compounds, genes, pathways, and related entities. It can be used in numerous ways, including predicting on- and off-target interactions, drug repurposing, and identification of mechanisms of action. For more information, see our PLoS Computational Biology paper. You can also try the tool by clicking the link to the left. A related tool called SEMAP is available for commercial use from Data2Discovery Inc.



Chem2Bio2RDF is a demonstration of the power of semantically linked data. It shows how numerous publicly available biomedical datasets can be linked together and used to answer important biomedical questions that would otherwise be very difficult to answer, such as identifying multiple pathway inhibitors and associating drugs with particular side effects. You can read more in our BMC Bioinformatics paper, or access the resource by clicking the link to the left. Note that this was a proof-of-concept project and the data is not updated. However, Chem2Bio2RDF was drawn into the OpenPHACTS project which maintains frequently updated data.



NetPredictor is an open-source R package for prediction of missing links in any given bipartite network. The package provides utilities to compute missing links in bipartite and unipartite networks using Random Walk with Restart and a network inference algorithm. The package also allows computation of bipartite network properties, visualization of communities for two different sets of nodes, and calculation of significant interactions between two sets of nodes using permutation-based testing. It includes an example application written in R-Shiny for prediction of drug-target associations. You can read more in our bioRxiv paper, or access source code and documentation in the GitHub repository by clicking the image to the left.
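
To give a feel for how Random Walk with Restart can rank candidate links in a bipartite drug-target network, here is a minimal Python sketch. It is not NetPredictor's code or API (NetPredictor is an R package); the function names, the toy edge list, and the restart probability of 0.3 are all assumptions made for illustration. A walker starts at one drug, moves to a uniformly chosen neighbour at each step, and jumps back to the starting drug with a fixed probability; targets that the drug is not yet linked to are then ranked by how often the walker visits them.

```python
from collections import defaultdict

# Toy drug-target edges (hypothetical data, for illustration only).
EDGES = [
    ("drugA", "t1"), ("drugA", "t2"),
    ("drugB", "t2"), ("drugB", "t3"),
    ("drugC", "t3"), ("drugC", "t4"),
]


def build_adjacency(edges):
    """Turn (drug, target) pairs into an undirected adjacency map."""
    adjacency = defaultdict(set)
    for drug, target in edges:
        adjacency[drug].add(target)
        adjacency[target].add(drug)
    return dict(adjacency)


def random_walk_with_restart(adjacency, seed, restart=0.3, tol=1e-12, max_iter=10000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until p stops changing.

    W spreads each node's probability evenly over its neighbours and
    p0 puts all mass on `seed`. Every node built from an edge list has
    at least one neighbour, so no probability mass is lost.
    """
    p = {node: (1.0 if node == seed else 0.0) for node in adjacency}
    for _ in range(max_iter):
        nxt = {node: (restart if node == seed else 0.0) for node in adjacency}
        for node, neighbours in adjacency.items():
            share = (1.0 - restart) * p[node] / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        change = sum(abs(nxt[node] - p[node]) for node in adjacency)
        p = nxt
        if change < tol:
            break
    return p


def rank_unlinked_targets(edges, drug, restart=0.3):
    """Rank targets not yet linked to `drug` by their visiting probability."""
    adjacency = build_adjacency(edges)
    scores = random_walk_with_restart(adjacency, drug, restart)
    targets = {target for _, target in edges}
    candidates = targets - adjacency[drug]
    return sorted(((t, scores[t]) for t in candidates),
                  key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for target, score in rank_unlinked_targets(EDGES, "drugA"):
        print(f"{target}: {score:.4f}")
```

In this toy network, t3 is reachable from drugA through a shared target with drugB, while t4 is one hop further away through drugC, so t3 ranks above t4. A real analysis would also need the significance testing mentioned above, which this sketch omits.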


Former members

Dr Rajarshi Guha (Vis. Asst. Prof), NIH NCATS
Dr Qian Zhu (Postdoc), University of Maryland
Dr Xiao Dong (PhD), University of Illinois, Chicago
Dr Huijun Wang (PhD), Merck
Dr Pulan Yu (PhD), Dow Agrosciences
Dr Bin Chen (PhD), University of California San Francisco
Dr Hari Machina (PhD), Roche
Dr Abhik Seal (PhD), Abbvie
Dr Jae Hong Shin (PhD), University of Texas
Dr Varsha Kulkarni (PhD)


Current members

Dr David Wild, Director
Samuel Bentum, Ph.D. Student
Alex Christou, Ph.D. Student
Natalie Franklin, Ph.D. Student
Stefan Furrer, Ph.D. Student
Chris Gessner, Ph.D. Student
Anurag Passi (PhD Fulbright Fellow), AcSIR
Jeremy Yang, Ph.D. Student

For more publications, see David Wild's Google Scholar page.

