It’s time to get serious about knowledge graphs in drug discovery. If depressing projections are to be believed (and there is good evidence that they should be), the pharmaceutical industry is in terminal decline, with return on investment (ROI) projected to hit zero by 2020. Pharmaceutical companies have made some heroic efforts to plug the holes in the hull of the Titanic, but the fact remains that the returns on new drugs that do get to market do not justify the massive investments that Pharma currently puts into R&D.
Yet there is much still to be done in drug discovery. Despite some recent successes in immuno-oncology and gene therapy, our treatments for the most prevalent, devastating diseases like cancer and diabetes are still not that great, and there are many more conditions like Alzheimer’s that have no good treatments at all, not to mention the hundreds of rare and neglected diseases that need treatments. As we used to say when I worked at Parke-Davis, “the patient is waiting”.
So how do we save drug discovery and get back to helping the patient? Well, as has been said many times before, we have to find a way to make drug discovery much leaner, faster, and more effective, and that means radically rethinking the processes, assumptions, and technologies of drug discovery, and most importantly how we use data and knowledge effectively. Current drug discovery processes are predicated on letting a sequence of expert scientists (chemists, biologists, toxicologists, and so on) use their deep knowledge to drive experimental investigations through a well-trodden process of target identification, lead compound identification, ADME/toxicology, animal studies, and so on. Most of these efforts fail, but the investigations result in mountains of experimental databases, papers, documents, and spreadsheets, with modern experimental technologies such as high-throughput screening and genomics delivering millions of data points for every project. Key decisions, insights, and observations are neatly organized into PowerPoints and then forgotten. Then there are the petabytes of experimental and patient data available in public or commercial data sources, as well as the well over a million life-sciences publications that come out each year. It is not too extreme to say that if pharma companies are going to survive, they need to stop being drug companies and start being data science companies.
What if we could bring all that knowledge, data, insight, and prior decision-making together and use it to accelerate the discovery of new drugs? What if we could encode the millions of known relationships between potential new (or old) drugs, protein targets, genomics, biological processes, and disease mechanisms, and then use all this together to get new insights into disease and treatments? What if we could encode scientific decisions? What if we could even map translational relationships that bring all the scientific molecular data together with data from patients and clinical trials? What if we could partner this huge map of knowledge with powerful AI and machine learning algorithms that can prioritize insights and connections for the expert human scientists to assess? What if we could actually do data-driven drug discovery that gets drugs to patients quicker and faster?
Well, we can do this.
Since 2008, my colleagues and I at Indiana University have been researching ways to link massive amounts of heterogeneous drug discovery data and knowledge into computable graph structures, which we now call knowledge graphs, and we have designed powerful new algorithms to run on top of these knowledge graphs. We have already done some pretty exciting things, like predicting the biological activities of drugs, mining patterns to explain side effects, and identifying new patterns of relationships between diseases. In 2010, the OpenPHACTS consortium demonstrated how pharmaceutical companies and academia can collaborate to combine their knowledge and insight into a linked, searchable network. Our partner and the successor to the consortium, the OpenPHACTS Foundation, will soon be ready to release a highly accessible, interoperable, sustainable knowledge graph of public drug discovery data that can be harvested and reused in many ways. In 2012, we launched Data2Discovery, one of the earliest AI for drug discovery startups. Data2Discovery is, with customers and partners, building knowledge graphs that transcend the boundaries of traditional public/proprietary data silos and that power completely new AI-driven applications. We are able to improve drug discovery now, as well as demonstrate new fast-cycle AI-driven processes that will have a revolutionary impact on drug discovery if fully implemented. We have had some dramatic successes, but we are just starting to discover the impact that data, knowledge graphs, AI, and machine learning can together have on drug discovery.
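To make the idea of a “computable graph structure” concrete, here is a minimal, purely illustrative sketch of the pattern: facts stored as subject–predicate–object triples, assembled into a graph, and traversed to surface candidate mechanistic paths from a drug to a disease. All entity names, relations, and the path-search heuristic below are hypothetical examples of mine, not data or algorithms from our actual systems.

```python
from collections import defaultdict, deque

# Toy knowledge graph as (subject, predicate, object) triples.
# Every name here is illustrative, not a real biomedical assertion.
TRIPLES = [
    ("DrugA",    "targets",         "ProteinX"),
    ("ProteinX", "participates_in", "PathwayP"),
    ("PathwayP", "implicated_in",   "DiseaseD"),
    ("DrugB",    "targets",         "ProteinY"),
    ("ProteinY", "participates_in", "PathwayP"),
]

def build_graph(triples):
    """Index triples as an adjacency map: node -> [(relation, neighbor), ...]."""
    graph = defaultdict(list)
    for subj, pred, obj in triples:
        graph[subj].append((pred, obj))
    return graph

def find_paths(graph, start, goal, max_hops=4):
    """Breadth-first search for relation-labelled paths from start to goal.

    Each path interleaves nodes and relations, e.g.
    [drug, 'targets', protein, 'participates_in', pathway, ...].
    """
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            paths.append(path)
            continue
        if (len(path) - 1) // 2 >= max_hops:   # hop budget exhausted
            continue
        for rel, nxt in graph.get(node, []):
            if nxt not in path:                # avoid cycles
                queue.append(path + [rel, nxt])
    return paths

graph = build_graph(TRIPLES)
for path in find_paths(graph, "DrugA", "DiseaseD"):
    print(" -> ".join(path))
```

In a real system the triples would number in the billions, come from curated public and proprietary sources, and be queried with graph databases and learned ranking models rather than plain BFS; the point here is only the shape of the data and the kind of question a knowledge graph makes easy to ask.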
We need all the expertise of academics, consortia, AI companies, and pharma to make this happen, and it’s going to require some serious investment and a big change in thinking. But the opportunity to get drug discovery out of the death spiral and positioned for data-driven success is too important to pass up. The patient is waiting.