What a rough few weeks it’s been for AI and healthcare. In just the last ten days, we’ve seen the publication of a number of commentaries that collectively express a significant degree of caution, if not outright concern, about the extravagant expectations around the application of AI to healthcare and drug discovery.
Let’s start with the two peer-reviewed publications, both in the journal npj Digital Medicine.
Tech Solutions Can’t Fix Social/Political Problems
The first DM paper centers on “The ‘inconvenient truth’ about AI in healthcare” – basically, the disconnect between promising research algorithms and the activity of frontline clinicians. (The authors are Trishan Panch, Heather Mattie, and Leo Anthony Celi; all three are associated with Harvard. In addition, Panch is the CMO of the company Wellframe, Mattie is affiliated with Wellframe, and Celi is also associated with MIT.) Panch et al. argue the disconnect stems from two factors:
- AI innovations don’t fix warped incentives – i.e. you can’t expect to fix a social/political problem with a technical solution
- Inadequate data structures to train AI algorithms – i.e. it’s the data, stupid.
The authors highlight the potential of cloud computing, and argue the opportunity to benefit from this capability is thwarted by EMRs. “Clinician satisfaction with EMRs remains low,” they write, “resulting in variable completeness and quality of data entry, and interoperability between different providers remains elusive.” Abandoning their charming understatement, they continue, “The typical lament of a harried clinician is still ‘why does my EMR still suck and why don’t all these systems just talk to each other?’”
Despite the benefits of improved data liquidity, this hasn’t happened yet, the authors argue, because “a sufficiently compelling use case has not yet been presented to overcome the significant upfront investment necessary to build data infrastructure.” Plus, they say, the skills required to get this done are “beyond the core competencies of either healthcare organizations or governments.”
While the authors highlight the potential benefits of moving clinical data to the cloud, and utilizing a common data schema, they seem deeply skeptical about it actually happening, due to the power of entrenched incumbents, commenting: “A notable by-product of a move of clinical as well as research data to the cloud would be the erosion of market power of EMR providers. The status quo with proprietary data formats and local hosting of EMR databases favors incumbents who have strong financial incentives to maintain the status quo.” This is especially unfortunate, they suggest, because “Creation of health data infrastructure opens the door for innovation and competition with the public sector to fulfill the public aim of interoperable health data.”
You Can’t Fix Foundationally Flawed Data With High Volume
Another relevant DM paper, published online three days after the Panch paper, focuses on the concerns that, essentially, biased datasets will train biased algorithms to generate clinical recommendations that “potentially exacerbate health disparities.” (The authors are Eli Cahan, Tina Hernandez-Boussard, Sonoo Thadaney-Israni, and Daniel Rubin, all associated with Stanford; Cahan is also associated with New York University).
Cahan et al. are particularly concerned about bias and disparities, but the issues they raise would seem to generalize to all conclusions based on fragile datasets. “Data are not necessarily useful simply because they are voluminous,” they write. “As stated by [Arnaud] Chiolero [here], ‘big data’ do not speak by themselves any more than ‘small data.’ Acceptance of the veracity of data inputs on account of volume overlooks the hazardous underbelly of volume, in its ability to amplify falsity. Even for big data, ‘nothing is too big to fail.’”
The authors advocate evolving from what they call the current paradigm, using deductive reasoning from big data, towards a new paradigm, involving inductive reasoning. In other words, the current model is basically our current view of clinical decision support: an algorithm receives inputs and delivers outputs, such as whether to order a particular diagnostic test. The proposed new model, the authors suggest, can “be thought of as clinical decision questioning” where “few predictions enter and many questions exit,” and data analytics are used in the “recognition, and illumination, of false positives and false negatives.”
In short, they describe a major opportunity in (and urgent need for) identifying deficiencies in how we collect data and in the conclusions we tend to draw from these data.
(Recommended pairing: This NPR interview with Weapons of Math Destruction author Cathy O’Neil.)
Why Creating Labels for Biological Systems Is Really Hard
A third highly relevant piece is a blog post at DrugDiscovery.net written by chemical biology data scientist Andreas Bender, entitled (provocatively), “Why AI and Drug Discovery are no match made in heaven.” Bender argues (stop me if you’ve heard this before…) “AI needs data, and this is the weak point when trying to apply ‘AI’ to the drug discovery field.”
For starters, Bender notes that “What we really care about when doing drug discovery are the in vivo results – we don’t want to treat a protein with a drug, or a cell line, or a rat; we want to achieve efficacy, with tolerable toxicity, in humans” (emphasis in original).
Bender appropriately contrasts the data availability in early phases and late phases of drug discovery and development, and observes, “In early phases, we have more data, which is more clearly labeled – but it is less relevant to in vivo outcomes, such as efficacy. In late phases, we have data that is more relevant to in vivo outcomes, but we have very little data available in general.”
I thought Bender’s next point was especially astute, and perhaps what’s most often underappreciated by data scientists without domain expertise in biology and chemistry. Bender notes that in contrast to a customer clicking a link, then buying a product, where the connection is pretty direct, the situation in biology and drug development is so much more complicated – and highly contingent. Bender writes,
“Whether a drug shows efficacy in a disease (or toxic side effects) depends at the very least, on dose, route of delivery, and individual genetic setup of the organism and the disease…There is no clear label one can assign, such as ‘drug X treats disease Y’ – yes, sometimes, but sometimes not, depending on the context of how and in which context the drug is applied to a particular organism. Hence labels in the biological domain are generally much more ambiguous, and context-dependent than in other domains.”
Bender goes on to also point out that many of the so-called early successes around “AI in drug discovery” are important but also “a good number of steps away from the more difficult biological and in vivo stages, where efficacy and toxicity in living organisms decides the fate of drugs waiting to be discovered. Hence, there is still a gap that needs to be bridged…”
The key issue, he concludes, is the need for “sufficient and sufficiently relevant data in order to predict properties of potential therapies that are relevant for the in vivo situation, which are related to efficacy and toxicity-relevant endpoints.”
(Recommended pairing: this Derek Lowe commentary.)
(AI) Winter Is Coming?
There were two additional relevant articles this week in Stat. The first, by Casey Ross, features an interview with Gary Marcus, “a tech entrepreneur, author, and psychology professor at New York University,” according to Ross. Marcus is described as a deep learning skeptic, a theme he addresses in a forthcoming book, Ross says, adding that Marcus worries that failure to modulate expectations could lead to another “AI Winter.”
In essence, Marcus suggests he’s a fan of AI, but not of its misrepresentation; he describes deep learning as “kind of a turbocharged version of memorization,” so helpful in some contexts, less helpful in others. This struck me as very much aligned with the view described in David Epstein’s Range, where deep learning is said to be really effective in “kind” environments, but not in “wicked” ones (see my recent discussion of Range for more details).
Marcus believes clinical data could be really valuable, were it available for analysis at scale (again: stop me if you’ve heard this before), but this doesn’t really happen and, he suggests, isn’t likely to occur without government intervention. “All the individual hospitals are doing what’s in their own individual financial self-interest, and it’s not in the interests of their patients,” he says, adding:
“You need scientific empirical data in order to figure out what works and what doesn’t. But you have data distributed between 20 or 500 different hospitals, and there’s no good way of sharing it at any price…It’s not a good thing for humanity that data are so fractious and so tied into the finances, rather than making people better.”
(Recommended pairing: this discussion of Range.)
YOU get an AI/drug dev co and YOU get an AI/drug dev co and …
A related Stat article, by Rebecca Robbins, looks at the increasing number of startups trying to bring AI to drug development – at least 148, according to an expert Robbins quotes; there are so many, in fact, that apparently they are stumbling over each other’s names – Robbins cites Curai and AICure, also HelixAI and Healx (not to be confused with genetics company Helix, as she points out).
A more substantial problem they’re running into, Robbins says, is that many of these companies may grok AI but don’t really grok pharma. She quotes the high-profile co-founder of ARCH Venture Partners, Robert Nelsen (a VC with strong biotech background increasingly investing in AI/drug development companies such as insitro) as saying “A lot of these tech companies just have no idea how the business even works.”
Another problem, Robbins says, is that startups find themselves competing with internal teams pursuing the same work. She describes several of these internal teams as “formidable,” a perspective that I’d suggest aligns far better with these teams’ self-image than with the view of many (most?) external experts, and would also seem to contrast with the (general lack of meaningful) success demonstrated to date. But it’s absolutely true that the perception of internal competence – and a sense of feeling threatened by external efforts – is very real and very common, especially in areas where resources have already been deployed (a pharma phenomenon hardly unique to AI…).
Perhaps most interesting (as I’ve discussed frequently both in this column and on Tech Tonics) is the phenomenon of some very well-resourced “AI for drug discovery” companies, like insitro and Recursion (as Robbins notes) building out their own, internal high-quality datasets suitable for machine learning – reflecting yet again the need for suitable data to train algorithms.
For the last several years, I’ve highlighted in this column the gap between the lofty promises AI entrepreneurs are pitching and the skepticism with which healthcare organizations and biopharma companies are greeting this glorious news. The narrative that benighted incumbent organizations characteristically fail to appreciate and embrace the brilliant future that beckons may be appealing, but we should also consider the possibility that their caution and hesitation may be warranted. Most of the authors and experts discussed above are long-term bullish on AI in healthcare and biopharma; what we need next is to move from theoretical benefit and evangelical sales to established use cases and robust, clinically relevant data. It would seem especially important not to conflate data volume with data quality, and while the total amount of healthcare data may be high, the actual utility of these data (beyond billing, of course — the intended purpose of collection) — even were it possible to bring them together — remains an important unknown.
- Forbes, on the need to focus on implementation
- Clinical Pharmacology and Therapeutics commentary on implementation
- Forbes, on how AI can mislead biomarker researchers
- Forbes, on the challenges of implementing a data strategy in pharma
- Forbes, on pharma and the data science mindset
- Forbes, on pharma’s struggle to teach old data new tricks
- Forbes, on the sociology of data silos in pharma R&D