REFS – Causal Machine Learning and Simulation Platform for Drug Discovery

My Role

As a Senior Product Manager, I led development of a causal AI platform through its entire life cycle (from early-stage research through commercialization) at a Series C through Series D startup.

Challenge

One significant challenge was identifying subpopulations of patients likely to respond to treatment and uncovering the causal mechanisms behind their responses. Users could run simulations on the causal model to identify prognostic or predictive biomarkers. However, this method did not scale to very large models containing over 20,000 biomarkers. In such cases, users needed to conduct hundreds or even thousands of simulations, which could take four to six months.

Solution

To enhance efficiency, I collaborated with data scientists and mathematicians to automate the simulation process. When we implemented and tested the new algorithm, we found that it performed well for small and medium-sized models but frequently failed for very large models. I worked with our QA team on a detailed performance report. Analysis of this report revealed that 70% of the runtime and 90% of the memory usage were consumed by a single function, which identifies candidate biomarkers and filters simulation results. I then asked the engineers to compare these results with a prototype from the R&D team. The prototype was significantly faster, and on reviewing the associated whitepaper prepared by the scientists, I realized that they had used an existing function, already optimized for large models, to search for candidate biomarkers. After I shared these findings with the engineers and data scientists, the team agreed to implement the proposed changes.
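The kind of analysis behind that performance report can be illustrated with a minimal profiling sketch. It is not the team's actual tooling or code: `filter_candidates`, `run_simulation`, and `find_hotspot` are hypothetical names, and the deliberately slow quadratic filter merely stands in for the function that dominated the runtime. Only Python's standard `cProfile` and `pstats` modules are used.

```python
import cProfile
import pstats


def filter_candidates(scores, threshold):
    # Hypothetical stand-in for the slow function: a naive quadratic scan
    # that keeps scores above the threshold and skips earlier duplicates.
    kept = []
    for i in range(len(scores)):
        duplicate = False
        for j in range(i):
            if scores[j] == scores[i]:
                duplicate = True
                break
        if scores[i] >= threshold and not duplicate:
            kept.append(scores[i])
    return kept


def run_simulation(n):
    # Hypothetical pipeline: produce n synthetic scores, then filter them.
    scores = [(i * 7919) % n for i in range(n)]
    return filter_candidates(scores, threshold=n // 2)


def find_hotspot(func, *args):
    # Profile one call and return the name of the function with the largest
    # own time (excluding callees) and its share of the total runtime.
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers)
    key, entry = max(stats.stats.items(), key=lambda item: item[1][2])
    return key[2], entry[2] / stats.total_tt


if __name__ == "__main__":
    name, share = find_hotspot(run_simulation, 2000)
    print(f"{name} accounts for {share:.0%} of the profiled runtime")
```

A report built this way points to one function to optimize or replace, which is the kind of finding that led to the comparison with the R&D prototype.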

Results

  • 400x faster runs for small models
  • 1,700x faster runs for medium models
  • 19,000x faster runs for large and very large models (reduced from 55 hours to 10 seconds)
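
As a quick check on the last figure, the speedup implied by the reported times (55 hours before, 10 seconds after) can be computed directly. The helper name `speedup` is illustrative only.

```python
def speedup(before_seconds, after_seconds):
    # Ratio of the old runtime to the new runtime.
    return before_seconds / after_seconds


before = 55 * 3600  # 55 hours expressed in seconds
after = 10          # 10 seconds
factor = speedup(before, after)
print(f"{factor:,.0f}x")
```

The result is about 19,800x, consistent with the rounded figure reported for large and very large models.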