Scientific machine learning for data-driven discovery

Scientific discovery lacks, by definition, a ground truth. We don't know whether the problem is solvable or how well we can do. Benchmark datasets are rare. Data is missing, and acutely not at random, because of sensor failures and collection bias. A substantial body of prior knowledge must be taken into account, such as conservation laws, dynamical equations, and integrity constraints. Prediction is seldom enough: causal understanding is the ultimate goal, and uncertainty evaluation and interpretability are requisites. Data acquisition is mediated not by analytics of web behaviour but by expensive, often unique, experiments, and data modalities are often mixed, sometimes exotic.

We see substantial potential for new developments at the interface between ML and topical research. Besides the abundant algorithmic challenges in scaling, robustness, interpretability, and expression of inductive biases, there are opportunities at the edges of the ML pipeline, that is, in the steps that are most actionable for domain scientists: problem definition, data collection, feature development, quality evaluation, and formulation of new hypotheses and interventions to address causality.

A field is emerging, with its own specific challenges, methods, and standards. Some are calling this cross-disciplinary endeavour scientific machine learning.

A colaboratory seeds a community of practice

At the ml ⇌ science colab we seek to develop a community of practice to tackle scientific problems with machine learning methods. For that, we

We are part of a larger worldwide effort to make machine learning a useful tool for a broader audience, and a more reliable one at that. Steps in that process include developing engineering practices for machine learning [1] and largely automating some of the craft that goes into it [2].

[1] Michael Jordan writes on ML as a human-centric engineering discipline in "Artificial Intelligence - The Revolution Hasn't Happened Yet".

[2] Rich Caruana identifies Research opportunities in AutoML.