The aim of this lecture is to introduce some active learning algorithms for building potential-energy surfaces (PESs) of molecules with reduced computational costs. At the end of the lecture you will:
Many results of this lecture are based on this recently published article.
Code for this tutorial is available at: ‣.
We have seen in the previous lecture that building a potential-energy surface (PES) of a molecule involves two steps:
$$ \underset{\theta}{\text{min }} \sum_{ (x_i, E_i) \in \mathcal{D}} \mathcal{L} (g(x_i;\theta), E_i) $$
where $g(\cdot\,;\theta)$ is the model with parameters $\theta$ and $\mathcal{L}$ is the loss function, which, in the case of building PESs, is often the root-mean-squared error.
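As an illustration of the minimisation above, here is a minimal sketch that fits a model to a small dataset of geometries and energies. Everything in it is an assumption made for illustration: a one-dimensional Morse potential (`morse_energy`) stands in for the electronic-structure calculation, the model $g$ is a degree-6 polynomial, and the parameters are found by linear least squares, which minimises the mean-squared error and therefore also the root-mean-squared error.

```python
import numpy as np

def morse_energy(r, d_e=1.0, a=1.0, r_e=1.0):
    """Toy stand-in for an electronic-structure calculation (hypothetical)."""
    return d_e * (1.0 - np.exp(-a * (r - r_e))) ** 2

def design_matrix(x, degree):
    """Columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def fit(x, e, degree=6):
    """Least-squares estimate of theta for the polynomial model g(x; theta)."""
    theta, *_ = np.linalg.lstsq(design_matrix(x, degree), e, rcond=None)
    return theta

def g(x, theta):
    """Evaluate the fitted model at the geometries x."""
    return design_matrix(x, len(theta) - 1) @ theta

def rmse(pred, target):
    """Root-mean-squared error between predictions and reference energies."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Dataset D: 20 bond lengths on a uniform grid and their (toy) energies.
x_train = np.linspace(0.5, 3.0, 20)
e_train = morse_energy(x_train)

theta = fit(x_train, e_train)
train_rmse = rmse(g(x_train, theta), e_train)
print(f"training RMSE: {train_rmse:.2e}")
```

In a real application the call to `morse_energy` would be replaced by the expensive electronic-structure computation, which is exactly the part one wants to call as rarely as possible.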
The following diagram summarises this process:
However, the first task in this process is not always trivial. Selecting the molecular geometries requires physical insight into the problem. More importantly, the electronic energies can be expensive to compute if one requires high accuracy. The following table shows how steeply the cost of these computations scales with the size of the molecular system:
where accuracy here is the accuracy of the (ro-)vibrational calculations.
Thus, when building PESs, one wants to reduce the amount of training data needed to achieve a given accuracy. This is the essential problem in active learning (AL). AL can be understood as a double optimisation problem: over the set of all possible parameters, and over the set of all possible datasets, i.e.,
$$ \underset{\theta, \mathcal{D}}{\text{min }} \sum_{ (x_i, E_i) \in \mathcal{D}} \mathcal{L} (g(x_i;\theta), E_i) $$
There are several kinds of active-learning algorithms, but we'll only be looking at pool-based active learning.
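To make the idea of a pool concrete, here is a rough sketch of one possible pool-based loop. It is not necessarily the algorithm used in this lecture: the selection rule (disagreement within a small committee of polynomial fits of different degree), the toy energy function `expensive_energy`, and all parameter values are assumptions chosen for illustration. At each iteration the energy is computed only for the pool geometry on which the committee disagrees most.

```python
import numpy as np

def expensive_energy(r):
    """Hypothetical stand-in for a costly electronic-structure calculation."""
    return (1.0 - np.exp(-(r - 1.0))) ** 2

def fit_committee(x, e, degrees=(3, 4, 5)):
    """Fit one polynomial per degree; their spread is used as an uncertainty proxy."""
    return [np.polynomial.Polynomial.fit(x, e, deg) for deg in degrees]

def pool_based_al(pool, n_init=6, n_queries=10, seed=0):
    """Grow the labelled set one geometry at a time, picked from a fixed pool."""
    rng = np.random.default_rng(seed)
    # Start from a few randomly chosen pool geometries.
    labelled = [int(i) for i in rng.choice(len(pool), size=n_init, replace=False)]
    energies = {i: expensive_energy(pool[i]) for i in labelled}

    for _ in range(n_queries):
        x = pool[labelled]
        e = np.array([energies[i] for i in labelled])
        committee = fit_committee(x, e)

        # Predict over the whole pool (cheap) and measure the disagreement.
        preds = np.stack([model(pool) for model in committee])
        disagreement = preds.std(axis=0)
        disagreement[labelled] = -np.inf  # never query a geometry twice

        # Label only the most uncertain geometry (the expensive step).
        query = int(np.argmax(disagreement))
        energies[query] = expensive_energy(pool[query])
        labelled.append(query)

    return labelled, energies

# Pool of candidate geometries: 200 bond lengths, none labelled up front.
pool = np.linspace(0.5, 3.0, 200)
labelled, energies = pool_based_al(pool)
print(f"labelled {len(labelled)} of {len(pool)} pool geometries")
```

The point of the design is that model predictions over the pool are cheap while each energy evaluation is expensive, so the loop spends its labelling budget only where the current models are least certain.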