In this lecture, we will use a neural network as a basis to solve the stationary Schrödinger equation for a hydrogen-like atom (or, more generally, a one-electron system). We learned the fundamentals of building and optimizing neural networks with the Flax and JAX Python libraries in section Fitting of potential energy surface. Here, we'll use this knowledge to model and optimize the one-electron wave function and to calculate ground- and excited-state energies of the hydrogen atom.

The Hamiltonian $\hat{H}$ of a one-electron system consists of the kinetic energy of the electron $\hat{T}$ and the Coulomb interaction potential $V$ of the electron with the nucleus or ionic core. Using Cartesian coordinates, the kinetic energy operator (in atomic units) is

$$
\hat{T} = -\frac{1}{2}\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right).
$$
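Rather than deriving analytic second derivatives of the basis functions, we can evaluate the Laplacian in $\hat{T}$ with JAX automatic differentiation: the Laplacian of a scalar function is the trace of its Hessian. A minimal sketch (the function name `kinetic` and the Gaussian test function are illustrative choices, not part of the lecture code):

```python
import jax
import jax.numpy as jnp

def kinetic(psi):
    """Return a function computing (-1/2) * Laplacian of psi at a point r = (x, y, z)."""
    def t_psi(r):
        # Hessian of psi w.r.t. the three Cartesian coordinates; its trace is the Laplacian.
        hess = jax.hessian(psi)(r)
        return -0.5 * jnp.trace(hess)
    return t_psi

# Sanity check on a Gaussian psi(r) = exp(-|r|^2), whose Laplacian is (4|r|^2 - 6) * psi(r).
psi = lambda r: jnp.exp(-jnp.dot(r, r))
r = jnp.array([0.3, -0.1, 0.5])
t_val = kinetic(psi)(r)
expected = -0.5 * (4.0 * jnp.dot(r, r) - 6.0) * psi(r)
```

This pattern works for any differentiable trial function, including a neural network.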

For the hydrogen atom, the electron-nucleus attraction potential is

$$ V(x,y,z) = -\frac{1}{\sqrt{x^2+y^2+z^2}}, $$

where $x,y,z$ are the electron's Cartesian coordinates.
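In code, the attraction potential is a one-liner; a sketch using `jax.numpy` (the function name `potential` is an illustrative choice):

```python
import jax.numpy as jnp

def potential(r):
    # Coulomb attraction between the electron and the nucleus in atomic units (Z = 1);
    # the minus sign makes the interaction attractive. r = (x, y, z).
    return -1.0 / jnp.linalg.norm(r)

v = potential(jnp.array([3.0, 4.0, 0.0]))  # |r| = 5, so v = -0.2
```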

We may seek the solution of the Schrödinger equation $\hat{H}\Psi_k = E_k\Psi_k$ by expanding the wave function $\Psi_k$ in a linear combination of basis functions $\phi_i(x,y,z;\theta)$, which depend on the electron's coordinates $x,y,z$ (collectively denoted $q$ below) and on a set of parameters $\theta$,

$$ \Psi_k(q;\theta)=\sum_i^N c_i^{(k)}\phi_i(q;\theta). $$

From the variational principle it follows that the energies $E=(E_1,E_2,\ldots,E_N)$ and the linear expansion coefficients $C = [(c_1^{(1)},c_2^{(1)},\ldots,c_N^{(1)}),(c_1^{(2)},c_2^{(2)},\ldots,c_N^{(2)}),\ldots,(c_1^{(N)},c_2^{(N)},\ldots,c_N^{(N)})]$ can be found by solving the generalized eigenvalue problem $HC = SCE$, where $E$ is understood as the diagonal matrix of the eigenvalues, i.e.,

$$ E = C^{-1}S^{-1}HC , $$
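For symmetric $H$ and positive-definite $S$, this generalized eigenproblem can be solved directly, for example with SciPy's `eigh` (using SciPy here is an assumption for illustration; the matrices below are arbitrary toy values):

```python
import numpy as np
from scipy.linalg import eigh

# Toy 2x2 Hamiltonian and overlap matrices (H symmetric, S symmetric positive definite).
H = np.array([[1.0, 0.2],
              [0.2, 2.0]])
S = np.array([[1.0, 0.1],
              [0.1, 1.0]])

# Solve the generalized eigenvalue problem H C = S C E for energies E and coefficients C.
E, C = eigh(H, S)
```

The columns of `C` are the expansion coefficients of the states $\Psi_k$, and `E` holds the energies in ascending order.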

where

$$ H_{ij} = \int dx\int dy\int dz\, \phi_i^*(x,y,z)\,\hat{H}\,\phi_j(x,y,z) = \langle \phi_i|\hat{H}|\phi_j\rangle, \\ S_{ij} = \int dx\int dy\int dz\, \phi_i^*(x,y,z)\,\phi_j(x,y,z) = \langle \phi_i|\phi_j\rangle, $$
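In practice these integrals are evaluated numerically, e.g. by quadrature over a set of grid points and weights. A hypothetical sketch (the function `matrix_elements`, the toy Gaussian basis, and the crude three-point "grid" are all illustrative assumptions; a real calculation needs a proper quadrature grid):

```python
import jax
import jax.numpy as jnp

def matrix_elements(basis, points, weights):
    """Sketch: H_ij and S_ij by numerical quadrature.

    basis:   function r -> array of N basis-function values phi(r)
    points:  (M, 3) array of quadrature nodes
    weights: (M,) array of quadrature weights
    """
    def h_phi(r):
        # Apply the Hamiltonian to each basis function:
        # -1/2 Laplacian (trace of the per-function Hessian) plus the Coulomb potential.
        lap = jnp.trace(jax.hessian(basis)(r), axis1=1, axis2=2)
        return -0.5 * lap - basis(r) / jnp.linalg.norm(r)

    phi = jax.vmap(basis)(points)    # (M, N) basis values at the nodes
    hphi = jax.vmap(h_phi)(points)   # (M, N) Hamiltonian applied to the basis
    S = jnp.einsum('m,mi,mj->ij', weights, phi, phi)
    H = jnp.einsum('m,mi,mj->ij', weights, phi, hphi)
    return H, S

# Crude illustration with two Gaussian basis functions on three arbitrary points.
basis = lambda r: jnp.array([jnp.exp(-jnp.dot(r, r)), jnp.exp(-0.5 * jnp.dot(r, r))])
points = jnp.array([[0.5, 0.0, 0.0], [0.0, 1.0, 0.5], [1.0, 1.0, 1.0]])
weights = jnp.ones(3)
H, S = matrix_elements(basis, points, weights)
```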

are the Hamiltonian and overlap matrix elements, respectively. Optimal values of the basis-set parameters $\theta$ can also be found using the variational principle, i.e., by minimizing the sum of the state energies,

$$ \mathcal{L}_{\theta} = \sum_i E_i = \text{Tr}\left( C^{-1}S^{-1}HC \right) = \text{Tr}\left( S^{-1}H \right) \rightarrow \text{min}. $$
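The trace form of the loss is convenient because it avoids diagonalization entirely; a sketch (computing $S^{-1}H$ via a linear solve rather than an explicit inverse, which is numerically preferable):

```python
import jax.numpy as jnp

def loss(H, S):
    # Sum of all N state energies: Tr(S^{-1} H), via a linear solve instead of inverting S.
    return jnp.trace(jnp.linalg.solve(S, H))

# With S = I the loss reduces to Tr(H) = 1 + 2 = 3.
H = jnp.array([[1.0, 0.0], [0.0, 2.0]])
S = jnp.eye(2)
```

Because the trace is invariant under the similarity transform by $C$, the eigenvectors are not needed during optimization of $\theta$.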

Thus, the problem of finding $\theta$ and solving the Schrödinger equation can be cast as an optimization problem using, for example, a gradient-descent algorithm. In the steepest-descent approach, at each iteration we first compute the Hamiltonian matrix elements $H_{ij}$, the overlap matrix elements $S_{ij}$, and the loss function $\mathcal{L}_\theta$, and then update the parameters $\theta$ based on the scaled gradient of the loss with respect to the parameters, i.e.,

$$ \theta_{i+1} = \theta_i - \alpha\frac{\partial \mathcal{L}_\theta}{\partial\theta}, \\
\frac{\partial \mathcal{L}_\theta}{\partial\theta} = \text{Tr}\left(-S^{-1}HS^{-1}\frac{\partial S}{\partial\theta} + S^{-1}\frac{\partial H}{\partial\theta} \right), $$

where $\alpha$ is a constant scaling parameter (or learning rate).
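With JAX we do not need the analytic gradient formula above: `jax.grad` differentiates the loss through the linear algebra automatically. A toy sketch of the update loop, where a single scalar `theta` stands in for the network parameters and the dependence of $H$ and $S$ on `theta` is an invented illustration:

```python
import jax
import jax.numpy as jnp

def loss_fn(theta):
    # Toy stand-in: H and S depend on a single width parameter theta.
    # In the real problem they come from quadrature over the basis functions.
    H = jnp.array([[theta**2 + 1.0, 0.1],
                   [0.1, theta**2 + 2.0]])
    S = jnp.eye(2) * (1.0 + 0.01 * theta**2)
    return jnp.trace(jnp.linalg.solve(S, H))  # sum of state energies

theta = 1.0
alpha = 0.1                      # learning rate
grad_fn = jax.grad(loss_fn)      # d(loss)/d(theta) by automatic differentiation
for _ in range(100):
    theta = theta - alpha * grad_fn(theta)
```

For this toy loss the minimum lies at `theta = 0`, and the iteration converges there.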

In the previous lecture Vibrational solutions for molecules, we used products of Hermite and Legendre functions as the basis set, where we also fixed the values of some basis-set parameters to physically meaningful values (see, e.g., scaling of Hermite functions). Here, instead of a product of one-dimensional functions, we will employ a simple feed-forward neural network with several hidden layers that takes as input the electron's coordinates and outputs $N$ basis functions, i.e., $\{x_i,y_i,z_i\}_i\rightarrow NN(\theta) \rightarrow\{\phi_k\}_{k=1}^N$.