Let $V = \\{0,1\\}^n$ be the set of inputs, let $H = \\{0,1\\}^m$ be the set of outputs, and let $X = V \times H$. The RBM defines a joint probability distribution $p(\alpha): X \to [0,1]$ at parameters $\alpha \in \mathcal{M}$, given by
$$
X \ni (v,h) \mapsto p(v,h,\alpha)
= \frac{\exp(a^\perp v + b^\perp h + h^\perp wv)}{\sum_{(v',h') \in X} \exp(a^\perp v' + b^\perp h' + {h'}^\perp wv')} \in [0,1]
$$
where $E(v,h,\alpha) = -a^\perp v - b^\perp h - h^\perp wv$ is the parametrized Boltzmann energy and $Z(\alpha) = \sum_{(v',h') \in X} \exp(a^\perp v' + b^\perp h' + {h'}^\perp wv')$ is the partition function which normalizes the probabilities, with $\perp$ denoting the matrix transpose. From the joint probability distribution $p(\alpha)$, we may construct the marginal distributions as the restrictions $p_V(\alpha):V \to [0,1]$ and $p_H(\alpha): H \to [0,1]$ at $\alpha \in \mathcal{M}$, given by the partial sums
$$p_V(v,\alpha) = \sum_{h \in H} p(v,h,\alpha)\~\~,\~\~p_H(h,\alpha) = \sum_{v \in V} p(v,h,\alpha)$$
over $H$ and $V$ respectively. Due to the restricted nature of the RBM, the activation probabilities $p(h_i=1|v,\alpha)$ and $p(v_j=1|h,\alpha)$ of each layer are mutually independent, conditioned on the opposite layer, for all $i \in [1,m]$ and $j \in [1,n]$, such that the conditional probabilities are the products
$$p(h|v,\alpha) = \prod_{i=1}^m p(h_i=1|v,\alpha)\~\~,\~\~p(v|h,\alpha) = \prod_{j=1}^n p(v_j=1|h,\alpha)$$
of activation probabilities. The traditional method for training an RBM involves [Hinton](https://en.wikipedia.org/wiki/Geoffrey_Hinton)'s Contrastive Divergence technique, which will not be covered here.
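For concreteness, the factorized conditionals can be evaluated and sampled directly. The sketch below (module and routine names are illustrative, not taken from this repository) uses the logistic form $p(h_i=1|v,\alpha) = 1/(1 + \exp(-(b_i + \sum_{j=1}^n w_{ij} v_j)))$ that follows from the product structure above for real-valued parameters:

```fortran
module rbm_conditionals
    implicit none
    integer, parameter :: dp = kind(1.0d0)
contains
    !> Activation probabilities p(h_i = 1 | v) = 1/(1 + exp(-(b_i + sum_j w_ij v_j))),
    !> the logistic form implied by the factorized conditional for real parameters.
    pure function hidden_activation(v, b, w) result(prob)
        real(dp), intent(in) :: v(:)       ! visible configuration of 0s and 1s, length n
        real(dp), intent(in) :: b(:)       ! hidden biases, length m
        real(dp), intent(in) :: w(:,:)     ! weights, shape (m, n)
        real(dp) :: prob(size(b))
        prob = 1.0_dp / (1.0_dp + exp(-(b + matmul(w, v))))
    end function hidden_activation

    !> Draw a hidden configuration h ~ p(h|v) by sampling each unit independently.
    subroutine sample_hidden(v, b, w, h)
        real(dp), intent(in)  :: v(:), b(:), w(:,:)
        real(dp), intent(out) :: h(size(b))
        real(dp) :: u(size(b))
        call random_number(u)
        h = merge(1.0_dp, 0.0_dp, u < hidden_activation(v, b, w))
    end subroutine sample_hidden
end module rbm_conditionals
```

The visible layer is sampled analogously, with the field $a_j + \sum_{i=1}^m h_i w_{ij}$ in place of the hidden field.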
Since the RBM works with Boolean vectors, it is a natural choice for representing wave-functions of systems of spin $\frac{1}{2}$ fermions, where each input vector represents a configuration of $n$ spins. Ultimately, we seek to solve the time-independent Schrödinger equation $H\ket{\psi_0} = E_0\ket{\psi_0}$ for the ground state $\ket{\psi_0}$ and its corresponding energy $E_0$ for a given system having Hamiltonian $H$. We take a variational approach by proposing a trial state $\ket{\psi(\alpha)}$ in our $2^n$-dimensional state space $\mathcal{H}$, parametrized by $\alpha \in \mathcal{M}$, and vary $\alpha$ until $\ket{\psi(\alpha)} \approx \ket{\psi_0}$. Letting $S = \\{0,1\\}^n$ be the set of inputs of the RBM, we may choose an orthonormal basis $\\{\ket{s}\\} \subset \mathcal{H}$ labeled by the configurations $s \in S$ such that the trial state is a linear combination $\ket{\psi(\alpha)} = \sum_{s \in S} \psi(s,\alpha) \ket{s} \in \mathcal{H}$, where the components $\psi(s,\alpha) \in \mathbb{C}$ are wave-functions of the parameters.

The trial state wave-functions $\psi$ may be constructed as the marginal distribution on the inputs of the RBM with complex parameters $\alpha \in \\{a,b,w\\} = \mathcal{M}$ where $a \in \mathbb{C}^n$ are the visible layer biases, $b \in \mathbb{C}^m$ are the hidden layer biases, and $w \in \mathbb{C}^{m \times n}$ are the weights which fully connect the layers. With inputs $S = \\{0,1\\}^n$ and outputs $H = \\{0,1\\}^m$, the RBM with complex parameters is a universal approximator of complex probability distributions $\Psi(\alpha):S \times H \to \mathbb{C}$ at $\alpha \in \mathcal{M}$ such that the trial state wave-functions $\psi(\alpha):S \to \mathbb{C}$ at $\alpha \in \mathcal{M}$ are given by the marginal distribution defined by
$$S \ni s \mapsto \psi(s,\alpha) = \sum_{h \in H} \Psi(s,h,\alpha) = \sum_{h \in H} \exp(a^\dagger s + b^\dagger h + h^\dagger ws) = \exp(a^\dagger s) \sum_{h \in H} \exp(b^\dagger h + h^\dagger ws) = \exp\bigg(\sum_{j=1}^n a_j^\* s_j\bigg) \sum_{h \in H} \exp\bigg(\sum_{i=1}^m b_i^\*h_i + \sum_{i=1}^m h_i \sum_{j=1}^n w_{ij} s_j\bigg) = \exp\bigg(\sum_{j=1}^n a_j^\* s_j\bigg) \sum_{h \in H} \prod_{i=1}^m \exp\bigg(b_i^\*h_i + h_i \sum_{j=1}^n w_{ij} s_j\bigg) = \exp\bigg(\sum_{j=1}^n a_j^\* s_j\bigg) \prod_{i=1}^m \sum_{h_i=0}^1 \exp\bigg(b_i^\*h_i + h_i \sum_{j=1}^n w_{ij} s_j\bigg) = \exp\bigg(\sum_{j=1}^n a_j^\* s_j\bigg) \prod_{i=1}^m \bigg[ 1 + \exp\bigg(b_i^\* + \sum_{j=1}^n w_{ij} s_j\bigg)\bigg] \in \mathbb{C}$$
where we ignore the normalization factor of the wave-function, and where $\dagger$ represents the matrix conjugate transpose. By the Born rule, the real, normalized probability distribution $p(\alpha):S \to [0,1]$ associated to the wave-function $\psi$ is defined by $S \ni s \mapsto p(s,\alpha) = |\psi(s,\alpha)|^2/\sum_{s' \in S} |\psi(s',\alpha)|^2 \in [0,1]$.
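Since every later quantity (acceptance ratios, local energies, logarithmic derivatives) involves $\psi$ only through ratios or through $\ln\psi$, it is convenient to evaluate the unnormalized logarithm of the closed form above directly. The following sketch does exactly that; the module and routine names are hypothetical and need not match the repository's code:

```fortran
module rbm_wavefunction
    implicit none
    integer, parameter :: dp = kind(1.0d0)
contains
    !> Unnormalized log wave-function ln psi(s, alpha) for the closed form
    !> psi(s) = exp(sum_j conjg(a_j) s_j) * prod_i [1 + exp(conjg(b_i) + sum_j w_ij s_j)].
    !> The branch of the complex log is irrelevant for ratios exp(ln psi' - ln psi).
    pure function log_psi(s, a, b, w) result(lnpsi)
        real(dp),    intent(in) :: s(:)      ! spin configuration as 0/1 entries, length n
        complex(dp), intent(in) :: a(:)      ! visible biases, length n
        complex(dp), intent(in) :: b(:)      ! hidden biases, length m
        complex(dp), intent(in) :: w(:,:)    ! weights, shape (m, n)
        complex(dp) :: lnpsi
        complex(dp) :: theta(size(b))
        theta = conjg(b) + matmul(w, s)                  ! effective hidden fields
        ! dot_product conjugates its first argument, giving sum_j conjg(a_j) s_j
        lnpsi = dot_product(a, s) + sum(log(1.0_dp + exp(theta)))
    end function log_psi
end module rbm_wavefunction
```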

For the RBM's cost function, we use the statistical expectation $E[\psi(\alpha)] = \langle H \rangle_{\psi(\alpha)}$ of the Hamiltonian $H$ in the variational trial state $\ket{\psi(\alpha)}$, given by
$$E[\psi(\alpha)] = \frac{\langle \psi(\alpha), H\psi(\alpha) \rangle}{\langle \psi(\alpha), \psi(\alpha) \rangle} =
\frac{\sum_{s,s' \in S} \psi^\*(s,\alpha) H_{ss'} \psi(s',\alpha)}{\sum_{s' \in S} |\psi(s',\alpha)|^2} =
\frac{\sum_{s \in S} |\psi(s,\alpha)|^2 \left(\sum_{s' \in S} H_{ss'} \frac{\psi(s',\alpha)}{\psi(s,\alpha)}\right)}{\sum_{s' \in S} |\psi(s',\alpha)|^2} = \sum_{s \in S} p(s,\alpha) E_{\text{loc}}(s,\alpha)$$
where we define the variational local energies $E_{\text{loc}}(s,\alpha) = \sum_{s' \in S} H_{ss'} \frac{\psi(s',\alpha)}{\psi(s,\alpha)}$, with $H_{ss'}$ being the matrix element of $H$ between the states $\ket{s}$ and $\ket{s'}$. Thus $E[\psi(\alpha)] = \sum_{s \in S} p(s,\alpha) E_{\text{loc}}(s,\alpha)$ is the statistical expectation of the local energies weighted by the probability distribution $p(\alpha):S \to [0,1]$.
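On small systems, the local energies can be checked by brute force, since $E_{\text{loc}}(s,\alpha)$ involves only wave-function ratios $\psi(s',\alpha)/\psi(s,\alpha)$. The sketch below (hypothetical names; it reuses `log_psi` from the sketch above and assumes a dense Hamiltonian indexed by basis configurations) sums over all $2^n$ configurations, so it is exponential in $n$ and intended only as a sanity check; a realistic implementation visits only the configurations $s'$ connected to $s$ by $H$:

```fortran
module local_energy_check
    use rbm_wavefunction, only: dp, log_psi   ! log_psi from the sketch above
    implicit none
contains
    !> Bits of the basis index k (0-based) as a 0/1 vector of length n.
    !> The convention bit (j-1) of k <-> spin j is an assumption of this sketch.
    pure function config(k, n) result(s)
        integer, intent(in) :: k, n
        real(dp) :: s(n)
        integer :: j
        do j = 1, n
            s(j) = real(ibits(k, j - 1, 1), dp)
        end do
    end function config

    !> Brute-force local energy E_loc(s) = sum_{s'} H(s,s') * psi(s')/psi(s),
    !> with ratios computed as exp(ln psi(s') - ln psi(s)).
    function local_energy(k, ham, a, b, w, n) result(eloc)
        integer,     intent(in) :: k, n          ! basis index of s and number of spins
        complex(dp), intent(in) :: ham(0:, 0:)   ! dense Hamiltonian, 2^n by 2^n
        complex(dp), intent(in) :: a(:), b(:), w(:,:)
        complex(dp) :: eloc, lnpsi_s
        integer :: kp
        lnpsi_s = log_psi(config(k, n), a, b, w)
        eloc = (0.0_dp, 0.0_dp)
        do kp = 0, 2**n - 1
            if (ham(k, kp) /= (0.0_dp, 0.0_dp)) then
                eloc = eloc + ham(k, kp) * exp(log_psi(config(kp, n), a, b, w) - lnpsi_s)
            end if
        end do
    end function local_energy
end module local_energy_check
```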

## Transverse Field Ising Model

In practice, we allow for a thermalization or "burn-in" period, during which the sampling process moves the initial random sample into the stationary distribution before we begin recording samples. As we can see, the acceptance probabilities in the Metropolis-Hastings algorithm and the form of the local energy involve only ratios of the wave-functions $\psi(s,\alpha)$ for different configurations, so we are justified in ignoring the normalization factor in our derivation of $\psi$. Once all samples are drawn, we may estimate the cost function as an average of the local energies over the drawn samples.
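A single-spin-flip Metropolis-Hastings sampler with a burn-in period might look like the following sketch; it is not the repository's `metropolis_hastings` subroutine, and the names are illustrative. The acceptance probability $\min(1, |\psi(s',\alpha)/\psi(s,\alpha)|^2)$ uses only wave-function ratios, consistent with leaving $\psi$ unnormalized:

```fortran
module sampling_sketch
    use rbm_wavefunction, only: dp, log_psi   ! log_psi from the earlier sketch
    implicit none
contains
    !> Single-spin-flip Metropolis-Hastings sampling with a burn-in period.
    subroutine sample_configurations(a, b, w, n_burn, n_samples, samples)
        complex(dp), intent(in)  :: a(:), b(:), w(:,:)
        integer,     intent(in)  :: n_burn, n_samples
        real(dp),    intent(out) :: samples(size(a), n_samples)
        real(dp) :: s(size(a)), u, ratio
        complex(dp) :: lnpsi_old, lnpsi_new
        integer :: step, flip, n
        n = size(a)
        call random_number(s)
        s = merge(1.0_dp, 0.0_dp, s < 0.5_dp)     ! random initial configuration
        lnpsi_old = log_psi(s, a, b, w)
        do step = 1, n_burn + n_samples
            call random_number(u)
            flip = 1 + int(u * n)                 ! propose flipping one random spin
            s(flip) = 1.0_dp - s(flip)
            lnpsi_new = log_psi(s, a, b, w)
            ratio = exp(2.0_dp * real(lnpsi_new - lnpsi_old, dp))   ! |psi'/psi|^2
            call random_number(u)
            if (u < ratio) then
                lnpsi_old = lnpsi_new             ! accept the proposed configuration
            else
                s(flip) = 1.0_dp - s(flip)        ! reject: undo the flip
            end if
            if (step > n_burn) samples(:, step - n_burn) = s   ! record after burn-in
        end do
    end subroutine sample_configurations
end module sampling_sketch
```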

The stochastic optimization algorithm involves modifying the parameters in the direction of the negative gradient of the energy functional in each learning iteration, a form of stochastic gradient descent in the direction of the generalized forces $F_\alpha = -\textrm{grad}\_\alpha E[\psi(\alpha)]$ with components
$$F_{\alpha_l} = - \frac{\partial}{\partial \alpha_l} E[\psi(\alpha)] \approx - \frac{1}{N} \sum_{s \in \tilde{S}} \frac{\partial}{\partial \alpha_l} E_{\text{loc}}(s, \alpha) = - \frac{1}{N} \sum_{s \in \tilde{S}} \sum_{s' \in S} H_{ss'} \frac{\partial}{\partial \alpha_l} \frac{\psi(s', \alpha)}{\psi(s, \alpha)} = - \frac{1}{N} \sum_{s \in \tilde{S}} \sum_{s' \in S} H_{ss'} \bigg[ \frac{1}{\psi(s, \alpha)} \frac{\partial}{\partial \alpha_l} \psi(s', \alpha) - \frac{\psi(s', \alpha)}{\psi(s, \alpha)^2} \frac{\partial}{\partial \alpha_l} \psi(s, \alpha) \bigg] = - \frac{1}{N} \sum_{s \in \tilde{S}} \sum_{s' \in S} H_{ss'} \frac{\psi(s', \alpha)}{\psi(s, \alpha)} \bigg[ \frac{1}{\psi(s', \alpha)} \frac{\partial}{\partial \alpha_l} \psi(s', \alpha) - \frac{1}{\psi(s, \alpha)} \frac{\partial}{\partial \alpha_l} \psi(s, \alpha) \bigg] = - \frac{1}{N} \sum_{s \in \tilde{S}} \sum_{s' \in S} H_{ss'} \frac{\psi(s', \alpha)}{\psi(s, \alpha)} \bigg[ \frac{\partial}{\partial \alpha_l} \ln \psi(s', \alpha) - \frac{\partial}{\partial \alpha_l} \ln \psi(s, \alpha) \bigg] = - \frac{1}{N} \sum_{s \in \tilde{S}} \sum_{s' \in S} H_{ss'} \frac{\psi(s', \alpha)}{\psi(s, \alpha)} \bigg[ O_l(s',\alpha) - O_l(s,\alpha) \bigg] = \frac{1}{N} \sum_{s \in \tilde{S}} \bigg[ O_l(s,\alpha) E_{\text{loc}}(s, \alpha) - \sum_{s' \in S} H_{ss'} \frac{\psi(s', \alpha)}{\psi(s, \alpha)} O_l(s',\alpha) \bigg]$$
where we define the logarithmic derivatives
$$O_l(s,\alpha) = \frac{\partial}{\partial \alpha_l} \ln \psi(s, \alpha) = \frac{1}{\psi(s, \alpha)} \frac{\partial}{\partial \alpha_l} \psi(s, \alpha)$$
of the variational wave-functions $\psi(s, \alpha)$ in terms of diagonal operators $O_l$ defined by $O_l \psi(s, \alpha) = O_l(s,\alpha)$. In the final equality of $F_{\alpha_l}$, the second term is a modified local energy in which each term of the summation is weighted by the logarithmic derivative $O_l(s',\alpha)$ for each $s' \in S$. By making a further approximation to this second term, we update the parameters according to the rule
$$\alpha \leftarrow \alpha + \eta F_\alpha$$
for some learning rate $\eta > 0$.
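For the closed form of $\ln\psi$ above, the logarithmic derivatives can be written down analytically. In the sketch below (hypothetical names), the derivatives are taken with respect to the parameters exactly as they appear in $\ln\psi$, that is with respect to $a^\*$, $b^\*$, and $w$; the repository's convention for the complex parameters may differ:

```fortran
module log_derivatives
    implicit none
    integer, parameter :: dp = kind(1.0d0)
contains
    !> Logarithmic derivatives O_l(s) = d ln psi / d alpha_l for
    !> ln psi = sum_j conjg(a_j) s_j + sum_i ln(1 + exp(theta_i)),
    !> theta_i = conjg(b_i) + sum_j w_ij s_j, differentiated with respect to the
    !> parameters as they appear in ln psi (conjg(a), conjg(b), and w).
    !> Packed as one vector of length n + m + m*n (column-major order for w).
    pure function log_derivs(s, a, b, w) result(o)
        real(dp),    intent(in) :: s(:)
        complex(dp), intent(in) :: a(:), b(:), w(:,:)
        complex(dp) :: o(size(a) + size(b) + size(w))
        complex(dp) :: sig(size(b))
        integer :: n, m, i, j
        n = size(a); m = size(b)
        sig = 1.0_dp / (1.0_dp + exp(-(conjg(b) + matmul(w, s))))   ! logistic of hidden fields
        o(1:n) = cmplx(s, kind=dp)                      ! d ln psi / d conjg(a_j) = s_j
        o(n+1:n+m) = sig                                ! d ln psi / d conjg(b_i) = sigma(theta_i)
        do j = 1, n
            do i = 1, m
                o(n + m + (j-1)*m + i) = sig(i) * s(j)  ! d ln psi / d w_ij = sigma(theta_i) s_j
            end do
        end do
    end function log_derivs
end module log_derivatives
```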

## Stochastic Reconfiguration

To overcome typical problems in the vanilla stochastic optimization, we seek to improve performance of the algorithm by pre-conditioning the gradient $F_\alpha$ with a Hermitian positive-definite matrix $S^{-1}(\alpha)$ prior to updating the parameters $\alpha \in \mathcal{M}$, such that the update rule becomes
$$\alpha \leftarrow \alpha + \eta S^{-1}(\alpha) F_\alpha$$
where the matrix $S(\alpha)$ is known as the stochastic reconfiguration matrix. Choosing $S(\alpha)$ as the identity recovers the ordinary stochastic optimization. A sophisticated choice for $S(\alpha)$ is the Quantum Geometric Tensor whose components are defined as the expectation covariances
$$S_{kl}(\alpha) = \langle O_k^\dagger O_l \rangle_{\psi(\alpha)} - \langle O_k^\dagger \rangle_{\psi(\alpha)} \langle O_l \rangle_{\psi(\alpha)} = \frac{\langle O_k \psi(\alpha), O_l \psi(\alpha) \rangle}{\langle \psi(\alpha), \psi(\alpha) \rangle} - \frac{\langle O_k \psi(\alpha), \psi(\alpha) \rangle}{\langle \psi(\alpha), \psi(\alpha) \rangle} \frac{\langle \psi(\alpha), O_l \psi(\alpha) \rangle}{\langle \psi(\alpha), \psi(\alpha) \rangle} = \frac{\sum_{s \in S} O_k^\*(s,\alpha) O_l(s,\alpha)}{\sum_{s' \in S} |\psi(s',\alpha)|^2} - \bigg[ \frac{\sum_{s \in S} O_k^\*(s,\alpha) \psi(s,\alpha)}{\sum_{s' \in S} |\psi(s',\alpha)|^2} \bigg] \bigg[ \frac{\sum_{s \in S} \psi^\*(s,\alpha) O_l(s,\alpha)}{\sum_{s' \in S} |\psi(s',\alpha)|^2} \bigg] \approx \frac{1}{N} \sum_{s \in \tilde{S}} O_k^\*(s,\alpha) O_l(s,\alpha) - \bigg[ \frac{1}{N} \sum_{s \in \tilde{S}} O_k^\*(s,\alpha) \bigg] \bigg[ \frac{1}{N} \sum_{s \in \tilde{S}} O_l(s,\alpha) \bigg]$$
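Given the sampled logarithmic derivatives and the estimated forces, one stochastic reconfiguration step amounts to forming the sample covariances above and solving a linear system for the preconditioned update direction. The sketch below assumes LAPACK's `zgesv` is available; the routine names and the small diagonal shift added for numerical stability are assumptions, not part of the text:

```fortran
module stochastic_reconfiguration
    implicit none
    integer, parameter :: dp = kind(1.0d0)
contains
    !> Estimate S_kl = <O_k^* O_l> - <O_k^*><O_l> over the sample set and solve
    !> S * delta = F for the preconditioned update direction delta.
    subroutine sr_update_direction(o_samples, force, eps, delta)
        complex(dp), intent(in)  :: o_samples(:,:)   ! O_l(s) per sample, shape (n_params, n_samples)
        complex(dp), intent(in)  :: force(:)         ! generalized forces F, length n_params
        real(dp),    intent(in)  :: eps              ! diagonal regularization (assumed, not in the text)
        complex(dp), intent(out) :: delta(size(force))
        complex(dp) :: s_mat(size(force), size(force)), o_mean(size(force))
        integer :: ipiv(size(force)), info, n_p, n_s, k
        n_p = size(o_samples, 1)
        n_s = size(o_samples, 2)
        o_mean = sum(o_samples, dim=2) / n_s
        ! sample covariance estimate of the Quantum Geometric Tensor
        s_mat = matmul(conjg(o_samples), transpose(o_samples)) / n_s &
              - spread(conjg(o_mean), dim=2, ncopies=n_p) * spread(o_mean, dim=1, ncopies=n_p)
        do k = 1, n_p
            s_mat(k, k) = s_mat(k, k) + eps          ! shift the diagonal for stability
        end do
        delta = force
        call zgesv(n_p, 1, s_mat, n_p, ipiv, delta, n_p, info)   ! LAPACK linear solve
        if (info /= 0) error stop 'zgesv failed'
    end subroutine sr_update_direction
end module stochastic_reconfiguration
```

The update $\alpha \leftarrow \alpha + \eta \delta$ then replaces the plain gradient step.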