Let $V = \\{0,1\\}^n$ be the set of inputs, let $H = \\{0,1\\}^m$ be the set of outputs, and let $X = V \times H$. The RBM defines a joint probability distribution $p(\alpha): X \to [0,1]$ at parameters $\alpha \in \mathcal{M}$, given by
$$
X \ni (v,h) \mapsto p(v,h,\alpha)
= \frac{\exp(a^\perp v + b^\perp h + h^\perp wv)}{\sum_{(v',h') \in X} \exp(a^\perp v' + b^\perp h' + {h'}^\perp wv')} \in [0,1]
$$
where $E(v,h,\alpha) = -a^\perp v - b^\perp h - h^\perp wv$ is the parametrized Boltzmann energy and $Z(\alpha) = \sum_{(v',h') \in X} \exp(a^\perp v' + b^\perp h' + {h'}^\perp wv')$ is the partition function which normalizes the probabilities, with $\perp$ denoting the matrix transpose. From the joint probability distribution $p(\alpha)$, we may construct the marginal distributions as the restrictions $p_V(\alpha):V \to [0,1]$ and $p_H(\alpha): H \to [0,1]$ at $\alpha \in \mathcal{M}$, given by the partial sums
$$p_V(v,\alpha) = \sum_{h \in H} p(v,h,\alpha)\~\~,\~\~p_H(h,\alpha) = \sum_{v \in V} p(v,h,\alpha)$$
over $H$ and $V$ respectively. Due to the restricted nature of the RBM, the activation probabilities $p(h_i=1|v,\alpha)$ and $p(v_j=1|h,\alpha)$ of each layer are mutually independent, conditioned on the opposite layer, for all $i \in [1,m]$ and $j \in [1,n]$, such that the conditional probabilities are the products
$$p(h|v,\alpha) = \prod_{i=1}^m p(h_i=1|v,\alpha)\~\~,\~\~p(v|h,\alpha) = \prod_{j=1}^n p(v_j=1|h,\alpha)$$
of activation probabilities. The traditional method for training an RBM involves [Hinton](https://en.wikipedia.org/wiki/Geoffrey_Hinton)'s Contrastive Divergence technique, which will not be covered here.
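For concreteness, the factorized conditionals can be evaluated and sampled directly. The sketch below (module and routine names are illustrative, not taken from this repository) uses the logistic form $p(h_i=1|v,\alpha) = 1/(1 + \exp(-(b_i + \sum_{j=1}^n w_{ij} v_j)))$ that follows from the product structure above for real-valued parameters:

```fortran
module rbm_conditionals
    implicit none
    integer, parameter :: dp = kind(1.0d0)
contains
    !> Activation probabilities p(h_i = 1 | v) = 1/(1 + exp(-(b_i + sum_j w_ij v_j))),
    !> the logistic form implied by the factorized conditional for real parameters.
    pure function hidden_activation(v, b, w) result(prob)
        real(dp), intent(in) :: v(:)       ! visible configuration of 0s and 1s, length n
        real(dp), intent(in) :: b(:)       ! hidden biases, length m
        real(dp), intent(in) :: w(:,:)     ! weights, shape (m, n)
        real(dp) :: prob(size(b))
        prob = 1.0_dp / (1.0_dp + exp(-(b + matmul(w, v))))
    end function hidden_activation

    !> Draw a hidden configuration h ~ p(h|v) by sampling each unit independently.
    subroutine sample_hidden(v, b, w, h)
        real(dp), intent(in)  :: v(:), b(:), w(:,:)
        real(dp), intent(out) :: h(size(b))
        real(dp) :: u(size(b))
        call random_number(u)
        h = merge(1.0_dp, 0.0_dp, u < hidden_activation(v, b, w))
    end subroutine sample_hidden
end module rbm_conditionals
```

The visible layer is sampled analogously, with the field $a_j + \sum_{i=1}^m h_i w_{ij}$ in place of the hidden field.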
Since the RBM works with Boolean vectors, it is a natural choice for representing wave-functions of systems of spin $\frac{1}{2}$ fermions, where each input vector represents a configuration of $n$ spins. Ultimately, we seek to solve the time-independent Schrödinger equation $H\ket{\psi_0} = E_0\ket{\psi_0}$ for the ground state $\ket{\psi_0}$ and its corresponding energy $E_0$ for a given system having Hamiltonian $H$. We take a variational approach by proposing a trial state $\ket{\psi(\alpha)}$ in our $2^n$-dimensional state space $\mathcal{H}$, parametrized by $\alpha \in \mathcal{M}$, and vary $\alpha$ until $\ket{\psi(\alpha)} \approx \ket{\psi_0}$. Letting $S = \\{0,1\\}^n$ be the set of inputs of the RBM, we may choose an orthonormal basis $\\{\ket{s}\\} \subset \mathcal{H}$ labeled by the configurations $s \in S$ such that the trial state is a linear combination $\ket{\psi(\alpha)} = \sum_{s \in S} \psi(s,\alpha) \ket{s} \in \mathcal{H}$, where the components $\psi(s,\alpha) \in \mathbb{C}$ are wave-functions of the parameters.

The trial state wave-functions $\psi$ may be constructed as the marginal distribution on the inputs of the RBM with complex parameters $\alpha \in \\{a,b,w\\} = \mathcal{M}$ where $a \in \mathbb{C}^n$ are the visible layer biases, $b \in \mathbb{C}^m$ are the hidden layer biases, and $w \in \mathbb{C}^{m \times n}$ are the weights which fully connect the layers. With inputs $S = \\{0,1\\}^n$ and outputs $H = \\{0,1\\}^m$, the RBM with complex parameters is a universal approximator of complex probability distributions $\Psi(\alpha):S \times H \to \mathbb{C}$ at $\alpha \in \mathcal{M}$ such that the trial state wave-functions $\psi(\alpha):S \to \mathbb{C}$ at $\alpha \in \mathcal{M}$ are given by the marginal distribution defined by
$$S \ni s \mapsto \psi(s,\alpha) = \sum_{h \in H} \Psi(s,h,\alpha) = \sum_{h \in H} \exp(a^\dagger s + b^\dagger h + h^\dagger ws) = \exp(a^\dagger s) \sum_{h \in H} \exp(b^\dagger h + h^\dagger ws) = \exp\bigg(\sum_{j=1}^n a_j^\* s_j\bigg) \sum_{h \in H} \exp\bigg(\sum_{i=1}^m b_i^\*h_i + \sum_{i=1}^m h_i \sum_{j=1}^n w_{ij} s_j\bigg) = \exp\bigg(\sum_{j=1}^n a_j^\* s_j\bigg) \sum_{h \in H} \prod_{i=1}^m \exp\bigg(b_i^\*h_i + h_i \sum_{j=1}^n w_{ij} s_j\bigg) = \exp\bigg(\sum_{j=1}^n a_j^\* s_j\bigg) \prod_{i=1}^m \sum_{h_i=0}^1 \exp\bigg(b_i^\*h_i + h_i \sum_{j=1}^n w_{ij} s_j\bigg) = \exp\bigg(\sum_{j=1}^n a_j^\* s_j\bigg) \prod_{i=1}^m \bigg[ 1 + \exp\bigg(b_i^\* + \sum_{j=1}^n w_{ij} s_j\bigg)\bigg] \in \mathbb{C}$$
where we ignore the normalization factor of the wave-function, and where $\dagger$ represents the matrix conjugate transpose. By the Born rule, the real, normalized probability distribution $p(\alpha):S \to [0,1]$ associated to the wave-function $\psi$ is defined by $S \ni s \mapsto p(s,\alpha) = |\psi(s,\alpha)|^2/\sum_{s' \in S} |\psi(s',\alpha)|^2 \in [0,1]$.
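Since every later quantity (acceptance ratios, local energies, logarithmic derivatives) involves $\psi$ only through ratios or through $\ln\psi$, it is convenient to evaluate the unnormalized logarithm of the closed form above directly. The following sketch does exactly that; the module and routine names are hypothetical and need not match the repository's code:

```fortran
module rbm_wavefunction
    implicit none
    integer, parameter :: dp = kind(1.0d0)
contains
    !> Unnormalized log wave-function ln psi(s, alpha) for the closed form
    !> psi(s) = exp(sum_j conjg(a_j) s_j) * prod_i [1 + exp(conjg(b_i) + sum_j w_ij s_j)].
    !> The branch of the complex log is irrelevant for ratios exp(ln psi' - ln psi).
    pure function log_psi(s, a, b, w) result(lnpsi)
        real(dp),    intent(in) :: s(:)      ! spin configuration as 0/1 entries, length n
        complex(dp), intent(in) :: a(:)      ! visible biases, length n
        complex(dp), intent(in) :: b(:)      ! hidden biases, length m
        complex(dp), intent(in) :: w(:,:)    ! weights, shape (m, n)
        complex(dp) :: lnpsi
        complex(dp) :: theta(size(b))
        theta = conjg(b) + matmul(w, s)                  ! effective hidden fields
        ! dot_product conjugates its first argument, giving sum_j conjg(a_j) s_j
        lnpsi = dot_product(a, s) + sum(log(1.0_dp + exp(theta)))
    end function log_psi
end module rbm_wavefunction
```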

For the RBM's cost function, we use the statistical expectation $E[\psi(\alpha)] = \langle H \rangle_{\psi(\alpha)}$ of the Hamiltonian $H$ in the variational trial state $\ket{\psi(\alpha)}$, given by
$$E[\psi(\alpha)] = \frac{\langle \psi(\alpha), H\psi(\alpha) \rangle}{\langle \psi(\alpha), \psi(\alpha) \rangle} =
\frac{\sum_{s,s' \in S} \psi^\*(s,\alpha) H_{ss'} \psi(s',\alpha)}{\sum_{s' \in S} |\psi(s',\alpha)|^2} =
\frac{\sum_{s \in S} |\psi(s,\alpha)|^2 \left(\sum_{s' \in S} H_{ss'} \frac{\psi(s',\alpha)}{\psi(s,\alpha)}\right)}{\sum_{s' \in S} |\psi(s',\alpha)|^2} = \sum_{s \in S} p(s,\alpha) E_{\text{loc}}(s,\alpha)$$
where we define the variational local energies $E_{\text{loc}}(s,\alpha) = \sum_{s' \in S} H_{ss'} \frac{\psi(s',\alpha)}{\psi(s,\alpha)}$, with $H_{ss'}$ being the matrix element of $H$ between the states $\ket{s}$ and $\ket{s'}$. Thus $E[\psi(\alpha)] = \sum_{s \in S} p(s,\alpha) E_{\text{loc}}(s,\alpha)$ is the statistical expectation of the local energies weighted by the probability distribution $p(\alpha):S \to [0,1]$.
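On small systems, the local energies can be checked by brute force, since $E_{\text{loc}}(s,\alpha)$ involves only wave-function ratios $\psi(s',\alpha)/\psi(s,\alpha)$. The sketch below (hypothetical names; it reuses `log_psi` from the sketch above and assumes a dense Hamiltonian indexed by basis configurations) sums over all $2^n$ configurations, so it is exponential in $n$ and intended only as a sanity check; a realistic implementation visits only the configurations $s'$ connected to $s$ by $H$:

```fortran
module local_energy_check
    use rbm_wavefunction, only: dp, log_psi   ! log_psi from the sketch above
    implicit none
contains
    !> Bits of the basis index k (0-based) as a 0/1 vector of length n.
    !> The convention bit (j-1) of k <-> spin j is an assumption of this sketch.
    pure function config(k, n) result(s)
        integer, intent(in) :: k, n
        real(dp) :: s(n)
        integer :: j
        do j = 1, n
            s(j) = real(ibits(k, j - 1, 1), dp)
        end do
    end function config

    !> Brute-force local energy E_loc(s) = sum_{s'} H(s,s') * psi(s')/psi(s),
    !> with ratios computed as exp(ln psi(s') - ln psi(s)).
    function local_energy(k, ham, a, b, w, n) result(eloc)
        integer,     intent(in) :: k, n          ! basis index of s and number of spins
        complex(dp), intent(in) :: ham(0:, 0:)   ! dense Hamiltonian, 2^n by 2^n
        complex(dp), intent(in) :: a(:), b(:), w(:,:)
        complex(dp) :: eloc, lnpsi_s
        integer :: kp
        lnpsi_s = log_psi(config(k, n), a, b, w)
        eloc = (0.0_dp, 0.0_dp)
        do kp = 0, 2**n - 1
            if (ham(k, kp) /= (0.0_dp, 0.0_dp)) then
                eloc = eloc + ham(k, kp) * exp(log_psi(config(kp, n), a, b, w) - lnpsi_s)
            end if
        end do
    end function local_energy
end module local_energy_check
```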

## Transverse Field Ising Model

In practice, we allow for a thermalization or "burn-in" period, during which the sampling process moves the initial random sample into the stationary distribution before we begin recording samples. As we can see, the acceptance probabilities in the Metropolis-Hastings algorithm and the form of the local energy involve only ratios of the wave-functions $\psi(s,\alpha)$ for different configurations, so we are justified in ignoring the normalization factor in our derivation of $\psi$. Once all samples are drawn, we may estimate the cost function as an average of the local energies over the drawn samples.
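A single-spin-flip Metropolis-Hastings sampler with a burn-in period might look like the following sketch; it is not the repository's `metropolis_hastings` subroutine, and the names are illustrative. The acceptance probability $\min(1, |\psi(s',\alpha)/\psi(s,\alpha)|^2)$ uses only wave-function ratios, consistent with leaving $\psi$ unnormalized:

```fortran
module sampling_sketch
    use rbm_wavefunction, only: dp, log_psi   ! log_psi from the earlier sketch
    implicit none
contains
    !> Single-spin-flip Metropolis-Hastings sampling with a burn-in period.
    subroutine sample_configurations(a, b, w, n_burn, n_samples, samples)
        complex(dp), intent(in)  :: a(:), b(:), w(:,:)
        integer,     intent(in)  :: n_burn, n_samples
        real(dp),    intent(out) :: samples(size(a), n_samples)
        real(dp) :: s(size(a)), u, ratio
        complex(dp) :: lnpsi_old, lnpsi_new
        integer :: step, flip, n
        n = size(a)
        call random_number(s)
        s = merge(1.0_dp, 0.0_dp, s < 0.5_dp)     ! random initial configuration
        lnpsi_old = log_psi(s, a, b, w)
        do step = 1, n_burn + n_samples
            call random_number(u)
            flip = 1 + int(u * n)                 ! propose flipping one random spin
            s(flip) = 1.0_dp - s(flip)
            lnpsi_new = log_psi(s, a, b, w)
            ratio = exp(2.0_dp * real(lnpsi_new - lnpsi_old, dp))   ! |psi'/psi|^2
            call random_number(u)
            if (u < ratio) then
                lnpsi_old = lnpsi_new             ! accept the proposed configuration
            else
                s(flip) = 1.0_dp - s(flip)        ! reject: undo the flip
            end if
            if (step > n_burn) samples(:, step - n_burn) = s   ! record after burn-in
        end do
    end subroutine sample_configurations
end module sampling_sketch
```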

The stochastic optimization algorithm involves modifying the parameters in the direction of the negative gradient of the energy functional in each learning iteration, a form of stochastic gradient descent in the direction of the generalized forces $F_\alpha = -\textrm{grad}\_\alpha E[\psi(\alpha)]$ with components
$$F_{\alpha_l} = - \frac{\partial}{\partial \alpha_l} E[\psi(\alpha)] \approx - \frac{1}{N} \sum_{s \in \tilde{S}} \frac{\partial}{\partial \alpha_l} E_{\text{loc}}(s, \alpha) = - \frac{1}{N} \sum_{s \in \tilde{S}} \sum_{s' \in S} H_{ss'} \frac{\partial}{\partial \alpha_l} \frac{\psi(s', \alpha)}{\psi(s, \alpha)} = - \frac{1}{N} \sum_{s \in \tilde{S}} \sum_{s' \in S} H_{ss'} \bigg[ \frac{1}{\psi(s, \alpha)} \frac{\partial}{\partial \alpha_l} \psi(s', \alpha) - \frac{\psi(s', \alpha)}{\psi(s, \alpha)^2} \frac{\partial}{\partial \alpha_l} \psi(s, \alpha) \bigg] = - \frac{1}{N} \sum_{s \in \tilde{S}} \sum_{s' \in S} H_{ss'} \frac{\psi(s', \alpha)}{\psi(s, \alpha)} \bigg[ \frac{1}{\psi(s', \alpha)} \frac{\partial}{\partial \alpha_l} \psi(s', \alpha) - \frac{1}{\psi(s, \alpha)} \frac{\partial}{\partial \alpha_l} \psi(s, \alpha) \bigg] = - \frac{1}{N} \sum_{s \in \tilde{S}} \sum_{s' \in S} H_{ss'} \frac{\psi(s', \alpha)}{\psi(s, \alpha)} \bigg[ \frac{\partial}{\partial \alpha_l} \ln \psi(s', \alpha) - \frac{\partial}{\partial \alpha_l} \ln \psi(s, \alpha) \bigg] = - \frac{1}{N} \sum_{s \in \tilde{S}} \sum_{s' \in S} H_{ss'} \frac{\psi(s', \alpha)}{\psi(s, \alpha)} \bigg[ O_l(s',\alpha) - O_l(s,\alpha) \bigg] = \frac{1}{N} \sum_{s \in \tilde{S}} \bigg[ O_l(s,\alpha) E_{\text{loc}}(s, \alpha) - \sum_{s' \in S} H_{ss'} \frac{\psi(s', \alpha)}{\psi(s, \alpha)} O_l(s',\alpha) \bigg]$$
where we define the logarithmic derivatives
$$O_l(s,\alpha) = \frac{\partial}{\partial \alpha_l} \ln \psi(s, \alpha) = \frac{1}{\psi(s, \alpha)} \frac{\partial}{\partial \alpha_l} \psi(s, \alpha)$$
of the variational wave-functions $\psi(s, \alpha)$ in terms of diagonal operators $O_l$ defined by $O_l \psi(s, \alpha) = O_l(s,\alpha)$. In the final equality of $F_{\alpha_l}$, the second term is a modified local energy in which each term of the summation is weighted by the logarithmic derivative $O_l(s',\alpha)$ for each $s' \in S$. By making a further approximation to this second term, we update the parameters according to the rule
$$\alpha \leftarrow \alpha + \eta F_\alpha$$
for some learning rate $\eta > 0$.
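For the closed form of $\ln\psi$ above, the logarithmic derivatives can be written down analytically. In the sketch below (hypothetical names), the derivatives are taken with respect to the parameters exactly as they appear in $\ln\psi$, that is with respect to $a^\*$, $b^\*$, and $w$; the repository's convention for the complex parameters may differ:

```fortran
module log_derivatives
    implicit none
    integer, parameter :: dp = kind(1.0d0)
contains
    !> Logarithmic derivatives O_l(s) = d ln psi / d alpha_l for
    !> ln psi = sum_j conjg(a_j) s_j + sum_i ln(1 + exp(theta_i)),
    !> theta_i = conjg(b_i) + sum_j w_ij s_j, differentiated with respect to the
    !> parameters as they appear in ln psi (conjg(a), conjg(b), and w).
    !> Packed as one vector of length n + m + m*n (column-major order for w).
    pure function log_derivs(s, a, b, w) result(o)
        real(dp),    intent(in) :: s(:)
        complex(dp), intent(in) :: a(:), b(:), w(:,:)
        complex(dp) :: o(size(a) + size(b) + size(w))
        complex(dp) :: sig(size(b))
        integer :: n, m, i, j
        n = size(a); m = size(b)
        sig = 1.0_dp / (1.0_dp + exp(-(conjg(b) + matmul(w, s))))   ! logistic of hidden fields
        o(1:n) = cmplx(s, kind=dp)                      ! d ln psi / d conjg(a_j) = s_j
        o(n+1:n+m) = sig                                ! d ln psi / d conjg(b_i) = sigma(theta_i)
        do j = 1, n
            do i = 1, m
                o(n + m + (j-1)*m + i) = sig(i) * s(j)  ! d ln psi / d w_ij = sigma(theta_i) s_j
            end do
        end do
    end function log_derivs
end module log_derivatives
```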

## Stochastic Reconfiguration

To overcome typical problems in the vanilla stochastic optimization, we seek to improve performance of the algorithm by pre-conditioning the gradient $F_\alpha$ with a Hermitian positive-definite matrix $S^{-1}(\alpha)$ prior to updating the parameters $\alpha \in \mathcal{M}$, such that the update rule becomes
$$\alpha \leftarrow \alpha + \eta S^{-1}(\alpha) F_\alpha$$
where the matrix $S(\alpha)$ is known as the stochastic reconfiguration matrix. Choosing $S(\alpha)$ as the identity recovers the ordinary stochastic optimization. A sophisticated choice for $S(\alpha)$ is the Quantum Geometric Tensor whose components are defined as the expectation covariances
$$S_{kl}(\alpha) = \langle O_k^\dagger O_l \rangle_{\psi(\alpha)} - \langle O_k^\dagger \rangle_{\psi(\alpha)} \langle O_l \rangle_{\psi(\alpha)} = \frac{\langle O_k \psi(\alpha), O_l \psi(\alpha) \rangle}{\langle \psi(\alpha), \psi(\alpha) \rangle} - \frac{\langle O_k \psi(\alpha), \psi(\alpha) \rangle}{\langle \psi(\alpha), \psi(\alpha) \rangle} \frac{\langle \psi(\alpha), O_l \psi(\alpha) \rangle}{\langle \psi(\alpha), \psi(\alpha) \rangle} = \frac{\sum_{s \in S} O_k^\*(s,\alpha) O_l(s,\alpha)}{\sum_{s' \in S} |\psi(s',\alpha)|^2} - \bigg[ \frac{\sum_{s \in S} O_k^\*(s,\alpha) \psi(s,\alpha)}{\sum_{s' \in S} |\psi(s',\alpha)|^2} \bigg] \bigg[ \frac{\sum_{s \in S} \psi^\*(s,\alpha) O_l(s,\alpha)}{\sum_{s' \in S} |\psi(s',\alpha)|^2} \bigg] \approx \frac{1}{N} \sum_{s \in \tilde{S}} O_k^\*(s,\alpha) O_l(s,\alpha) - \bigg[ \frac{1}{N} \sum_{s \in \tilde{S}} O_k^\*(s,\alpha) \bigg] \bigg[ \frac{1}{N} \sum_{s \in \tilde{S}} O_l(s,\alpha) \bigg]$$
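Given the sampled logarithmic derivatives and the estimated forces, one stochastic reconfiguration step amounts to forming the sample covariances above and solving a linear system for the preconditioned update direction. The sketch below assumes LAPACK's `zgesv` is available; the routine names and the small diagonal shift added for numerical stability are assumptions, not part of the text:

```fortran
module stochastic_reconfiguration
    implicit none
    integer, parameter :: dp = kind(1.0d0)
contains
    !> Estimate S_kl = <O_k^* O_l> - <O_k^*><O_l> over the sample set and solve
    !> S * delta = F for the preconditioned update direction delta.
    subroutine sr_update_direction(o_samples, force, eps, delta)
        complex(dp), intent(in)  :: o_samples(:,:)   ! O_l(s) per sample, shape (n_params, n_samples)
        complex(dp), intent(in)  :: force(:)         ! generalized forces F, length n_params
        real(dp),    intent(in)  :: eps              ! diagonal regularization (assumed, not in the text)
        complex(dp), intent(out) :: delta(size(force))
        complex(dp) :: s_mat(size(force), size(force)), o_mean(size(force))
        integer :: ipiv(size(force)), info, n_p, n_s, k
        n_p = size(o_samples, 1)
        n_s = size(o_samples, 2)
        o_mean = sum(o_samples, dim=2) / n_s
        ! sample covariance estimate of the Quantum Geometric Tensor
        s_mat = matmul(conjg(o_samples), transpose(o_samples)) / n_s &
              - spread(conjg(o_mean), dim=2, ncopies=n_p) * spread(o_mean, dim=1, ncopies=n_p)
        do k = 1, n_p
            s_mat(k, k) = s_mat(k, k) + eps          ! shift the diagonal for stability
        end do
        delta = force
        call zgesv(n_p, 1, s_mat, n_p, ipiv, delta, n_p, info)   ! LAPACK linear solve
        if (info /= 0) error stop 'zgesv failed'
    end subroutine sr_update_direction
end module stochastic_reconfiguration
```

The update $\alpha \leftarrow \alpha + \eta \delta$ then replaces the plain gradient step.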