5. July 2023

… The Problem

Local versus Global Updates

QCD partition function \[ \mathcal{Z}\ =\ \int\mathcal{D}\bar\psi\,\mathcal{D}\psi\,\mathcal{D}U\, e^{-S[\bar\psi,\psi,U]} \]

  • Euclidean lattice action \(S\)

  • for definitness we assume \(N_f=2\) mass degenerate Wilson type quark flavours
    • can all be generalised for different \(N_f\)-values
  • fermion fields Grassmann valued, difficult to represent on a computer

\(\Rightarrow\quad\) integrate out the fermion fields

Local versus Global Updates

\(\Rightarrow\quad\) integrate out the fermion fields

\[ \mathcal{Z}\ =\ \int\,\mathcal{D}U\,[\det(D)]^2\, e^{-S_g[U]} \]

and represent the determinant stochastically

\[ \mathcal{Z}\ =\ \int\mathcal{D}\phi^\dagger\,\mathcal{D}\phi\,\mathcal{D}U\, e^{-\phi^\dagger\frac{1}{D[U]^2}\phi - S_g[U]} \]

  • with pure gauge action \(S_g\)

  • \(D\) the Lattice Dirac operator

  • \(\phi^\dagger,\phi\) so-called pseudo-fermion (bosonic) fields

Local versus Global Updates

Traded Grassmann-valued fields with a highly non-local term \[ S_\mathrm{eff} = S_\mathrm{PF} + S_g\,,\qquad S_\mathrm{PF}\ =\ \phi^\dagger\,D[U]^{-2}\,\phi \] due to the matrix inverse!

\(\Rightarrow\quad\) local Metropolis updates unrealistic

  • any local change requires the evaluation of \(S_\mathrm{PF}\)

random global updates?

  • acceptance rate negligible for non-infinitesimal updates

The HMC Algorithm

Recap the Metropolis MC Algorithm

Recall the steps for the local Metropolis algorithm

  • probability density \(g(x), x\in\mathbb{R}^n\) (later \(g\equiv e^{-S}\))

Starting from initial \(x\)

  1. generate proposal \(x_i'\) for \(x_i\) with random \(i\)
  2. accept/reject \(x'\) with probability \[ P_\mathrm{acc}\ =\ \min\left\{1,\ \frac{g(x')}{g(x)}\right\} \]
  3. restart from 1. for another random \(i\)-value

Recap the Metropolis MC Algorithm (cont.)

  • proposal needs to be ergodic

  • Metropolis algorithm fulfils detailed balance \[ g(x)\ P_\mathrm{acc}(x\to x') = g(x')\ P_\mathrm{acc}(x'\to x) \]

\(\Rightarrow\quad\) algorithm generates a Markov chain


  • this works well for pure gauge theory
    action local

  • again, with fermions this is no longer the case

Hybrid Monte Carlo (HMC)

Goal: smartly replace the local update by a global one!

Introduce an artificial Hamiltonian \[ \mathcal{H}(p, x)\ =\ \frac{1}{2}\sum p^2 - \ln(g(x)) \] with canonical momenta \(p\in\mathbb{R}^n\) and a one-paramter familty of deterministic maps \[ \varphi^\tau(p, x): (p, x) \to (p', x') \] with \(\tau\in\mathbb{R}\) which is reversible, i.e. for all \(p, x\) \[ \varphi^\tau(p, x)=(p', x')\quad \Rightarrow\quad \varphi^\tau(-p',x') = (-p, x) \]

Hybrid Monte Carlo (HMC)

Then the HMC algorithm is defined by the following steps

Starting from initial \(x\in\mathbb{R}^n\)

  1. heatbath: \(p\ \sim\ \mathcal{N}_{0,1}\)

  2. set \((p', x') = \varphi^\tau(p, x)\) with \(\tau\neq 0\)

  3. accept/reject \((p', x')\) with probability \[ P_\mathrm{acc}\ =\ \min\{1, \exp(-\Delta\mathcal{H})\}\,,\quad \Delta\mathcal{H} = \mathcal{H}(p,x) - \mathcal{H}(p', x') \]

  4. restart at 1.

Proof for Detailed Balance

Don’t need to know the normalisation: let \(g(x)\propto f(x)\)

The HMC transition probability \[ \begin{split} \mathbb{P}_\mathrm{HMC}(x, x') = \int d p\ &\mathcal{N}_{0,1}(p)\, \delta(x'-\varphi^\tau_x(p, x))\\ &\mathbb{P}_\mathrm{acc}[p, x \to \varphi^\tau(p,x)]\,.\\ \end{split} \] We can introduce an integral over \(p'\) as well, which gives \[ \begin{split} \mathbb{P}_\mathrm{HMC}(x, x') = \int d p\,d p'\ &\mathcal{N}_{0,1}(p)\, \delta((p',x')-\varphi^\tau(p, x))\\ &\mathbb{P}_\mathrm{acc}[p, x \to p',x']\,.\\ \end{split} \]

Proof (cont.)

Due to the Metropolis accept/reject step, we have detailed balance with respect to \(\exp(-\mathcal{H})\), and thus \[ \begin{split} e^{-\mathcal{H}(p, x)}\,&\mathbb{P}_\mathrm{acc}[p, x \to p',x']\\ & = e^{-\mathcal{H}(p', x')}\,\mathbb{P}_\mathrm{acc}[p', x' \to p,x]\,.\\ \end{split} \] Moreover, \[ f(x)\, \mathcal{N}_{0,1}(p) = C e^{-\mathcal{H}(p, x)} \] with a normalisation factor \(C\).

Proof (cont.)

Using these properties and the reversibility of \(\varphi^\tau\) it follows \[ \begin{split} f(x)\, &\mathbb{P}_\mathrm{HMC}(x, x') \\ = C\int &d p\,d p'\, e^{-\mathcal{H}(-p', x')}\, \delta((-p,x)-\varphi^\tau(-p', x'))\\ &\quad\mathbb{P}_\mathrm{acc}[-p', x' \to -p,x]\,,\\ \end{split} \] where we have also used that \(\mathcal{H}\) as well as the acceptance probability are agnostic to the sign of \(p\).
Since \(d pd p' = d(-p)d(-p')\), detailed balance follows.

Hybrid Monte Carlo (HMC)

… so, what about this \(\varphi^\tau\)??

  • needs to be reversible
  • needs to be ergodic

\(\Rightarrow\) HMC generates a Markov Chain for distribution \(g(x)\)!


The only option I know is the map \(\varphi^\tau\) induced by Hamilton’s EOM \[ \begin{split} \dot p_i \equiv \frac{d p_i}{d\tau} &= -\frac{\partial\mathcal{H}}{\partial x_i}\,,\qquad \dot x_i \equiv \frac{d x_i}{d\tau} &= \frac{\partial\mathcal{H}}{\partial p_i}\,.\\ \end{split} \]

  • parameter \(\tau\) becomes trajectory length
  • also called Molecular Dynamics (MD) update

Hybrid Monte Carlo (HMC)

  • if Hamilton’s EOM solved exactly, \(\mathcal{H}\) is conserved
    \[ \Rightarrow\quad\Delta\mathcal{H}=0 \] \(\Rightarrow\quad\) 100% acceptance!


  • in practice: need to solve EOMs numerically

  • use a symplectic, reversible integration scheme
    like for instance the Leap-Frog integration scheme

  • those conserve a shadow Hamiltonian \(\mathcal{H}_s\neq\mathcal{H}\)
    as correction in powers of the integration step size \(\Delta\tau\)

Leap-Frog versus Runge-Kutta

Example \(\chi^2\) Fit

Perform a \(\chi^2\) fit using the HMC \[ \mathcal{H} = \frac{1}{2}(p_a^2 + p_b^2) + \chi^2\,,\qquad \chi^2 = \frac{1}{2}\frac{(y - at - b)^2}{\Delta y^2} \]

Derivatives \[ \frac{d\mathcal{H}}{da} = -t\frac{y-at-b}{\Delta y^2}\,,\qquad\frac{d\mathcal{H}}{db}=-\frac{y-at-b}{\Delta y^2} \]

Example \(\chi^2\) Fit

Leap-Frog \[ \begin{split} p_{a,b}^{i+1/2} &= p_{a,b}^i - \frac{d\mathcal{H}}{d\{a,b\}}\,\Delta\tau/2\\ \{a,b\}^{i+1} &= \{a,b\}^{i} + p_{a,b}^{i+1/2}\Delta\tau\\ p_{a,b}^{i+1} &= p_{a,b}^{i+1/2} - \frac{d\mathcal{H}}{d\{a,b\}}\,\Delta\tau/2\\ \end{split} \]

Example \(\chi^2\) Fit

HMC for Lattice QCD

  • pseudo-fermion fields \(\phi^\dagger,\phi\) fixed for each trajectory

  • dynamical variables: \(U_\mu(x) = \exp(-i\,\alpha_a(\mu, x) t_a)\)
  • \(\mathrm{SU}(N)\) generators \(t_a\)

  • momenta \(p_a(\mu, x)\) conjugate to the \(\alpha_a(\mu, x)\)

  • derivative in direction \(a\) given by (left) Lie derivative \[ D_a f(U) = -i\frac{d}{d\epsilon} f(e^{i\epsilon t_a} U)|_{\epsilon = 0} \]

  • large acceptance guaranteed by symplectic integration scheme!

The Berlin Wall Plot

  • cost of \(N_f=2\) dynamical lattice QCD simulations: \[ \mathrm{Cost}\propto \left(\frac{M_\pi}{M_\rho}\right)^{-6}\cdot L^5\cdot \left(\frac{1}{a}\right)^7 \]
    Ukawa, CP-PACS, Lattice 2001, Berlin, Nucl.Phys.B Proc.Suppl. 106 (2002)


\(\Rightarrow\quad\) in 2001: simulation at the physical point totally unrealistic!

Since then there were two key improvements!

The Key Improvements

The Key Improvements


  1. preconditioning with multiple time-scale integration

    Hasenbusch, 2001; Lüscher, 2005; CU et al., 2005; Clark and Kennedy, 2006, …


  2. multi-grid iterative solvers with low-mode deflation

    Lüscher, 2007; Babich et al., 2010; Fommer et al., 2011, …

Hasenbusch Mass Preconditioning

  • Hasenbusch’s idea:
    Hasenbusch, 2001

\[ \det(Q^2)\ =\ \frac{\det(Q^2)}{\det(Q^2+\rho_1^2)}\cdot \frac{\det(Q^2+\rho_1^2)}{\det(Q^2+\rho_2^2)}\cdot\ldots\cdot\det(Q^2+\rho_n^2) \]

  • leading to effective pseudo-fermion action \[ S_\mathrm{eff,PF} = \phi_n^\dagger \frac{1}{Q^2 + \rho_n^2}\phi_n + \ldots + \phi_1^\dagger\frac{Q^2+\rho_1^2}{Q^2}\phi_1 \]

Hasenbusch Mass Preconditioning

\(\Rightarrow\) four effects of this preconditioning

  1. MD forces smaller in each term
  2. \(Q^2+\rho_n^2\) (much) cheaper to invert than \(Q^2\)
  3. better averaging due to multiple \(\phi_i\): integration more stable
  4. allows for a separation of scales

in particular: tune \(\rho_i\) parameters such that

  • the (cheap) \(\rho_n\) term can be integrated with small \(\Delta\tau_n\)
  • the (expensive) \(\rho_1\) term can be integrated with large \(\Delta\tau_1 >> \Delta\tau_n\)

Note: there are other such preconditioners available!

Multiple Time Scale Integration

  • Assume: \(\mathcal{H}=\frac{1}{2}\sum p^2 + S_0(x) + S_1(x) + \ldots + S_n(x)\)

  • Define (\(j=0,1, \ldots n\)): \[ \begin{split} T_\mathrm{x}(\Delta\tau)&:\quad x\quad\to\quad x' = x + \Delta\tau p \\ T_{\mathrm{S}_j}(\Delta\tau)&:\quad p\quad\to\quad p' = p - \Delta\tau\frac{\partial S_j}{\partial x}\\ \end{split} \]

  • then recursively: \[ \begin{split} \Bigl.\Bigr.T_0 &= T_{\mathrm{S}_0}(\Delta\tau_0/2)\ T_x(\Delta\tau_0)\ T_{\mathrm{S}_0}(\Delta\tau_0/2)\, ,\\ \Bigl.\Bigr.T_j &= T_{\mathrm{S}_j}(\Delta\tau_j/2)\ [T_{j-1}]^{N_{j-1}}\ T_{\mathrm{S}_j}(\Delta\tau_j/2)\\ \end{split} \]

  • trajectory of length \(\tau\): \(\bigl[ T_n\bigr]^{N_n}\)

Berlin Wall Plot again

  • Hasenbusch plus multiple time scales

  • example for \(N_f=2\) Wilson fermions
    Urbach et al., 2005
  • many other examples available
    see e.g. Clark, 2006
  • simulations at the physical point become realistic!

  • quark mass dependence still challenging!

Taming the Quark Mass Dependence

  • any further improvement would require reducing the effort for \(D^{-2}\)

  • need to find a suitable preconditioner
    • fast to construct
    • efficient approximation of the low-lying spectrum
  • one could compute spectrum of \(D\) up to some threshold
    \(\Rightarrow\) too many modes to be feasible (and faster)

Local Coherence

  • devide the lattice into \(N_b\) blocks

  • denote them \(\Lambda(\vec b)\) with block coordinates \(\vec b\)

  • 2d example in the right panel

  • generate \(N=\mathcal{O}(10)\) approximate lowest eigenvectors \(\psi_l\) of \(D\) (global)

  • restrict to blocks \[ \phi_l^{\vec b}(x) = \begin{cases} \psi_l(x) & \textrm{if}\ x \in \Lambda(\vec b)\, ,\\ 0 & \textrm{otherwise}\,. \end{cases} \]

Local Coherence

  • restrict to blocks \[ \phi_l^{\vec b}(x) = \begin{cases} \psi_l(x) & \textrm{if}\ x \in \Lambda(\vec b)\, ,\\ 0 & \textrm{otherwise}\,. \end{cases} \] and orthonormlise

  • this generates \(N_b\cdot N\) vectors

  • key observation:
    the \(\psi_l^{\vec b}\) span the space of the low modes up to small deficits
    Lüscher, 2006

Local Coherence

key points of this so far

  • a relatively fast to generate set of basis vectors \(\psi_l^{\vec b}\)

  • local coherence: they can approximate the space of low modes

\(\Rightarrow\) this can be used as a preconditioner


  • project \(D\) to the subspace \[ A_{(\vec a,k)(\vec b, l)} = \langle\phi_k^{\vec a}| D\phi_l^{\vec b}\rangle \] which is relatively sparse and much smaller than \(D\)

A Multigrid Approach

roughly speaking:

  • deflate \(D\): little Dirac operator \(A\)

  • solve \(A\) on the sub-space

  • use this as an approximate preconditioner for a Krylov subspace method on the full space

  • \(A\) can be deflated with the global modes


\(\Rightarrow\) a cycle of restriction and prolongation \(\Rightarrow\) Multgrid

Deflated Solver

result:

  • the quark mass dependence is almost flat!
    Lüscher, 2006

concept has been further improved

  • Adaptive Multigrid
    Babich et al, 2010
  • DD\(\alpha\)AMG solver
    Frommer et al., 2014; Alexandrou et al, 2016
  • Multigrid in QUDA
    Clark et al.

Berlin Wall Plot (a last time)

  • the multigrid solver roughly changes \[ m_q^{-3}\quad\to\quad m_q^{-1} \] in the cost formula

  • example plot for \(a=0.08\ \mathrm{fm}\), \(L=1.92\ \mathrm{fm}\) and 1000 trajectories

  • only illustrative
    in practice the details will be different

Remaining Challenges

Critical Slowing Down

  • for \(a \sim 0.05\ \mathrm{fm}\): \(\tau_\mathrm{int}\) of topological charge \(Q\) rises drastically
    Schäfer et al., ALPHA, 2011

    \[ \mathrm{Cost}\propto a^{-10}\ ? \]

  • some quantities not affected by large \(\tau_\mathrm{int}\)-values

  • however, how to interpret results if \(Q\) not properly sampled?

  • open boundary conditions for gauge fields as a cure
    Lüscher, Schäfer, 2011

Outlook

  • machine learning might reduce the cost significantly
    work by Shanahan and collaborators

\(\Rightarrow\) normalising flows:

  • learn a diffeomorphism between e.g. a Gaussian distribution and \(e^{-S}\)

  • learning takes time, but sampling is then basically for free

  • problematic volume scaling

  • does the critical slowing down bite back during learning?
    Nicoli et al., 2023

Thank you!