Lattice QCD Simulations

5. July 2023

… The Problem

Local versus Global Updates

QCD partition function \[ \mathcal{Z}\ =\ \int\mathcal{D}\bar\psi\,\mathcal{D}\psi\,\mathcal{D}U\, e^{-S[\bar\psi,\psi,U]} \]

Euclidean lattice action \(S\)
for definitness we assume \(N_f=2\) mass degenerate Wilson type quark flavours
- can all be generalised for different \(N_f\)-values
fermion fields Grassmann valued, difficult to represent on a computer

\(\Rightarrow\quad\) integrate out the fermion fields

Local versus Global Updates

\(\Rightarrow\quad\) integrate out the fermion fields

\[ \mathcal{Z}\ =\ \int\,\mathcal{D}U\,[\det(D)]^2\, e^{-S_g[U]} \]

and represent the determinant stochastically

\[ \mathcal{Z}\ =\ \int\mathcal{D}\phi^\dagger\,\mathcal{D}\phi\,\mathcal{D}U\, e^{-\phi^\dagger\frac{1}{D[U]^2}\phi - S_g[U]} \]

with pure gauge action \(S_g\)
\(D\) the Lattice Dirac operator
\(\phi^\dagger,\phi\) so-called pseudo-fermion (bosonic) fields

Local versus Global Updates

Traded Grassmann-valued fields with a highly non-local term \[ S_\mathrm{eff} = S_\mathrm{PF} + S_g\,,\qquad S_\mathrm{PF}\ =\ \phi^\dagger\,D[U]^{-2}\,\phi \] due to the matrix inverse!

\(\Rightarrow\quad\) local Metropolis updates unrealistic

any local change requires the evaluation of \(S_\mathrm{PF}\)

random global updates?

acceptance rate negligible for non-infinitesimal updates

The HMC Algorithm

Recap the Metropolis MC Algorithm

Recall the steps for the local Metropolis algorithm

probability density \(g(x), x\in\mathbb{R}^n\) (later \(g\equiv e^{-S}\))

Starting from initial \(x\)

generate proposal \(x_i'\) for \(x_i\) with random \(i\)
accept/reject \(x'\) with probability \[ P_\mathrm{acc}\ =\ \min\left\{1,\ \frac{g(x')}{g(x)}\right\} \]
restart from 1. for another random \(i\)-value

Recap the Metropolis MC Algorithm (cont.)

proposal needs to be ergodic
Metropolis algorithm fulfils detailed balance \[ g(x)\ P_\mathrm{acc}(x\to x') = g(x')\ P_\mathrm{acc}(x'\to x) \]

\(\Rightarrow\quad\) algorithm generates a Markov chain

this works well for pure gauge theory
action local
again, with fermions this is no longer the case

Hybrid Monte Carlo (HMC)

Goal: smartly replace the local update by a global one!

Introduce an artificial Hamiltonian \[ \mathcal{H}(p, x)\ =\ \frac{1}{2}\sum p^2 - \ln(g(x)) \] with canonical momenta \(p\in\mathbb{R}^n\) and a one-paramter familty of deterministic maps \[ \varphi^\tau(p, x): (p, x) \to (p', x') \] with \(\tau\in\mathbb{R}\) which is reversible, i.e. for all \(p, x\) \[ \varphi^\tau(p, x)=(p', x')\quad \Rightarrow\quad \varphi^\tau(-p',x') = (-p, x) \]

Hybrid Monte Carlo (HMC)

Then the HMC algorithm is defined by the following steps

Starting from initial \(x\in\mathbb{R}^n\)

heatbath: \(p\ \sim\ \mathcal{N}_{0,1}\)
set \((p', x') = \varphi^\tau(p, x)\) with \(\tau\neq 0\)
accept/reject \((p', x')\) with probability \[ P_\mathrm{acc}\ =\ \min\{1, \exp(-\Delta\mathcal{H})\}\,,\quad \Delta\mathcal{H} = \mathcal{H}(p,x) - \mathcal{H}(p', x') \]
restart at 1.

Proof for Detailed Balance

Don’t need to know the normalisation: let \(g(x)\propto f(x)\)

The HMC transition probability \[ \begin{split} \mathbb{P}_\mathrm{HMC}(x, x') = \int d p\ &\mathcal{N}_{0,1}(p)\, \delta(x'-\varphi^\tau_x(p, x))\\ &\mathbb{P}_\mathrm{acc}[p, x \to \varphi^\tau(p,x)]\,.\\ \end{split} \] We can introduce an integral over \(p'\) as well, which gives \[ \begin{split} \mathbb{P}_\mathrm{HMC}(x, x') = \int d p\,d p'\ &\mathcal{N}_{0,1}(p)\, \delta((p',x')-\varphi^\tau(p, x))\\ &\mathbb{P}_\mathrm{acc}[p, x \to p',x']\,.\\ \end{split} \]

Proof (cont.)

Due to the Metropolis accept/reject step, we have detailed balance with respect to \(\exp(-\mathcal{H})\), and thus \[ \begin{split} e^{-\mathcal{H}(p, x)}\,&\mathbb{P}_\mathrm{acc}[p, x \to p',x']\\ & = e^{-\mathcal{H}(p', x')}\,\mathbb{P}_\mathrm{acc}[p', x' \to p,x]\,.\\ \end{split} \] Moreover, \[ f(x)\, \mathcal{N}_{0,1}(p) = C e^{-\mathcal{H}(p, x)} \] with a normalisation factor \(C\).

Proof (cont.)

Using these properties and the reversibility of \(\varphi^\tau\) it follows \[ \begin{split} f(x)\, &\mathbb{P}_\mathrm{HMC}(x, x') \\ = C\int &d p\,d p'\, e^{-\mathcal{H}(-p', x')}\, \delta((-p,x)-\varphi^\tau(-p', x'))\\ &\quad\mathbb{P}_\mathrm{acc}[-p', x' \to -p,x]\,,\\ \end{split} \] where we have also used that \(\mathcal{H}\) as well as the acceptance probability are agnostic to the sign of \(p\).
Since \(d pd p' = d(-p)d(-p')\), detailed balance follows.

Hybrid Monte Carlo (HMC)

… so, what about this \(\varphi^\tau\)??

needs to be reversible
needs to be ergodic

\(\Rightarrow\) HMC generates a Markov Chain for distribution \(g(x)\)!

The only option I know is the map \(\varphi^\tau\) induced by Hamilton’s EOM \[ \begin{split} \dot p_i \equiv \frac{d p_i}{d\tau} &= -\frac{\partial\mathcal{H}}{\partial x_i}\,,\qquad \dot x_i \equiv \frac{d x_i}{d\tau} &= \frac{\partial\mathcal{H}}{\partial p_i}\,.\\ \end{split} \]

parameter \(\tau\) becomes trajectory length
also called Molecular Dynamics (MD) update

Hybrid Monte Carlo (HMC)

if Hamilton’s EOM solved exactly, \(\mathcal{H}\) is conserved
\[ \Rightarrow\quad\Delta\mathcal{H}=0 \] \(\Rightarrow\quad\) 100% acceptance!

in practice: need to solve EOMs numerically
use a symplectic, reversible integration scheme
like for instance the Leap-Frog integration scheme
those conserve a shadow Hamiltonian \(\mathcal{H}_s\neq\mathcal{H}\)
as correction in powers of the integration step size \(\Delta\tau\)

Leap-Frog versus Runge-Kutta

Example \(\chi^2\) Fit

Perform a \(\chi^2\) fit using the HMC \[ \mathcal{H} = \frac{1}{2}(p_a^2 + p_b^2) + \chi^2\,,\qquad \chi^2 = \frac{1}{2}\frac{(y - at - b)^2}{\Delta y^2} \]

Derivatives \[ \frac{d\mathcal{H}}{da} = -t\frac{y-at-b}{\Delta y^2}\,,\qquad\frac{d\mathcal{H}}{db}=-\frac{y-at-b}{\Delta y^2} \]

Example \(\chi^2\) Fit

Leap-Frog \[ \begin{split} p_{a,b}^{i+1/2} &= p_{a,b}^i - \frac{d\mathcal{H}}{d\{a,b\}}\,\Delta\tau/2\\ \{a,b\}^{i+1} &= \{a,b\}^{i} + p_{a,b}^{i+1/2}\Delta\tau\\ p_{a,b}^{i+1} &= p_{a,b}^{i+1/2} - \frac{d\mathcal{H}}{d\{a,b\}}\,\Delta\tau/2\\ \end{split} \]

Example \(\chi^2\) Fit

HMC for Lattice QCD

pseudo-fermion fields \(\phi^\dagger,\phi\) fixed for each trajectory
dynamical variables: \(U_\mu(x) = \exp(-i\,\alpha_a(\mu, x) t_a)\)
\(\mathrm{SU}(N)\) generators \(t_a\)
momenta \(p_a(\mu, x)\) conjugate to the \(\alpha_a(\mu, x)\)
derivative in direction \(a\) given by (left) Lie derivative \[ D_a f(U) = -i\frac{d}{d\epsilon} f(e^{i\epsilon t_a} U)|_{\epsilon = 0} \]
large acceptance guaranteed by symplectic integration scheme!

The Berlin Wall Plot

cost of \(N_f=2\) dynamical lattice QCD simulations: \[ \mathrm{Cost}\propto \left(\frac{M_\pi}{M_\rho}\right)^{-6}\cdot L^5\cdot \left(\frac{1}{a}\right)^7 \]
Ukawa, CP-PACS, Lattice 2001, Berlin, Nucl.Phys.B Proc.Suppl. 106 (2002)

\(\Rightarrow\quad\) in 2001: simulation at the physical point totally unrealistic!

Since then there were two key improvements!

The Key Improvements

preconditioning with multiple time-scale integration

Hasenbusch, 2001; Lüscher, 2005; CU et al., 2005; Clark and Kennedy, 2006, …
multi-grid iterative solvers with low-mode deflation

Lüscher, 2007; Babich et al., 2010; Fommer et al., 2011, …

Hasenbusch Mass Preconditioning

Hasenbusch’s idea:
Hasenbusch, 2001

\[ \det(Q^2)\ =\ \frac{\det(Q^2)}{\det(Q^2+\rho_1^2)}\cdot \frac{\det(Q^2+\rho_1^2)}{\det(Q^2+\rho_2^2)}\cdot\ldots\cdot\det(Q^2+\rho_n^2) \]

leading to effective pseudo-fermion action \[ S_\mathrm{eff,PF} = \phi_n^\dagger \frac{1}{Q^2 + \rho_n^2}\phi_n + \ldots + \phi_1^\dagger\frac{Q^2+\rho_1^2}{Q^2}\phi_1 \]

Hasenbusch Mass Preconditioning

\(\Rightarrow\) four effects of this preconditioning

MD forces smaller in each term
\(Q^2+\rho_n^2\) (much) cheaper to invert than \(Q^2\)
better averaging due to multiple \(\phi_i\): integration more stable
allows for a separation of scales

in particular: tune \(\rho_i\) parameters such that

the (cheap) \(\rho_n\) term can be integrated with small \(\Delta\tau_n\)
the (expensive) \(\rho_1\) term can be integrated with large \(\Delta\tau_1 >> \Delta\tau_n\)

Note: there are other such preconditioners available!

Multiple Time Scale Integration

Assume: \(\mathcal{H}=\frac{1}{2}\sum p^2 + S_0(x) + S_1(x) + \ldots + S_n(x)\)
Define (\(j=0,1, \ldots n\)): \[ \begin{split} T_\mathrm{x}(\Delta\tau)&:\quad x\quad\to\quad x' = x + \Delta\tau p \\ T_{\mathrm{S}_j}(\Delta\tau)&:\quad p\quad\to\quad p' = p - \Delta\tau\frac{\partial S_j}{\partial x}\\ \end{split} \]
then recursively: \[ \begin{split} \Bigl.\Bigr.T_0 &= T_{\mathrm{S}_0}(\Delta\tau_0/2)\ T_x(\Delta\tau_0)\ T_{\mathrm{S}_0}(\Delta\tau_0/2)\, ,\\ \Bigl.\Bigr.T_j &= T_{\mathrm{S}_j}(\Delta\tau_j/2)\ [T_{j-1}]^{N_{j-1}}\ T_{\mathrm{S}_j}(\Delta\tau_j/2)\\ \end{split} \]
trajectory of length \(\tau\): \(\bigl[ T_n\bigr]^{N_n}\)

Berlin Wall Plot again

Hasenbusch plus multiple time scales
example for \(N_f=2\) Wilson fermions
Urbach et al., 2005
many other examples available
see e.g. Clark, 2006
simulations at the physical point become realistic!
quark mass dependence still challenging!

Taming the Quark Mass Dependence

any further improvement would require reducing the effort for \(D^{-2}\)
need to find a suitable preconditioner
- fast to construct
- efficient approximation of the low-lying spectrum
one could compute spectrum of \(D\) up to some threshold
\(\Rightarrow\) too many modes to be feasible (and faster)

Local Coherence

devide the lattice into \(N_b\) blocks
denote them \(\Lambda(\vec b)\) with block coordinates \(\vec b\)
2d example in the right panel
generate \(N=\mathcal{O}(10)\) approximate lowest eigenvectors \(\psi_l\) of \(D\) (global)
restrict to blocks \[ \phi_l^{\vec b}(x) = \begin{cases} \psi_l(x) & \textrm{if}\ x \in \Lambda(\vec b)\, ,\\ 0 & \textrm{otherwise}\,. \end{cases} \]

Local Coherence

restrict to blocks \[ \phi_l^{\vec b}(x) = \begin{cases} \psi_l(x) & \textrm{if}\ x \in \Lambda(\vec b)\, ,\\ 0 & \textrm{otherwise}\,. \end{cases} \] and orthonormlise
this generates \(N_b\cdot N\) vectors
key observation:
the \(\psi_l^{\vec b}\) span the space of the low modes up to small deficits

Lüscher, 2006

Local Coherence

key points of this so far

a relatively fast to generate set of basis vectors \(\psi_l^{\vec b}\)
local coherence: they can approximate the space of low modes

\(\Rightarrow\) this can be used as a preconditioner

project \(D\) to the subspace \[ A_{(\vec a,k)(\vec b, l)} = \langle\phi_k^{\vec a}| D\phi_l^{\vec b}\rangle \] which is relatively sparse and much smaller than \(D\)

A Multigrid Approach

roughly speaking:

deflate \(D\): little Dirac operator \(A\)
solve \(A\) on the sub-space
use this as an approximate preconditioner for a Krylov subspace method on the full space
\(A\) can be deflated with the global modes

\(\Rightarrow\) a cycle of restriction and prolongation \(\Rightarrow\) Multgrid

Deflated Solver

result:

the quark mass dependence is almost flat!

Lüscher, 2006

concept has been further improved

Adaptive Multigrid
Babich et al, 2010
DD\(\alpha\)AMG solver
Frommer et al., 2014; Alexandrou et al, 2016
Multigrid in QUDA
Clark et al.

Berlin Wall Plot (a last time)

the multigrid solver roughly changes \[ m_q^{-3}\quad\to\quad m_q^{-1} \] in the cost formula
example plot for \(a=0.08\ \mathrm{fm}\), \(L=1.92\ \mathrm{fm}\) and 1000 trajectories
only illustrative
in practice the details will be different

Remaining Challenges

Critical Slowing Down

for \(a \sim 0.05\ \mathrm{fm}\): \(\tau_\mathrm{int}\) of topological charge \(Q\) rises drastically
Schäfer et al., ALPHA, 2011

\[ \mathrm{Cost}\propto a^{-10}\ ? \]
some quantities not affected by large \(\tau_\mathrm{int}\)-values
however, how to interpret results if \(Q\) not properly sampled?
open boundary conditions for gauge fields as a cure
Lüscher, Schäfer, 2011

Outlook

machine learning might reduce the cost significantly

work by Shanahan and collaborators

\(\Rightarrow\) normalising flows:

learn a diffeomorphism between e.g. a Gaussian distribution and \(e^{-S}\)
learning takes time, but sampling is then basically for free
problematic volume scaling
does the critical slowing down bite back during learning?

Nicoli et al., 2023

… The Problem

Local versus Global Updates

Local versus Global Updates

Local versus Global Updates

The HMC Algorithm

Recap the Metropolis MC Algorithm

Recap the Metropolis MC Algorithm (cont.)

Hybrid Monte Carlo (HMC)

Hybrid Monte Carlo (HMC)

Proof for Detailed Balance

Proof (cont.)

Proof (cont.)

Hybrid Monte Carlo (HMC)

Hybrid Monte Carlo (HMC)

Leap-Frog versus Runge-Kutta

Example \(\chi^2\) Fit

Example \(\chi^2\) Fit

Example \(\chi^2\) Fit

HMC for Lattice QCD

The Berlin Wall Plot

The Key Improvements

The Key Improvements

Hasenbusch Mass Preconditioning

Hasenbusch Mass Preconditioning

Multiple Time Scale Integration

Berlin Wall Plot again

Taming the Quark Mass Dependence

Local Coherence

Local Coherence

Local Coherence

A Multigrid Approach

Deflated Solver

Berlin Wall Plot (a last time)

Remaining Challenges

Critical Slowing Down

Outlook

Thank you!