1 A \(\chi^2\)-Fit by Hand

1.1 \(\chi^2\) function

Assume you want to fit a (scalar) function \(f(t)\) to data \(y(t)\) given at fixed, discrete \(t\)-values \(t_i\). In matrix-vector notation, define the \(\chi^2\) function with parameters \(\lambda\) as follows \[\begin{equation} \chi^2(\lambda)\ =\ (\vec y -\vec f(t;\, \lambda))^t\, M\, (\vec y -\vec f(t;\, \lambda))\,. \end{equation}\] In case of uncorrelated data, the matrix \(M\) is just a diagonal matrix with \[ M\ =\ \mathrm{diag}(\mathrm{err}_i^{-2})\,. \] If there is correlation to be taken into account, you need to set \[ M = C^{-1} \] with \(C\) the variance-covariance matrix of the means, \[ C = \frac{1}{N}\Sigma_N^2\,, \] where \(\Sigma_N^2\) is the covariance matrix of the data and \(N\) the sample size used to estimate the means and \(\Sigma_N^2\). I hope the notation is clear, otherwise ask me!
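For concreteness, here is a minimal R sketch of how \(M\) could be estimated from a set of measurements; the variable names (`samples`, `SigmaN2`, ...) are my own and the data are purely illustrative:

```r
## Sketch: estimate M = C^{-1} from N measurements of n quantities.
## `samples` is an N x n matrix; here filled with dummy data.
set.seed(1)
samples <- matrix(rnorm(100 * 3), ncol = 3)  # N = 100, n = 3
SigmaN2 <- cov(samples)                      # covariance matrix Sigma_N^2
C <- SigmaN2 / nrow(samples)                 # variance-covariance matrix of the means
M <- solve(C)                                # M = C^{-1}
```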

1.2 Example Uncorrelated Fit

Let us start by implementing an uncorrelated \(\chi^2\)-fit by hand, just to understand it maybe a bit better. Assume we have data depending on some variable \(t\):

\(t\)   \(y\)          \(\Delta y\)
0       0.9417029      0.110
2       1.9538151      0.095
4       3.1536958      0.130

We assume that the data points for different \(t\)-values are independent. With \(\Delta y\) we denote the standard error. For demonstration purposes, let us fit the linear model \[ f(t; a,b)\ =\ at + b \] to this data. For this we first need an implementation of the \(\chi^2\) function:
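A minimal implementation in R could look as follows; the function and variable names (`fn`, `chisqr`, ...) are my own choices, not necessarily those of the original code:

```r
## linear model f(t; a, b) = a*t + b, with par = c(a, b)
fn <- function(par, t) par[1] * t + par[2]

## uncorrelated chi^2: sum of squared, error-weighted residuals
chisqr <- function(par, t, y, dy) {
  sum(((y - fn(par, t)) / dy)^2)
}

## the data from the table above
t  <- c(0, 2, 4)
y  <- c(0.9417029, 1.9538151, 3.1536958)
dy <- c(0.110, 0.095, 0.130)
```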

We can minimise this function using the optimiser provided by R, or any other optimiser, for instance one based on the Levenberg-Marquardt method:
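A sketch using R's built-in `optim`; a Levenberg-Marquardt alternative would be, for example, `nls.lm` from the `minpack.lm` package:

```r
## minimise chi^2 starting from a rough guess
opt <- optim(par = c(1, 1), fn = chisqr,
             t = t, y = y, dy = dy, method = "BFGS")
opt$par    # best-fit parameters (a, b)
opt$value  # chi^2 at the minimum
```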

This fit gives us a value of \(\chi^2 = 0.5415828\) and parameters \(a=0.549563\) and \(b=0.9067579\). Visualised, the result looks as follows:
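A base-R sketch of such a plot (the original likely used `plotwitherror` from hadron, mentioned below):

```r
## data with error bars plus the best-fit line
plot(t, y, ylim = range(y - dy, y + dy), xlab = "t", ylab = "y")
arrows(t, y - dy, t, y + dy, angle = 90, code = 3, length = 0.05)
abline(a = opt$par[2], b = opt$par[1], col = "red")  # intercept b, slope a
```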

We can also compute the \(p\)-value for this fit:
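In R this is a one-liner, using the \(\chi^2\) cumulative distribution function `pchisq`:

```r
dof <- length(y) - 2             # 3 data points minus 2 parameters
1 - pchisq(opt$value, df = dof)  # p-value, about 0.46 here
```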

which results in \(p=0.4617775\), so not too bad. The \(p\)-value is defined as \[ p\ =\ 1-F_{\chi^2}(\mathrm{dof}, \chi^2)\ =\ 1 - P\left(\frac{\mathrm{dof}}{2}, \frac{\chi^2}{2}\right)\,, \] where \(F_{\chi^2}\) is the cumulative distribution function of the \(\chi^2\)-distribution and \(P\) the regularised lower incomplete gamma function. The \(p\)-value represents the probability of finding a larger \(\chi^2\)-value in a further, independent statistical experiment, under the assumption that the model is correct. Since \(p\) is in this case uniformly distributed in \([0,1]\), you expect a value of \(p=0.5\) for a good fit.

2 Bootstrapping a \(\chi^2\)-Fit

Let's now turn towards computing errors of the best-fit parameters. In accordance with the assumption that the data points are uncorrelated, we can use a parametric bootstrap here. This means we can generate bootstrap samples from mean and standard error by drawing from the corresponding Gaussian distributions:
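A sketch of the sample generation; the number of bootstrap samples `R` is my choice:

```r
R <- 500      # number of bootstrap samples
set.seed(42)  # for reproducibility
## R x n matrix: row i is one parametric bootstrap sample of y
y.boot <- t(replicate(R, rnorm(length(y), mean = y, sd = dy)))
```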

Next we repeat the fit on all the bootstrap samples. We will record the best-fit parameters of all these fits along with the \(\chi^2\)-values. We will also use the frozen covariance matrix approach, i.e. we keep the errors fixed during the bootstrap procedure:
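A sketch of the bootstrap loop, reusing `chisqr` and `optim` from above; note that the errors `dy` stay frozen:

```r
par.boot   <- matrix(NA, nrow = R, ncol = 2)  # best-fit parameters per sample
chisq.boot <- numeric(R)                      # chi^2 values per sample
for (i in 1:R) {
  o <- optim(par = opt$par, fn = chisqr,
             t = t, y = y.boot[i, ], dy = dy, method = "BFGS")
  par.boot[i, ] <- o$par
  chisq.boot[i] <- o$value
}
## bootstrap error and bias estimates
apply(par.boot, 2, sd)        # errors Delta theta_i
colMeans(par.boot) - opt$par  # biases
```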

This leads to the following bootstrap estimates for the best-fit parameters, together with their bootstrap errors and biases:

\(\theta_i\)   \(\Delta\theta_i\)   Bias\((\theta_i)\)
0.5495630      0.0427179            -0.0001973
0.9067579      0.0998024             0.0005678

The errors are of the order of 10%. Next, how are the \(\chi^2\)-values of the bootstrap fits distributed?
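A quick histogram, as a sketch:

```r
hist(chisq.boot, breaks = 50, main = "", xlab = expression(chi^2))
```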

Does this correspond to a \(\chi^2\)-distribution with one degree of freedom? We can check this using a QQ-plot:
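A sketch of such a QQ-plot against the central \(\chi^2\)-distribution:

```r
qqplot(qchisq(ppoints(R), df = dof), chisq.boot,
       xlab = "theoretical quantiles", ylab = "bootstrap quantiles")
abline(0, 1)  # bisecting line
```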

Apart from the bending at large quantiles due to insufficient statistics, the scales do not seem to match. The reason for this is that we should not expect a central \(\chi^2\)-distribution: the fit to the original data has a \(\chi^2\)-value of \(0.5415828\), thus, we expect the bootstrap fits to follow a non-central \(\chi^2\)-distribution with non-centrality parameter given by this value.
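Repeating the QQ-plot with the non-centrality parameter set to the original \(\chi^2\)-value (again a sketch):

```r
qqplot(qchisq(ppoints(R), df = dof, ncp = opt$value), chisq.boot,
       xlab = "theoretical quantiles", ylab = "bootstrap quantiles")
abline(0, 1)  # bisecting line
```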

Now the quantiles line up on the bisecting line as one would expect. This last plot is a very important check of whether or not all the fits performed during the bootstrap procedure actually make sense.

Next, let's check the distribution of the bootstrap values of the two parameters:
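For instance with two histograms (a sketch):

```r
par(mfrow = c(1, 2))
hist(par.boot[, 1], breaks = 50, main = "", xlab = "a")
hist(par.boot[, 2], breaks = 50, main = "", xlab = "b")
```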

Finally, we can also produce an error band for our fitted line. We can use the bootstrap samples of the fit parameters to compute the error at each value of \(t\):
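A sketch of the band computation from the bootstrap parameter samples:

```r
tt      <- seq(-0.5, 4.5, length.out = 100)          # fine grid in t
lines.b <- apply(par.boot, 1, function(p) fn(p, tt)) # 100 x R matrix of lines
dband   <- apply(lines.b, 1, sd)                     # bootstrap error per t
central <- fn(opt$par, tt)
plot(tt, central, type = "l", xlab = "t", ylab = "f(t)")
lines(tt, central + dband, lty = 2)
lines(tt, central - dband, lty = 2)
```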

Note that the function ‘plotwitherror’ was taken from the hadron package https://github.com/HISKP-LQCD/hadron .

3 Fit with \(x\)- and \(y\)-errors

So far we have assumed that only the \(y_i\) are random variables, while the explanatory variable \(t_i\) is fixed to certain values. This is clearly not the most general situation, and for many problems it is simply not the case. Hence, we now assume likely not independent random variables \(Y_i\) and \(X_i\) for \(i=1,...,n\), and we join them into one random vector by defining \(\vec Z = (\vec X, \vec Y)\). Now we extend our parameters \(\lambda=(a, b)\) by \(n\) new parameters \(t_1, ..., t_n\), \[ \hat\lambda\ =\ (t_1, ..., t_n, \lambda)\,, \] and define a new function \[ \vec f(\hat\lambda)\ =\ (t_1, ..., t_n, f(t_1;\, \lambda), ..., f(t_n;\, \lambda))\,, \] i.e. each \(X_i\) is modelled directly by the corresponding parameter \(t_i\), while the \(Y_i\) are still modelled by \(f\). This leads to the definition of a new \(\chi^2\) function \[\begin{equation} \chi^2(\hat\lambda)\ =\ (\vec z -\vec f(\hat\lambda))^t\, M\, (\vec z -\vec f(\hat\lambda))\,. \end{equation}\] In the minimisation, both the parameters \(\lambda\) and the parameters \(t_i\) are adjusted. Note that the number of degrees of freedom does not change even though the number of parameters is increased, because the number of data points increases by the same amount. The matrix \(M\) is defined like above, but of course for the random vector \(\vec Z\). Therefore, \(M\) is now a \(2n\times2n\) matrix.
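For the uncorrelated special case, where \(M\) is diagonal, the new \(\chi^2\) function could be sketched in R as follows; the names are mine, and `fn` is the linear model from above:

```r
## hatpar = c(t_1, ..., t_n, a, b); x, y are the data, dx, dy their errors
chisqr.xy <- function(hatpar, x, y, dx, dy) {
  n  <- length(x)
  ti <- hatpar[1:n]              # fitted t-values
  la <- hatpar[(n + 1):(n + 2)]  # model parameters (a, b)
  sum(((x - ti) / dx)^2) + sum(((y - fn(la, ti)) / dy)^2)
}
```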

4 Exercise

  1. Implement a linear fit as above to the following data with \(t\)- and \(y\)-errors
\(t\)   \(\Delta t\)   \(y\)          \(\Delta y\)
-0.1    0.23           0.9417029      0.110
2.2     0.18           1.9538151      0.095
4.3     0.25           3.1536958      0.130

assuming again independent errors. If you plot the data, it should look as follows

  2. Bootstrap the fit using parametric resampling and determine the errors of the best-fit parameters. Check the bootstrap distribution of the errors.

  3. Introduce correlation between the \(t\)- and \(y\)-errors, assuming a cross-correlation of \(0.35\), \(0.25\) and \(0.30\) for the three points, respectively. The three points remain independent of each other. Repeat the fitting procedure, this time applying a fully correlated fit.

  4. Can you see how to include prior knowledge of fit parameters, with errors, into the fit?