Assume you want to fit a (scalar) funciton \(f(t)\) to data \(y(t)\) at fixed and discrete \(t\)-values \(t_i\). Define the \(\chi^2\) function as follows \[\begin{equation} \chi^2(\lambda)\ =\ (\vec y -\vec f(t; \lambda))^t\, M\, (\vec y -\vec f(t, \lambda))\,. \end{equation}\] in matrix-vector notation with parameters \(\lambda\). In case of uncorrelated data, the matrix \(M\) is just a diagonal matrix with \[ M\ =\ \mathrm{diag}(\mathrm{err}_i^{-2})\,. \] If there is correlation to be taken into account, you need to set \[ M = C^{-1} \] with \(C\) the variance-covariance matrix. \[ C = \frac{1}{N}\Sigma_N^2 \] and \(\Sigma_N^2\) the covariance matrix. \(N\) is the sample size used to estimate the means and the variance-covariance matrix. I hope the notation is clear, otherwise ask me!
Lets now turn towards computing errors of the best-fit parameters. In accordance with the assumption that the data points are uncorrelated, we can use parametric bootstrap here. This means we can generate bootstrap samples from mean and standard error by drawing from corresponding Gaussian distributions:
boot.R <- 1500
n <- length(x$y)
Y <- array(NA, dim=c(1500, n))
for(i in c(1:n)) {
Y[, i] <- rnorm(n=boot.R, mean=x$y[i], sd=x$dy[i])
}
Next we repeat the fit on all the bootstrap samples. We will record the best fit parameters of all these fits along with the \(\chi^2\)-values. We will also use the frozen covariance matrix approach, i.e. we keep the errors fixed during the bootstrap procedure.
par <- c(1,0)
bpars <- array(NA, dim=c(boot.R, length(par)+1))
for(r in c(1:boot.R)) {
tmp <- optim(par=par, fn=chisqr, y=Y[r,], dy=x$dy, t=x$t)
## best fit parameters
bpars[r, c(1:length(par))] <- tmp$par
## chisqr-value
bpars[r, length(par)+1] <- tmp$value
}
This leads to the following bootstrap estimates for the bootstrap errors on the best fit parameters and their bootstrap errors:
\(\theta_i\) | \(\Delta\theta_i\) | Bias\((\theta_i)\) |
---|---|---|
0.5495630 | 0.0427179 | -0.0001973 |
0.9067579 | 0.0998024 | 0.0005678 |
Errors are of the order of 10%. Finally, how are the \(\chi^2\)-values of the bootstrap fits distributed?
Does this correspond to a \(\chi^2\) distribution with one degree of freedom? We can review this using a QQ-plot
s <- seq(0, 1, 1/boot.R)
csq <- qchisq(p = s, df = 1)
qqplot(x=csq, y=bpars[,3], xlab="Theoretical Quantiles",
ylab = "Sample Quantiles")
Apart from the bending at large quantiles due to unsufficient statistics, the scales do not seem to match. The reason for this is that we do not expect a central \(\chi^2\) distribution: the true fit on the original data has a \(\chi^2\)-value of \(0.5415828\), thus, we expect the bootstrap fit to build a non-central \(\chi^2\)-distribution around this value.
s <- seq(0, 1, 1/boot.R)
csq <- qchisq(p = s, df = 1, ncp=myfit$value)
qqplot(x=csq, y=bpars[,3], xlab="Theoretical Quantiles",
ylab = "Sample Quantiles")
Now the quantiles are are lined up on the bisecting line as one would expect. The last plot is a very important check for whether or not all the fits performed during the bootstrap procedure actually make sense.
Finally, lets check on the distribution of the bootstrap values of the two parameters
Finally, we can also produce an error-band for our fitted line. We can use the bootstrap samples for the fit parameters to compute the error at each value of \(t\):
plot(NA, xlim=c(0, 4), ylim=c(0.8,3.5), xlab="t", ylab="y")
mx <- seq(0, 4, 0.01)
## best fit line
my <- myfit$par[1]*mx + myfit$par[2]
Y <- t(apply(bpars[,c(1:2)], MARGIN=1, FUN=function(par, x){par[1]*x + par[2]}, x=mx))
dY <- apply(Y, MARGIN=2, FUN=sd)
polygon(x=c(mx, rev(mx)), y=c(my+dY, rev(my-dY)), col="gray", border="gray")
lines(x=mx, y=my)
plotwitherror(x=x$t, y=x$y, dy=x$dy, col="red", bg="red", xlab="t", rep=TRUE)
Note that the function ‘plotwitherror’ was taken fromthe hadron package https://github.com/HISKP-LQCD/hadron .
So far we have assumed that only the \(x_i\) are random variables but the explanatory variable \(t_i\) is fixed to certain values. This is clearly not the most general case and for many problems it is clearly not the case. Hence, we assume now to have likely not independent random variables \(Y_i\) and \(X_i\) for \(i=1,...,n\) and we join them into one random vector by defining \(\vec Z = (\vec X, \vec Y)\). Now we extend our parameters \(\lambda=(a, b)\) by \(n\) new parameters \(t_1, ..., t_n\) \[ \hat\lambda\ =\ (t_1, ..., t_n, \lambda) \] and define a new function \[ \vec f(\hat\lambda)\ =\ (t_1, ..., t_n, f(t_1;\, \lambda), ..., f(t_n;\, \lambda))\,. \] We will assume a linear model for the \(X_i\), which leads to the definition of a new \(\chi^2\) function \[\begin{equation} \chi^2(\hat\lambda)\ =\ (\vec z -\vec f(\hat\lambda))^t\, M\, (\vec z -\vec f(\hat\lambda))\,. \end{equation}\] For the minimisation, the parameters \(\lambda\) and the parameters \(t_i\) can be adjusted. Note that the number of degrees of freedom did not change when the number of parameters was increased, because the number of data points increased by the same amount. The matrix \(M\) is defined like above, but of course for the random vector \(\vec Z\). Therefore, \(M\) is now a \(2n\times2n\) matrix.
\(t\) | \(\Delta t\) | \(y\) | \(\Delta y\) |
---|---|---|---|
-0.1 | 0.23 | 0.9417029 | 0.110 |
2.2 | 0.18 | 1.9538151 | 0.095 |
4.3 | 0.25 | 3.1536958 | 0.130 |
assuming again independent errors. If you plot the data, it should look as follows
Bootstrap the fit using parametric resampling and determine the errors on the best fit parameters. Check the bootstrap distribution of the errors.
Finally, introduce correlation between \(t\)- and \(y\)-errors assuming a cross-correlation of \(0.35\), \(0.25\) and \(0.30\) for the three points, respectively. The three points are independent. Repeat the fitting procedure, this time applying a fully correlated fit
Can you see how to include prior knowledge with errors for fit parameters into the fit?