Wednesday, 12 April 2023

The confidence interval of variance and standard deviation

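In statistics, a confidence interval (CI) is a range of values, computed from a sample, that is likely to contain the true value of a population parameter. The confidence level describes the procedure rather than any single interval: a 95% CI means that if we drew many samples and built an interval from each in the same way, about 95% of those intervals would contain the true value of the parameter.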

 

One of the parameters that we often want to estimate is the variance or the standard deviation of a population. The variance measures how spread out the data are around the mean, and the standard deviation is the square root of the variance. Both are useful indicators of variability and dispersion in a data set.

 

However, in most cases, we do not have access to the entire population, but only to a sample. Therefore, we need to use sample statistics to estimate the population parameters. The sample variance and the sample standard deviation are calculated as follows:

 

$$s^{2} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - m)^{2}$$

$$s = \sqrt{s^{2}}$$

 

where $n$ is the sample size, $x_i$ are the sample observations, and $m$ is the sample mean.
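As a quick check of these formulas, here is a minimal Python sketch (Python is my choice; the post itself has no code) that computes $s^{2}$ and $s$ with the $n-1$ denominator and compares them with the standard library's `statistics.variance` and `statistics.stdev`, which use the same denominator. The data are the ten observations from the worked example later in the post; the helper name `sample_variance` is my own.

```python
import math
import statistics

def sample_variance(xs):
    """Sample variance with the n - 1 denominator (hypothetical helper)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    m = sum(xs) / n                                 # sample mean
    return sum((x - m) ** 2 for x in xs) / (n - 1)  # squared deviations / (n - 1)

x = [12, 15, 18, 20, 22, 24, 25, 27, 28, 30]
s2 = sample_variance(x)
s = math.sqrt(s2)

# The standard library uses the same n - 1 denominator.
assert math.isclose(s2, statistics.variance(x))
assert math.isclose(s, statistics.stdev(x))
print(f"s^2 = {s2:.2f}, s = {s:.2f}")  # s^2 = 34.10, s = 5.84
```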

 

The problem is that these sample statistics are only estimates of the population parameters, not their exact values. They vary from sample to sample, depending on how representative the sample is of the population. Therefore, we need to construct confidence intervals to quantify the uncertainty in our estimates.

 

One way to construct a confidence interval for the population variance is to use the chi-square distribution. The chi-square distribution is a continuous probability distribution that has one parameter: the degrees of freedom (df). When working with the sample variance, the degrees of freedom are related to the sample size as follows:

 

df = n - 1

 

The chi-square distribution has some important properties:

 

- It is skewed to the right and has a minimum value of zero.

- It has a mean equal to its degrees of freedom: E(X) = df

- It has a variance equal to twice its degrees of freedom: Var(X) = 2 * df
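One way to see these properties is by simulation. A chi-square variable with df degrees of freedom can be generated as the sum of df squared independent standard normal values. The sketch below is a Monte Carlo illustration, not part of the original method: the helper name, seed, and number of draws are arbitrary choices. It checks that no draw is negative and that the simulated mean and variance land near df and 2 * df.

```python
import random

def simulate_chi_square(df, n_draws, seed=1):
    """Hypothetical helper: chi-square(df) draws as sums of df squared N(0, 1) values."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(n_draws)]

df = 9
draws = simulate_chi_square(df, 20_000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)

print(f"min = {min(draws):.3f} (never negative)")
print(f"mean = {mean:.2f} (theory: {df})")
print(f"variance = {var:.2f} (theory: {2 * df})")
```

The printed values vary with the seed, but with 20,000 draws they should sit close to 9 and 18.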

 

The chi-square distribution can be used to construct a confidence interval for the population variance because of its special relationship with the sample variance. If the observations are a random sample from a normal population, then multiplying the sample variance by the degrees of freedom and dividing by the population variance gives a random variable that follows a chi-square distribution with $n-1$ degrees of freedom:

 

$$\frac{(n-1)\,s^{2}}{\sigma^{2}} \sim \chi^{2}(n-1)$$

where $\sigma^{2}$ is the population variance.
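This relationship can also be checked numerically. The sketch below assumes a normal population with mean 20 and standard deviation 5 (values chosen only for illustration), draws many samples of size n = 10, and computes (n-1)s²/σ² for each. If the relation holds, these values should behave like chi-square(9), with a mean near 9 and a variance near 18. The helper `pivot_draws` is hypothetical.

```python
import random

def pivot_draws(n, mu, sigma, n_samples, seed=2):
    """Hypothetical helper: simulate (n - 1) * s^2 / sigma^2 for normal samples of size n."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_samples):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(xs) / n
        s2 = sum((x - m) ** 2 for x in xs) / (n - 1)  # sample variance
        out.append((n - 1) * s2 / sigma ** 2)
    return out

n = 10
q = pivot_draws(n, mu=20.0, sigma=5.0, n_samples=20_000)
mean_q = sum(q) / len(q)
var_q = sum((v - mean_q) ** 2 for v in q) / len(q)
print(f"mean = {mean_q:.2f} (df = {n - 1}), variance = {var_q:.2f} (2*df = {2 * (n - 1)})")
```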

 

This means that we can use the percentiles of the chi-square distribution to find the lower and upper bounds of our confidence interval. Let $X \sim \chi^{2}(df)$ and write $\chi^{2}_{\alpha}$ for the critical value that has probability $\alpha$ to its right (the convention used in many chi-square tables). For a 95% CI, we need the two critical values that cut off 2.5% in each tail:

$$P\left(X > \chi^{2}_{0.025}\right) = 0.025$$

$$P\left(X < \chi^{2}_{0.975}\right) = 0.025$$
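The critical values normally come from a chi-square table or from software (for example `scipy.stats.chi2.ppf`). To keep this sketch free of third-party packages, it computes them in plain Python: it evaluates the chi-square CDF with the series for the regularized lower incomplete gamma function and inverts it by bisection. Following the convention of the worked example, where chi-square(0.025) = 19.02 is the value with 2.5% of the area to its right, `chi2_upper_critical(alpha, df)` returns the value with upper-tail area `alpha`. Both function names are my own.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for X ~ chi-square(df), via the series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    for k in range(1, 10_000):
        term *= z / (a + k)          # next series term z^k / (a(a+1)...(a+k))
        total += term
        if term < total * 1e-15:     # stop once terms no longer matter
            break
    return math.exp(-z + a * math.log(z) - math.lgamma(a)) * total

def chi2_upper_critical(alpha, df):
    """Value c with P(X > c) = alpha, found by bisection on the CDF."""
    target = 1.0 - alpha
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < target:  # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

df = 9
print(round(chi2_upper_critical(0.025, df), 2))  # 19.02
print(round(chi2_upper_critical(0.975, df), 2))  # 2.7
```

The series converges for every positive x, which is adequate for moderate degrees of freedom like these; a library routine is the safer choice for very large df.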

 

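Then, because these two critical values enclose the middle 95% of the distribution,

$$P\left(\chi^{2}_{0.975} < \frac{(n-1)\,s^{2}}{\sigma^{2}} < \chi^{2}_{0.025}\right) = 0.95,$$

and rearranging the two inequalities to solve for $\sigma^{2}$ gives:

$$\sigma^{2} > \frac{(n-1)\,s^{2}}{\chi^{2}_{0.025}}$$

$$\sigma^{2} < \frac{(n-1)\,s^{2}}{\chi^{2}_{0.975}}$$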

 

These are the lower and upper limits of our confidence interval for $\sigma^{2}$. Because the square root is an increasing function, the confidence interval for $\sigma$ (the standard deviation) is found by taking the square root of both limits:

$$\sigma > \sqrt{\frac{(n-1)\,s^{2}}{\chi^{2}_{0.025}}}$$

$$\sigma < \sqrt{\frac{(n-1)\,s^{2}}{\chi^{2}_{0.975}}}$$

 

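Note that the confidence interval for $\sigma$ is not symmetric around $s$, and the interval for $\sigma^{2}$ is not symmetric around $s^{2}$ either: the limits come from dividing $(n-1)\,s^{2}$ by two critical values of a right-skewed distribution, so the upper limit ends up much farther from the point estimate than the lower one.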
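Putting the pieces together, this sketch turns the inequalities into a function for the variance interval and takes square roots for the standard-deviation interval. The name `variance_sd_ci` and its signature are hypothetical, and it takes the two critical values as inputs (for example from a table, as in the worked example below). The simulation at the end assumes a normal population with σ = 5 and n = 10 (df = 9). It checks the long-run meaning of a 95% interval: roughly 95% of the intervals it builds should contain the true σ².

```python
import math
import random

def variance_sd_ci(s2, n, chi2_upper, chi2_lower):
    """Chi-square CI for sigma^2 and sigma, assuming a normal population.

    chi2_upper: critical value with upper-tail area alpha/2 (the larger value)
    chi2_lower: critical value with upper-tail area 1 - alpha/2 (the smaller value)
    """
    ss = (n - 1) * s2                                 # (n - 1) * s^2
    var_ci = (ss / chi2_upper, ss / chi2_lower)       # dividing by the larger value gives the lower limit
    sd_ci = (math.sqrt(var_ci[0]), math.sqrt(var_ci[1]))
    return var_ci, sd_ci

# Long-run coverage check with n = 10 (df = 9) and the 95% table values.
rng = random.Random(3)
n, sigma, trials, hits = 10, 5.0, 20_000, 0
for _ in range(trials):
    xs = [rng.gauss(0.0, sigma) for _ in range(n)]
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)
    (lo, hi), _ = variance_sd_ci(s2, n, 19.023, 2.700)
    hits += lo <= sigma ** 2 <= hi                    # did this interval capture sigma^2?
print(f"coverage = {hits / trials:.3f}")              # should be close to 0.95
```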

 

To illustrate this method, let's consider an example. Suppose we have a sample of 10 observations from a normal population with unknown variance and standard deviation:

 

x = (12, 15, 18, 20, 22, 24, 25, 27, 28, 30)

 

The sample mean and sample standard deviation are:

 

m = 22.1

 

s = 5.84

 

We want to construct a 95% CI for $\sigma^{2}$ and $\sigma$. First, we need to find the degrees of freedom and the chi-square critical values:

 

df = n - 1 = 10 - 1 = 9

 

chi-square(0.025) = 19.02

 

chi-square(0.975) = 2.70

 

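Then, we plug these values into the formulas above. Using the unrounded sample variance $s^{2} = 34.1$ (so that $(n-1)\,s^{2} = 9 \times 34.1 = 306.9$), the bounds of the 95% confidence interval for the variance are:

$\sigma^{2} > 306.9 / 19.02 \approx 16.14$

$\sigma^{2} < 306.9 / 2.70 \approx 113.67$

Hence, the bounds of the 95% confidence interval for the standard deviation $\sigma$ are:

$\sigma > \sqrt{16.14} \approx 4.02$

$\sigma < \sqrt{113.67} \approx 10.66$

In other words, the 95% CI is (16.14, 113.67) for $\sigma^{2}$ and (4.02, 10.66) for $\sigma$.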
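The arithmetic in this example can be reproduced in a few lines of Python. This sketch uses only the standard library and the tabled critical values quoted above.

```python
import math
import statistics

x = [12, 15, 18, 20, 22, 24, 25, 27, 28, 30]
n = len(x)
s2 = statistics.variance(x)           # 34.1, n - 1 denominator
chi2_upper, chi2_lower = 19.02, 2.70  # 95% critical values for df = 9

var_lo = (n - 1) * s2 / chi2_upper
var_hi = (n - 1) * s2 / chi2_lower
print(f"variance: ({var_lo:.2f}, {var_hi:.2f})")                        # variance: (16.14, 113.67)
print(f"std dev:  ({math.sqrt(var_lo):.2f}, {math.sqrt(var_hi):.2f})")  # std dev:  (4.02, 10.66)
```

Working from the unrounded $s^{2} = 34.1$ instead of squaring the rounded $s = 5.84$ moves the upper variance bound by only a few hundredths (113.67 versus about 113.69).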


We can also use the R function ci_sigma to obtain these results quickly.

