In statistics, a confidence interval (CI) is a range of values, computed from a sample, that is constructed to contain the true value of a parameter with a stated level of confidence. For example, a 95% CI comes from a procedure that, over repeated sampling, produces intervals containing the true value of the parameter about 95% of the time.
One of the parameters
that we often want to estimate is the variance or the standard deviation of a
population. The variance measures how spread out the data are around the mean,
and the standard deviation is the square root of the variance. Both are useful
indicators of variability and dispersion in a data set.
However, in most
cases, we do not have access to the entire population, but only to a sample.
Therefore, we need to use sample statistics to estimate the population
parameters. The sample variance and the sample standard deviation are
calculated as follows:
s$^{2}$ = (1/(n-1)) * sum((x$_i$ - m)$^{2}$), for i = 1 to n
s = sqrt(s$^{2}$)
where n is the sample size, x$_i$ are the sample observations, and m is the sample mean.
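The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_variance` and the small data set are hypothetical, chosen to make the arithmetic easy to follow.

```python
import math
import statistics

def sample_variance(xs):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    m = sum(xs) / n  # sample mean
    return sum((x - m) ** 2 for x in xs) / (n - 1)

# Hypothetical data, used only to illustrate the formula.
data = [2, 4, 4, 4, 5, 5, 7, 9]

s2 = sample_variance(data)
s = math.sqrt(s2)

# The standard library's variance() and stdev() use the same n - 1 divisor.
assert math.isclose(s2, statistics.variance(data))
assert math.isclose(s, statistics.stdev(data))
print(f"s^2 = {s2:.4f}, s = {s:.4f}")
```

Dividing by n - 1 rather than n is what makes s$^{2}$ an unbiased estimate of the population variance.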
The problem is that
these sample statistics are not exact estimates of the population parameters.
They vary from sample to sample, depending on how representative the sample is
of the population. Therefore, we need to construct confidence intervals to
quantify the uncertainty in our estimates.
One way to construct
a confidence interval for the population variance is to use the chi-square
distribution. The chi-square distribution is a continuous probability distribution
that has one parameter: the degrees of freedom (df). The degrees of freedom are
related to the sample size as follows:
df = n - 1
The chi-square
distribution has some important properties:
- It is skewed to the right and has a minimum value of zero.
- It has a mean equal to its degrees of freedom: E(X) = df
- It has a variance equal to twice its degrees of freedom: Var(X) = 2 * df
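These properties can be checked with a quick simulation. The sketch below relies on the standard fact that a chi-square variable with df degrees of freedom is the sum of df squared standard normal variables. The helper name `chi_square_draw`, the seed, and the number of draws are arbitrary choices made for the illustration.

```python
import random
import statistics

def chi_square_draw(df, rng):
    """One chi-square(df) draw, built as a sum of df squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

rng = random.Random(0)  # fixed seed so the run is repeatable
df = 9
draws = [chi_square_draw(df, rng) for _ in range(50_000)]

sim_mean = statistics.fmean(draws)
sim_var = statistics.variance(draws)
print(f"simulated mean:     {sim_mean:.2f} (theory: {df})")
print(f"simulated variance: {sim_var:.2f} (theory: {2 * df})")
print(f"smallest draw:      {min(draws):.3f} (never below zero)")
```

The simulated mean and variance should land close to df and 2 * df, with small differences due to sampling noise.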
The chi-square distribution can be used to construct a confidence interval for the population variance because it has a special relationship with the sample variance. If the sample is drawn from a normal population and we divide the sample variance by the population variance and multiply by the degrees of freedom, we get a random variable that follows a chi-square distribution with df = n - 1 degrees of freedom:
((n-1) * s$^{2}$) / sigma$^{2}$ ~ chi-square(df)
where sigma$^{2}$ is the population variance.
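A simulation can make this relationship more concrete. The sketch below draws many samples of size 10 from a normal population and computes (n-1) * s$^{2}$ / sigma$^{2}$ for each one. If the relationship holds, the results should have a mean near 9 and a variance near 18. The population mean and standard deviation used here are hypothetical values picked for the demonstration.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed so the run is repeatable
n = 10                   # sample size, so df = n - 1 = 9
mu, sigma = 50.0, 4.0    # hypothetical population mean and standard deviation

pivots = []
for _ in range(20_000):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)           # sample variance (n - 1 divisor)
    pivots.append((n - 1) * s2 / sigma ** 2)   # (n-1) * s^2 / sigma^2

pivot_mean = statistics.fmean(pivots)
pivot_var = statistics.variance(pivots)
print(f"mean of (n-1)*s^2/sigma^2:     {pivot_mean:.2f} (chi-square mean: {n - 1})")
print(f"variance of (n-1)*s^2/sigma^2: {pivot_var:.2f} (chi-square variance: {2 * (n - 1)})")
```

The result does not depend on the particular mu and sigma chosen, which is what makes this quantity useful for building an interval.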
This means that we can use the percentiles of the chi-square distribution to find the lower and upper bounds of our confidence interval. For example, if we want a 95% CI, we need the two values of chi-square that each cut off 2.5% of the probability in one tail. Writing chi-square(p) for the value with probability p to its right, these are chi-square(0.025) and chi-square(0.975):
P(chi-square(df) > chi-square(0.025)) = 0.025
P(chi-square(df) < chi-square(0.975)) = 0.025
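The two cut-off values are usually read from a table or obtained from a statistics library (for instance `scipy.stats.chi2.ppf` in Python). As a self-contained sketch using only the standard library, the chi-square CDF can be evaluated with the series expansion of the regularized incomplete gamma function and then inverted by bisection. The function names below are made up for this illustration.

```python
import math

def chi2_cdf(x, df):
    """P(chi-square(df) <= x), via the series for the regularized incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:   # stop once further terms are negligible
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_upper_tail_value(p, df):
    """Value with probability p to its right, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1.0 - p:   # expand until the target is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < 1.0 - p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

df = 9
upper = chi2_upper_tail_value(0.025, df)   # 2.5% of the probability lies to its right
lower = chi2_upper_tail_value(0.975, df)   # 2.5% of the probability lies to its left
print(f"right-tail cut-off: {upper:.2f}, left-tail cut-off: {lower:.2f}")
```

For df = 9 this gives values matching the usual table entries of about 19.02 and 2.70.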
Then, we can rearrange the chi-square relationship for ((n-1) * s$^{2}$) / sigma$^{2}$ to solve for sigma$^{2}$:
sigma$^{2}$ > ((n-1) * s$^{2}$) / chi-square(0.025)
sigma$^{2}$ < ((n-1) * s$^{2}$) / chi-square(0.975)
This gives us the lower and upper limits of our confidence interval for sigma$^{2}$. To find the confidence interval for sigma (the standard deviation), we simply take the square root of both limits:
sigma > sqrt(((n-1) * s$^{2}$) / chi-square(0.025))
sigma < sqrt(((n-1) * s$^{2}$) / chi-square(0.975))
Note that because the chi-square distribution is skewed, the confidence interval for sigma$^{2}$ is not symmetric around s$^{2}$, and the interval for sigma is likewise not symmetric around s.
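One way to see what the 95% level means for this interval is to simulate it. The sketch below repeatedly draws samples of size 10 from a normal population with a known standard deviation, builds the interval for sigma$^{2}$ each time, and counts how often the interval contains the true variance. The critical values are the standard table entries for df = 9. The true standard deviation, seed, and number of trials are hypothetical choices for the demonstration.

```python
import random
import statistics

# Chi-square critical values for df = 9, taken from standard tables.
CHI2_RIGHT = 19.0228   # 2.5% of the probability lies to its right
CHI2_LEFT = 2.7004     # 2.5% of the probability lies to its left

def variance_ci(sample):
    """95% interval for sigma^2; valid here because every sample has size 10 (df = 9)."""
    df = len(sample) - 1
    s2 = statistics.variance(sample)
    return df * s2 / CHI2_RIGHT, df * s2 / CHI2_LEFT

rng = random.Random(2)   # fixed seed so the run is repeatable
true_sigma = 3.0         # hypothetical population standard deviation
trials = 10_000
hits = 0
for _ in range(trials):
    sample = [rng.gauss(0.0, true_sigma) for _ in range(10)]
    low, high = variance_ci(sample)
    if low < true_sigma ** 2 < high:
        hits += 1

coverage = hits / trials
print(f"fraction of intervals containing sigma^2: {coverage:.3f}")
```

The fraction should come out close to 0.95. Because taking square roots preserves order, the interval for sigma has exactly the same coverage.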
To illustrate this
method, let's consider an example. Suppose we have a sample of 10 observations
from a normal population with unknown variance and standard deviation:
x = (12, 15, 18, 20, 22, 24, 25, 27, 28, 30)
The sample mean and
sample standard deviation are:
m = 22.1
s = 5.84
We want to construct a 95% CI for sigma$^{2}$ and sigma. First, we need to find the degrees of freedom and the chi-square values:
df = n - 1 = 10 - 1 = 9
chi-square(0.025) = 19.02
chi-square(0.975) = 2.70
Then, we plug these
values into the formulas above to get the bounds of the 95% confidence interval
for the variance:
sigma$^{2}$ > (9 * 5.84^2) / 19.02 = 16.14
sigma$^{2}$ < (9 * 5.84^2) / 2.70 = 113.68
Hence, the bounds of
the 95% confidence interval for the standard deviation sigma will be:
sigma > sqrt(16.14) = 4.02
sigma < sqrt(113.68) = 10.66
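The whole calculation can be reproduced with a short Python sketch. It uses the ten observations and the two table values quoted in the example. The variable names are illustrative only.

```python
import math
import statistics

x = [12, 15, 18, 20, 22, 24, 25, 27, 28, 30]
n = len(x)
df = n - 1

m = statistics.mean(x)
s2 = statistics.variance(x)   # sample variance (n - 1 divisor)
s = math.sqrt(s2)

# Table values used in the example for df = 9.
chi2_right = 19.02   # 2.5% of the probability lies to its right
chi2_left = 2.70     # 2.5% of the probability lies to its left

var_low = df * s2 / chi2_right
var_high = df * s2 / chi2_left
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)

print(f"m = {m}, s = {s:.2f}")
print(f"95% CI for sigma^2: ({var_low:.2f}, {var_high:.2f})")
print(f"95% CI for sigma:   ({sd_low:.2f}, {sd_high:.2f})")
```

Because the code carries s$^{2}$ at full precision instead of the rounded s = 5.84, the last digit of the upper variance bound can differ slightly from the hand calculation.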
In R, the function ci_sigma can be used to obtain these results quickly.