Tuesday, 11 April 2023

Confidence interval for prevalence studies

Prevalence studies are a type of epidemiological research that aims to estimate the proportion of a population that has a certain condition or characteristic. For example, a prevalence study might investigate how many people have diabetes, hypertension, or COVID-19 in a given region or country.

Prevalence studies are useful for describing the burden of disease, identifying risk factors, and planning health interventions. However, prevalence estimates are not exact values, but rather point estimates that are subject to sampling error and other sources of uncertainty. Therefore, it is important to quantify the precision of prevalence estimates by calculating confidence intervals. 

A confidence interval is a range of values computed from the sample by a procedure that, over repeated sampling, captures the true population parameter (in this case, the prevalence) a stated proportion of the time (usually 95%). A narrower confidence interval indicates a more precise estimate, while a wider confidence interval indicates more uncertainty.

There are different methods for calculating confidence intervals for prevalence studies, depending on the assumptions and characteristics of the data. In this blog post, we will review some of the most common methods and their advantages and disadvantages.

Wald interval


The Wald interval (also known as the normal approximation interval or the asymptotic interval) is based on the assumption that the sampling distribution of the prevalence estimate is approximately normal. The formula for the Wald interval is:

p ± z * sqrt(p * (1 - p) / n)

where p is the prevalence estimate, z is the critical value from the standard normal distribution (usually 1.96 for 95% confidence), and n is the sample size.

The Wald interval is simple and easy to calculate, but it has some drawbacks:

  1. It can produce illogical results when p is close to 0 or 1, such as negative lower bounds or upper bounds greater than 1.
  2. It can be inaccurate when n is small or p is extreme, as the normal approximation may not hold.
  3. Its actual coverage can fall well below the nominal level, so the interval is too narrow and misses the true prevalence more often than the stated confidence implies.
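As a minimal sketch of the formula and drawbacks above, the following Python code uses only the standard library. The function names `wald_interval` and `wald_coverage` and the survey numbers are hypothetical and chosen for illustration. The second function computes the probability that the interval contains an assumed true prevalence by summing binomial probabilities over all possible counts.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(x, n, confidence=0.95):
    """Wald interval p ± z * sqrt(p * (1 - p) / n) for x cases among n people."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    p = x / n
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def wald_coverage(n, true_p, confidence=0.95):
    """Exact probability that the Wald interval contains true_p.

    Sums the binomial probabilities of every count x whose interval
    covers true_p.
    """
    total = 0.0
    for x in range(n + 1):
        low, high = wald_interval(x, n, confidence)
        if low <= true_p <= high:
            total += comb(n, x) * true_p**x * (1 - true_p) ** (n - x)
    return total


# Hypothetical survey: 30 cases among 200 people.
print(wald_interval(30, 200))
# Hypothetical rare condition: 1 case among 50 people gives a negative lower bound.
print(wald_interval(1, 50))
# Actual coverage for n = 20 and a true prevalence of 5%, well below 0.95.
print(wald_coverage(20, 0.05))
```

Note that with zero observed cases the interval collapses to a single point at 0, which is another symptom of the first drawback.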

Agresti-Coull interval

The Agresti-Coull interval (also known as the adjusted Wald interval) is a modification of the Wald interval that adds two successes and two failures to the observed data before calculating the interval. The formula for the Agresti-Coull interval is:

P ± z * sqrt(P * (1 - P) / N)

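where P = (x + 2) / (n + 4), x is the number of observed successes (cases), N = n + 4, and z and n are as before. The constants 2 and 4 come from rounding z^2 ≈ 3.84 to 4 at 95% confidence; for other confidence levels the general form adds z^2 / 2 successes and z^2 / 2 failures.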

The Agresti-Coull interval is also simple and easy to calculate, and it has some advantages over the Wald interval. First, the adjusted estimate P always lies strictly between 0 and 1, so the interval never collapses to zero width when x = 0 or x = n; the bounds themselves can still fall slightly below 0 or above 1 and are usually truncated to that range. Second, it improves the coverage of the interval by pulling the estimate toward 0.5 and widening the interval when n is small. Third, it performs well for a wide range of n and p values.
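The adjustment can be sketched in a few lines of Python. This is an illustration of the "add two successes and two failures" form given above, not a reference implementation. The function name `agresti_coull_interval` and the `clip` argument are assumptions of this sketch; clipping is included because the formula as written can still yield a bound slightly outside the range 0 to 1.

```python
from math import sqrt
from statistics import NormalDist


def agresti_coull_interval(x, n, confidence=0.95, clip=True):
    """Agresti-Coull interval using the +2 successes / +2 failures adjustment."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_adj = n + 4                # N in the formula above
    p_adj = (x + 2) / n_adj      # P in the formula above
    half_width = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    low, high = p_adj - half_width, p_adj + half_width
    if clip:
        # Assumption of this sketch: truncate bounds to the valid range [0, 1].
        low, high = max(0.0, low), min(1.0, high)
    return low, high


# Hypothetical survey: 30 cases among 200 people.
print(agresti_coull_interval(30, 200))
# Hypothetical survey with no cases among 50 people: the interval has
# positive width, and the unclipped lower bound is below zero.
print(agresti_coull_interval(0, 50))
print(agresti_coull_interval(0, 50, clip=False))
```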

Exact interval

The exact interval (also known as the Clopper-Pearson interval) is based on the binomial distribution, which models the number of successes in a fixed number of trials with a constant probability of success. The formula for the exact interval is:

 

lower bound = F⁻¹(alpha/2; x, n - x + 1)

upper bound = F⁻¹(1 - alpha/2; x + 1, n - x)

where F⁻¹(q; a, b) is the inverse cumulative distribution function of the beta distribution with shape parameters a and b evaluated at probability q, alpha is the significance level (usually 0.05 for 95% confidence), and x and n are as before. When x = 0 the lower bound is set to 0, and when x = n the upper bound is set to 1.
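In practice the beta quantiles are usually taken from a statistics library (for example SciPy's `scipy.stats.beta.ppf`). The sketch below avoids that dependency by using an equivalent route: it finds the same bounds by bisection on the binomial tail probabilities, so it runs on the Python standard library alone. The function names are hypothetical, and the approach is meant for moderate n only, since it sums binomial terms directly.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(f, target, iters=60):
    """Find p in [0, 1] with f(p) = target, for f increasing in p, by bisection."""
    low, high = 0.0, 1.0
    for _ in range(iters):
        mid = (low + high) / 2
        if f(mid) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def clopper_pearson_interval(x, n, confidence=0.95):
    """Exact (Clopper-Pearson) interval for x cases among n people."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    alpha = 1 - confidence
    # Lower bound: the p at which P(X >= x) equals alpha / 2 (0 when x = 0).
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= x) equals alpha / 2 (1 when x = n),
    # written as P(X > x) = 1 - alpha / 2 so the function increases in p.
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


# Hypothetical survey: 30 cases among 200 people.
print(clopper_pearson_interval(30, 200))
# Hypothetical survey with no cases among 50 people: lower bound is exactly 0.
print(clopper_pearson_interval(0, 50))
```

For x = 0 the upper bound has the closed form 1 - (alpha/2)^(1/n), which gives a quick way to check the bisection result.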

The exact interval is more complex and difficult to calculate than the previous methods, but it has some benefits:

  1. It does not rely on the normal approximation; it uses the binomial distribution of the number of cases directly.
  2. It guarantees that the interval covers the true prevalence with at least the desired confidence level.
  3. It works well for small n and extreme p values.

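However, the exact interval also has some drawbacks. First, it is conservative: its actual coverage is usually higher than the nominal level, so the intervals are wider than necessary. Second, it is generally asymmetric around p, which can surprise readers used to the "estimate ± margin" form. Third, because the number of cases is discrete, its actual coverage varies irregularly with n and the true prevalence rather than staying at a fixed level.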

Summary


In summary, confidence intervals are essential for quantifying the precision and uncertainty of prevalence estimates in epidemiological studies. There are different methods for calculating confidence intervals for prevalence studies, each with its own strengths and limitations. The choice of method depends on various factors such as sample size, expected prevalence, data quality, computational resources, and research objectives.




