Confidence Intervals
Main Concept
A confidence interval is a range of values that provides an estimate for an unknown population parameter. Because confidence intervals are calculated directly from observed data, they vary from sample to sample: some contain the true population parameter, while others do not.
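A confidence level gives the percentage of all such intervals, constructed from repeated samples, that are expected to contain the population parameter. Common confidence levels used in statistics are 99%, 95%, and 90%. A 95% confidence level, for instance, means that if the sampling were repeated many times, about 95% of the resulting intervals would contain the true population parameter; equivalently, before a sample is drawn, there is a 95% probability that the interval computed from it will contain the parameter.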
To estimate a confidence interval for an unknown population parameter such as the mean, μ, the first step is to approximate μ using the sample mean, μ̂ (also written X̄), as an estimator:
$$\hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
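As a small illustration of the estimator, the sketch below computes μ̂ = X̄ for a list of observations. The sample values and the function name `sample_mean` are hypothetical, chosen only for this example.

```python
def sample_mean(xs):
    """Return (1/n) * sum(x_i), the estimator mu_hat = X-bar."""
    if not xs:
        raise ValueError("need at least one observation")
    return sum(xs) / len(xs)

# Hypothetical sample of n = 6 observations.
observations = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
mu_hat = sample_mean(observations)
print(mu_hat)  # 5.0
```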
When several samples are taken, each one can produce a different value for the sample mean, but most of the sample means should lie relatively close to one another. The endpoints of the confidence interval can be determined from the fact that the mean of a sample drawn from a normal distribution is itself normally distributed, with a standard error of:
$$\text{Standard Error} = \frac{\sigma}{\sqrt{n}},$$
where σ is the population standard deviation (assumed known here) and n is the number of observations in the sample.
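One way to check the standard-error formula is by simulation: draw many samples, record each sample mean, and compare the spread of those means with σ/√n. The sketch below assumes a hypothetical normal population with μ = 0 and σ = 2, samples of size n = 25, and a fixed random seed; none of these values come from the text.

```python
import math
import random
import statistics

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Hypothetical setup: normal population with mu = 0, sigma = 2; samples of size n = 25.
mu, sigma, n = 0.0, 2.0, 25
rng = random.Random(1)  # fixed seed so the run is reproducible

# Draw 5000 samples and keep each sample mean.
means = [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(5000)]

print(standard_error(sigma, n))   # 0.4 (theoretical)
print(statistics.stdev(means))    # empirical spread of the sample means, close to 0.4
```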
Then, standardizing the sample mean gives:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$
When standardizing the sample mean, μ̂, the population mean, μ, is first subtracted to center it, and the result is divided by the standard error to scale it. The resulting value, Z, is called a standard score, or z-score, and follows the standard normal distribution (mean 0, standard deviation 1).
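The standardization step translates directly into a one-line function. The sketch below uses hypothetical numbers (a sample mean of 11 from n = 16 observations, with μ = 10 and σ = 2) to show how far the sample mean lies from μ in units of the standard error.

```python
import math

def z_score(x_bar, mu, sigma, n):
    """Standardize a sample mean: (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

# Hypothetical values: the standard error is 2 / sqrt(16) = 0.5,
# so a sample mean 1 unit above mu is 2 standard errors away.
print(z_score(11.0, 10.0, 2.0, 16))  # 2.0
```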
For a significance level of α = 0.05 (corresponding to a confidence level of 1 − α = 95%), the next step is to find the value z such that −z and z bound the central 95% of the standard normal distribution; these bounds lead to the lower and upper endpoints of the confidence interval:
$$P(-z \le Z \le z) = 1 - \alpha = 0.95.$$
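The value z is obtained from the cumulative distribution function of the standard normal distribution, denoted Φ. Because the distribution is symmetric, each tail outside [−z, z] holds probability α/2, so: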
$$\Phi(z) = P(Z \le z) = 1 - \frac{\alpha}{2} = 0.975,$$
$$z = \Phi^{-1}\big(\Phi(z)\big) = \Phi^{-1}(0.975) \approx 1.96.$$
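The inverse of Φ is available in Python's standard library through `statistics.NormalDist.inv_cdf`. The sketch below computes the critical value for α = 0.05 and checks that the central interval [−z, z] carries probability 1 − α.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # standard normal; its cdf is Phi

alpha = 0.05
z = std_normal.inv_cdf(1 - alpha / 2)   # Phi^{-1}(0.975)
print(round(z, 2))                       # 1.96

# The central interval [-z, z] should hold probability 1 - alpha.
print(std_normal.cdf(z) - std_normal.cdf(-z))  # approximately 0.95
```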
Substituting this back into the above:
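$$P(-z \le Z \le z) = 1 - \alpha = 0.95,$$
$$P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) \approx 0.95,$$
$$P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) \approx 0.95.$$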
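The passage from the standardized inequality to an inequality on μ is pure algebra: multiply every part by σ/√n (which is positive), subtract X̄, then multiply by −1, which reverses both inequalities. Because each step produces an equivalent event, the probability stays the same:

```latex
\begin{aligned}
-1.96 \le \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le 1.96
&\iff -1.96\,\frac{\sigma}{\sqrt{n}} \le \bar{X}-\mu \le 1.96\,\frac{\sigma}{\sqrt{n}} \\
&\iff -\bar{X}-1.96\,\frac{\sigma}{\sqrt{n}} \le -\mu \le -\bar{X}+1.96\,\frac{\sigma}{\sqrt{n}} \\
&\iff \bar{X}-1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X}+1.96\,\frac{\sigma}{\sqrt{n}}
\end{aligned}
```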
From this, the endpoints can be determined:
$$\text{Lower Endpoint} = \bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},$$
$$\text{Upper Endpoint} = \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}.$$
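Plugging numbers into the two endpoint formulas gives a concrete interval. The sketch below assumes a hypothetical sample summary (X̄ = 50, σ = 10, n = 100) and uses the rounded critical value 1.96 from the text; the helper name `ci_95` is illustrative only.

```python
import math

def ci_95(x_bar, sigma, n):
    """95% interval for mu with known sigma, using the rounded critical value 1.96."""
    half_width = 1.96 * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical summary: x_bar = 50, sigma = 10, n = 100, so the standard error is 1.
lower, upper = ci_95(50.0, 10.0, 100)
print(lower, upper)  # approximately 48.04 and 51.96
```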
Generalizing this, the confidence interval for the mean at confidence level 1 − α is obtained from:
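$$P\left(\bar{X} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha,$$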
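where α is the significance level, $Z_{\alpha/2} = \Phi^{-1}(1 - \alpha/2)$ is the critical value of the standard normal distribution (about 1.96 when α = 0.05), σ is the population standard deviation, n is the sample size, and μ is the population mean.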
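The general formula can be wrapped in a single function that takes α as a parameter. This is a minimal sketch assuming σ is known and X̄ is normally distributed; the function name `mean_confidence_interval` and the sample summary (X̄ = 50, σ = 10, n = 100) are hypothetical.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(x_bar, sigma, n, alpha=0.05):
    """Two-sided (1 - alpha) interval for mu, assuming sigma is known and X-bar is normal."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2} = Phi^{-1}(1 - alpha/2)
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Smaller alpha gives a larger critical value, hence a wider interval.
for a in (0.10, 0.05, 0.01):
    lo, hi = mean_confidence_interval(50.0, 10.0, 100, alpha=a)
    print(f"alpha={a:.2f}: ({lo:.3f}, {hi:.3f})")
```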
Example
The graph below shows 50 realizations of a confidence interval, each computed from a random sample drawn from a population with mean μ = 0 (indicated by the blue line). Under these repeated samplings, with a significance level of α = 0.05 (a 95% confidence level), 48 of the 50 realizations (48/50 = 96%) cover the mean, μ = 0.
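A similar experiment can be sketched in code. The sample size n = 20, σ = 1, and the random seed below are assumptions, since the original settings are not stated, so the count of covering intervals will generally differ from the 48 shown in the graph; over many repetitions the coverage fraction should settle near 1 − α.

```python
import math
import random
import statistics
from statistics import NormalDist

def simulate_coverage(num_intervals=50, n=20, mu=0.0, sigma=1.0, alpha=0.05, seed=0):
    """Draw repeated normal samples and count how many (1 - alpha) intervals contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / math.sqrt(n)   # same width for every interval (sigma known)
    covered = 0
    for _ in range(num_intervals):
        x_bar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if x_bar - half_width <= mu <= x_bar + half_width:
            covered += 1
    return covered

covered = simulate_coverage()
print(f"{covered} of 50 intervals cover mu = 0 ({covered / 50:.0%})")
```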
In the example, as the significance level decreases, each confidence interval becomes wider, so more of the realizations cover the mean.
Using the slider, choose a significance level of α = 0.05, 0.04, 0.03, 0.02, or 0.01 to see how it affects the coverage of the mean.
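The effect of the significance level can be reproduced by reusing one fixed set of samples and recomputing the intervals for each α. The settings (50 samples of size n = 20 from a normal population with μ = 0, σ = 1, and a fixed seed) are hypothetical. Because the intervals for a smaller α contain those for a larger α, the number of covering intervals can only stay the same or grow as α decreases.

```python
import math
import random
import statistics
from statistics import NormalDist

# Hypothetical settings: 50 samples of size n = 20 from N(0, 1), fixed seed.
rng = random.Random(0)
mu, sigma, n = 0.0, 1.0, 20
sample_means = [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(50)]

results = {}
for alpha in (0.05, 0.04, 0.03, 0.02, 0.01):
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / math.sqrt(n)
    # An interval covers mu exactly when |x_bar - mu| <= half_width.
    covered = sum(1 for m in sample_means if abs(m - mu) <= half_width)
    results[alpha] = (2 * half_width, covered)
    print(f"alpha={alpha:.2f}  width={2 * half_width:.3f}  covered={covered}/50")
```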