Student[Statistics][ChiSquareIndependenceTest] Overview
overview of the chi-square independence test
Description
Example
The Chi-Squared Independence Test is used when two categorical variables are recorded for a single population; it tests whether the two variables are independent.
Requirements for using the Chi-Squared Independence Test:
Here, the goal is to test if two attributes within a population are independent of one another.
The data provided are formatted as a Matrix with at least two rows and two columns. The rows represent the levels of one attribute and the columns represent the levels of the other attribute; the entries in the Matrix are counts of observations with the given combination of levels.
This test is performed within a single population.
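The formula is:
$$X^2 = \sum_{i=1}^{N} \sum_{j=1}^{K} \frac{(M_{i,j} - E_{i,j})^2}{E_{i,j}},$$
where $M$ is the matrix of observations, and $E$ is the matrix of expected data, which is computed as:
$$E_{i,j} = \frac{\mathrm{rowsum}_i \cdot \mathrm{columnsum}_j}{\mathrm{matrixsum}}.$$
In turn, $\mathrm{rowsum}_i$ is computed as $\sum_{j=1}^{K} M_{i,j}$; $\mathrm{columnsum}_j$ is computed as $\sum_{i=1}^{N} M_{i,j}$; and $\mathrm{matrixsum}$ is computed as $\sum_{i=1}^{N} \sum_{j=1}^{K} M_{i,j}$.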
Here $N$ is the number of rows and $K$ is the number of columns of $M$, and, when the null hypothesis holds, $X^2$ approximately follows a Chi-Squared distribution with $(N-1)(K-1)$ degrees of freedom.
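The computation described above can be sketched in a few lines of Python. This is only an illustration using the standard library, not the Maple implementation; the function names `expected_counts` and `chi_square_statistic` are hypothetical, and the degrees of freedom are taken as (rows − 1)(columns − 1), the standard value for an independence test.

```python
def expected_counts(observed):
    """Return E with E[i][j] = rowsum_i * columnsum_j / matrixsum."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]


def chi_square_statistic(observed):
    """Return (X2, degrees of freedom) for an N x K table of counts."""
    expected = expected_counts(observed)
    # Sum (M[i][j] - E[i][j])^2 / E[i][j] over every cell of the table.
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return x2, dof
```

The input is assumed to be a rectangular list of lists with at least two rows and two columns and no zero row or column total, since a zero total would make an expected count zero and the division undefined.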
The number of students enrolled in the Math Faculty, Art Faculty, and Environment Faculty of a university is shown below:
| | Math | Art | Environment | Row total |
|---|---|---|---|---|
| Male | 250 | 120 | 180 | 550 |
| Female | 150 | 300 | 150 | 600 |
| Column total | 400 | 420 | 330 | 1150 |
Now we want to test whether male and female students differ in their preferences among these three faculties.
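Notice: The matrix we build for the test in this case contains only the observed counts, without the row and column totals:
$$\begin{bmatrix} 250 & 120 & 180 \\ 150 & 300 & 150 \end{bmatrix}$$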
Determine the null hypothesis:
Null Hypothesis: Gender and preference among these three faculties are independent.
Compare the expected data and the observed data:
| Observed | Expected |
|---|---|
| O[1,1] = 250 | E[1,1] = (550 ⋅ 400) / 1150 = 191.30435 |
| O[1,2] = 120 | E[1,2] = (550 ⋅ 420) / 1150 = 200.86957 |
| O[1,3] = 180 | E[1,3] = (550 ⋅ 330) / 1150 = 157.82609 |
| O[2,1] = 150 | E[2,1] = (600 ⋅ 400) / 1150 = 208.69565 |
| O[2,2] = 300 | E[2,2] = (600 ⋅ 420) / 1150 = 219.13043 |
| O[2,3] = 150 | E[2,3] = (600 ⋅ 330) / 1150 = 172.17391 |
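As a quick check of the expected values, the row and column totals can be recomputed from the observed counts and combined as rowsum ⋅ columnsum / matrixsum. The sketch below uses plain Python lists; the variable names are illustrative only.

```python
observed = [
    [250, 120, 180],  # Male:   Math, Art, Environment
    [150, 300, 150],  # Female: Math, Art, Environment
]

row_totals = [sum(row) for row in observed]        # [550, 600]
col_totals = [sum(col) for col in zip(*observed)]  # [400, 420, 330]
grand_total = sum(row_totals)                      # 1150

# Expected count for each cell under independence.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for label, row in zip(("Male", "Female"), expected):
    print(label, [round(e, 5) for e in row])
# Male [191.30435, 200.86957, 157.82609]
# Female [208.69565, 219.13043, 172.17391]
```

Each row of expected counts sums to the corresponding observed row total, which is a convenient sanity check on the arithmetic.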
Substitute the information into the formula:
$$X^2 = \frac{(250 - 191.30435)^2}{191.30435} + \frac{(120 - 200.86957)^2}{200.86957} + \frac{(180 - 157.82609)^2}{157.82609} + \frac{(150 - 208.69565)^2}{208.69565} + \frac{(300 - 219.13043)^2}{219.13043} + \frac{(150 - 172.17391)^2}{172.17391} = 102.891$$
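To see how each cell contributes to this total, the six terms can be evaluated separately. The sketch below takes the expected values as rounded in the text, so the result agrees with 102.891 only up to rounding.

```python
observed = [[250, 120, 180], [150, 300, 150]]
expected = [
    [191.30435, 200.86957, 157.82609],
    [208.69565, 219.13043, 172.17391],
]

# One term (O - E)^2 / E per cell, laid out like the table.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
x2 = sum(sum(row) for row in contributions)

for row in contributions:
    print([round(c, 3) for c in row])
print("X2 =", round(x2, 3))
```

The two Art cells account for roughly 62 of the total, so most of the departure from independence comes from the Art Faculty column.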
Compute the p-value:
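p-value = $P(X^2 > 102.891) \approx 0$ (a small value very close to 0),
where $X^2 \sim \mathrm{ChiSquare}(2)$, because a table with 2 rows and 3 columns has $(2-1)(3-1) = 2$ degrees of freedom.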
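The size of this p-value can be estimated without a statistics library. For an even number of degrees of freedom, the upper-tail probability of a Chi-Squared distribution has a closed form, and a table with 2 rows and 3 columns has (2 − 1)(3 − 1) = 2 degrees of freedom. The helper name `chi_square_sf_even_dof` is hypothetical, and the 5% level is only an assumed example of a significance level.

```python
import math


def chi_square_sf_even_dof(x, dof):
    """P(X2 > x) for a chi-square variable with even, positive dof.

    Uses the closed form exp(-x/2) * sum_{k=0}^{dof/2 - 1} (x/2)^k / k!.
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form only covers even, positive dof")
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(dof // 2))
    return math.exp(-half) * terms


x2 = 102.891
dof = (2 - 1) * (3 - 1)  # 2 rows, 3 columns
p_value = chi_square_sf_even_dof(x2, dof)
print(p_value)  # on the order of 1e-23

# With dof = 2 the tail is exp(-x/2), so the 5% critical value is -2*ln(0.05).
critical_value = -2.0 * math.log(0.05)
print(round(critical_value, 3))  # 5.991
```

Since 102.891 is far above the 5% critical value, the p-value is effectively zero at any conventional significance level. For odd degrees of freedom this closed form does not apply, and an incomplete gamma function would be needed instead.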
Draw the conclusion:
This statistical test provides evidence that the null hypothesis is false, so we reject the null hypothesis.
See Also
Student[Statistics][ChiSquareIndependenceTest]