The Chi-Square Test of Independence is a statistical tool used to evaluate relationships between two categorical variables. Unlike Analysis of Variance (ANOVA), which examines relationships involving a categorical explanatory variable and a quantitative response variable, the Chi-Square Test is specifically designed for categorical data. The test assesses whether the observed distribution of data differs significantly from what would be expected under the null hypothesis, which assumes no association between the variables.
Real-World Example: Gender and Drunk Driving
A notable example involves a challenge to an Oklahoma law in the 1970s that prohibited the sale of 3.2% beer to males under 21 while allowing it for females aged 18 and over. To justify the law, the state presented data from a random roadside survey of 619 drivers under 20 years of age, classifying each driver by gender and by whether they had consumed alcohol within the previous two hours. The data were summarized in a two-way table to determine whether there was a relationship between gender and drunk driving.
Setting Up the Hypotheses for Chi-Square
In this case, the null hypothesis ($H_0$) asserts that there is no relationship between gender and drunk driving; that is, the two variables are independent. The alternative hypothesis ($H_a$) asserts that there is a relationship, meaning the variables are not independent. Algebraically, independence implies that the proportion of male drivers who drove drunk equals the proportion of female drivers who drove drunk.
Observed vs. Expected Counts
The observed counts represent the actual data collected, while the expected counts are calculated based on the assumption that the null hypothesis is true. The expected count for each cell in the table is determined using the formula:
$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$
For example, the expected count for male drunk drivers is calculated by multiplying the total number of males and the total number of drunk drivers, then dividing by the total number of drivers. This calculation is repeated for each cell to produce a table of expected counts.
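The expected-count calculation can be sketched in a few lines of Python. The cell counts used here are an assumption made for illustration: they are not taken from the text above, although they do sum to the 619 drivers surveyed. The function name `expected_counts` is hypothetical as well.

```python
# ASSUMED observed counts for illustration (not given in the article).
# Rows: male, female. Columns: had been drinking, had not been drinking.
observed = [
    [77, 404],
    [16, 122],
]


def expected_counts(table):
    """Expected counts under independence: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for label, row in zip(["male", "female"], expected):
    print(label, [round(value, 2) for value in row])
```

The expected counts keep the same row and column totals as the observed table; only the way those totals are spread across the cells changes.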
Calculating the Chi-Square Statistic
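The Chi-Square statistic ($X^2$) quantifies the overall difference between observed and expected counts. It is computed as follows:
$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$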
For each cell, the squared difference between observed and expected counts is divided by the expected count, and the results are summed across all cells. A larger $X^2$ value indicates a greater discrepancy between the data and what the null hypothesis predicts.
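As a minimal sketch of that summation, the Python below computes the statistic directly from a two-way table. The observed counts are again an assumption for illustration (the article does not list them), and `chi_square_statistic` is a hypothetical helper name. No continuity correction is applied.

```python
def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count for this cell if the two variables are independent.
            expected_count = row_totals[i] * col_totals[j] / table_total
            statistic += (observed_count - expected_count) ** 2 / expected_count
    return statistic


# ASSUMED counts: rows are male/female, columns are drinking/not drinking.
observed = [[77, 404], [16, 122]]
chi2 = chi_square_statistic(observed)
print(round(chi2, 3))
```

With these assumed counts the statistic comes out at roughly 1.64. A table whose rows are exactly proportional to one another, such as `[[10, 20], [20, 40]]`, gives a statistic of zero.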
Interpreting the Results
For this example, the calculated Chi-Square statistic is compared to a critical value of 3.84, which is the cutoff at the 0.05 significance level for a 2×2 table (1 degree of freedom). If the statistic exceeds this threshold, the null hypothesis is rejected. In this case, the test statistic is not large enough to reject the null hypothesis, indicating that the observed counts do not differ significantly from the expected counts.
The Role of the p-Value
The p-value is the probability of observing a Chi-Square statistic as extreme as, or more extreme than, the one calculated, assuming the null hypothesis is true. For this test, the p-value is 0.201. Since this value is larger than the typical significance level of 0.05, there is insufficient evidence to reject the null hypothesis. The data are therefore consistent with gender and drunk driving being independent, although failing to reject the null hypothesis does not prove independence.
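For a 2×2 table the p-value can be sketched with the Python standard library alone, using the fact that a Chi-Square variable with 1 degree of freedom is the square of a standard normal variable. The helper name `p_value_df1` is hypothetical, the shortcut applies only to 1 degree of freedom, and the observed counts are an assumption for illustration rather than figures from the text.

```python
import math


def p_value_df1(statistic):
    """Upper-tail probability for a chi-square statistic with 1 degree of freedom.

    If Z is standard normal, Z^2 is chi-square with 1 df, so
    P(X^2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


# The critical value quoted in the text should sit close to the 0.05 level.
print(round(p_value_df1(3.84), 3))

# ASSUMED counts: rows are male/female, columns are drinking/not drinking.
observed = [[77, 404], [16, 122]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
statistic = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)
p_value = p_value_df1(statistic)
print(round(statistic, 3), round(p_value, 3))
```

Under these assumed counts the p-value lands near 0.20, in line with the 0.201 reported above. Tables larger than 2×2 have more degrees of freedom and need a general Chi-Square distribution function instead of this shortcut.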
Implications of the Findings
The lack of a significant relationship between gender and drunk driving undermines the justification for the Oklahoma law. The U.S. Supreme Court ultimately struck down the law as discriminatory and unjustified in Craig v. Boren (1976). This example highlights the utility of the Chi-Square Test of Independence in assessing relationships between categorical variables and in informing policy decisions.