McNemar's Test = Measuring Twice - Categorical

McNemar's test is a statistical test for paired categorical data. It is typically applied to a 2x2 table in which each subject (or matched pair) is measured twice, for example before and after an intervention or under two different conditions. The test determines whether the proportion of subjects giving a particular response differs significantly between the two paired measurements.



In the husband and wife voting example, the couple is the matched pair. Each couple provides two responses, "Husband's Vote" and "Wife's Vote," and these play the role of the two paired measurements (in place of before and after). Each couple's pair of responses falls into one of the four cells of the 2x2 table:

1. Both Husband and Wife voted "Yes."

2. Both Husband and Wife voted "No."

3. Husband voted "Yes" and Wife voted "No."

4. Husband voted "No" and Wife voted "Yes."
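The four categories above can be tallied directly from one record per couple. The sketch below assumes hypothetical per-couple records rebuilt from the cell counts used in the SAS example later in this post (20, 5, 10, 10). The labels a, b, c, d for the four cells are chosen here for illustration and are not part of any library.

```python
from collections import Counter

# Hypothetical per-couple records: (husband's vote, wife's vote).
# Rebuilt from the SAS example cell counts (20, 5, 10, 10): 45 couples in total.
couples = (
    [("yes", "yes")] * 20
    + [("yes", "no")] * 5
    + [("no", "yes")] * 10
    + [("no", "no")] * 10
)

# Each couple falls into exactly one of the four categories.
cells = Counter(couples)

a = cells[("yes", "yes")]  # both voted "Yes" (agreement)
d = cells[("no", "no")]    # both voted "No" (agreement)
b = cells[("yes", "no")]   # husband "Yes", wife "No" (disagreement)
c = cells[("no", "yes")]   # husband "No", wife "Yes" (disagreement)

print("2x2 table (rows = husband, columns = wife):")
print("          wife yes  wife no")
print(f"hus yes   {a:8d}  {b:7d}")
print(f"hus no    {c:8d}  {d:7d}")
```

Only the two disagreement cells, b and c, carry information about whether husbands and wives differ.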

The null hypothesis in McNemar's test is that the two paired proportions are equal; in this example, that husbands and wives are equally likely to vote "Yes." Equivalently, the two kinds of disagreement (husband "Yes" with wife "No", and husband "No" with wife "Yes") are equally likely. The alternative hypothesis is that the two proportions differ.
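Why the two forms of the null hypothesis are the same: the proportion of husbands voting "Yes" is (a + b) / n and the proportion of wives voting "Yes" is (a + c) / n, so the couples who agree cancel out of the difference, leaving (b - c) / n. A minimal numeric check, assuming the SAS example counts and the illustrative cell labels a, b, c, d:

```python
# Cell counts (rows = husband, columns = wife), from the SAS example data:
# a = yes/yes, b = yes/no, c = no/yes, d = no/no.
a, b, c, d = 20, 5, 10, 10
n = a + b + c + d  # number of couples

# Marginal proportion of "Yes" votes for each spouse.
p_husband_yes = (a + b) / n
p_wife_yes = (a + c) / n

# Cell a appears in both margins and cancels, so the difference
# depends only on the disagreement cells b and c.
diff = p_husband_yes - p_wife_yes
print(f"husband yes: {p_husband_yes:.3f}, wife yes: {p_wife_yes:.3f}")
print(f"difference: {diff:.3f}, (b - c) / n: {(b - c) / n:.3f}")
```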

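McNemar's test calculates a chi-square statistic from the two disagreement cells of the 2x2 table. If b is the number of couples where the husband voted "Yes" and the wife "No", and c is the number where the husband voted "No" and the wife "Yes", the statistic is (b - c)^2 / (b + c). Under the null hypothesis it approximately follows a chi-square distribution with one degree of freedom, because it compares a single quantity (b against c); comparing it to that distribution tells you whether the difference in proportions is statistically significant.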
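A hand calculation of the uncorrected statistic (b - c)^2 / (b + c), where b and c are the two disagreement counts. This is a standard-library sketch; the helper name mcnemar_chi2 is hypothetical. The p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal, so P(chi2 > x) = erfc(sqrt(x / 2)).

```python
import math


def mcnemar_chi2(b: int, c: int) -> tuple[float, float]:
    """Uncorrected McNemar statistic and its chi-square (1 df) p-value.

    b and c are the two disagreement cell counts of the 2x2 table.
    """
    statistic = (b - c) ** 2 / (b + c)
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Disagreement counts from the SAS example:
# husband yes / wife no = 5, husband no / wife yes = 10.
stat, p = mcnemar_chi2(5, 10)
print(f"statistic={stat:.3f}, p-value={p:.3f}")  # prints statistic=1.667, p-value=0.197
```

This uncorrected form is the McNemar statistic that PROC FREQ reports, so the SAS example below should give the same value.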

In short, McNemar's test compares proportions to determine whether the distribution of a categorical response differs between two measurements, but it applies specifically to paired data, such as the husband and wife responses within each couple in this example.

The SAS code is from the book Categorical Data Analysis Using SAS, Third Edition.

SAS code

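The following SAS code performs McNemar's test on a dataset named 'approval'. The dataset contains three variables: 'hus_resp', 'wif_resp', and 'count'. The 'hus_resp' and 'wif_resp' variables hold the husband's and wife's responses ('yes' or 'no'), and 'count' is the number of couples with each response combination. The WEIGHT statement tells PROC FREQ to treat each row as 'count' couples, ORDER=DATA keeps the levels in the order they appear in the data, the AGREE option on the TABLES statement requests McNemar's test for the 2x2 table, and ODS SELECT McNemarsTest limits the printed output to the McNemar's test table.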


data approval;
input hus_resp $ wif_resp $ count;
datalines;
yes yes 20
yes no 5
no yes 10
no no 10
;

ods select McNemarsTest;
Proc Freq Order=Data;
weight count;
tables hus_resp*wif_resp / agree;
Run;

Python code

The Python version uses the pandas library to build a DataFrame and a 2x2 contingency table (with pd.crosstab), and the mcnemar function from statsmodels.stats.contingency_tables to run the test. Setting exact=False and correction=False makes it compute the same uncorrected chi-square statistic that PROC FREQ reports.

import pandas as pd
from statsmodels.stats.contingency_tables import mcnemar

# Define the data: one row per (husband, wife) response combination
data = {
    'hus_resp': ['yes', 'yes', 'no', 'no'],
    'wif_resp': ['yes', 'no', 'yes', 'no'],
    'count': [20, 5, 10, 10]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Build the 2x2 contingency table (rows = husband, columns = wife),
# summing the counts that fall into each cell
contingency = pd.crosstab(df['hus_resp'], df['wif_resp'],
                          values=df['count'], aggfunc='sum')

# Perform McNemar's test: the uncorrected chi-square version,
# (b - c)^2 / (b + c), which matches the SAS output
result = mcnemar(contingency, exact=False, correction=False)

print('statistic=%.3f, p-value=%.3f' % (result.statistic, result.pvalue))
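statsmodels' mcnemar offers three variants: an exact binomial test on the two disagreement counts (exact=True, the function's default), a chi-square test with continuity correction (exact=False, correction=True), and the uncorrected chi-square that matches PROC FREQ (exact=False, correction=False). A short sketch comparing them on the same table, written as a nested list rather than a DataFrame:

```python
from statsmodels.stats.contingency_tables import mcnemar

# Same 2x2 table: rows = husband yes/no, columns = wife yes/no.
table = [[20, 5],
         [10, 10]]

# Keyword arguments for each variant of mcnemar().
variants = {
    "exact binomial": dict(exact=True),
    "chi-square, corrected": dict(exact=False, correction=True),
    "chi-square, uncorrected": dict(exact=False, correction=False),
}

results = {}
for name, kwargs in variants.items():
    res = mcnemar(table, **kwargs)
    results[name] = (float(res.statistic), float(res.pvalue))
    # For exact=True the reported statistic is min(b, c), not a chi-square value.
    print(f"{name:25s} statistic={res.statistic:.3f}, p-value={res.pvalue:.3f}")
```

For this table the exact and continuity-corrected p-values are both about 0.302 and the uncorrected one is about 0.197, so all three exceed 0.05 and lead to the same conclusion.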

Interpretation 

If the p-value is less than the significance level (typically 0.05), you reject the null hypothesis and conclude that the two paired proportions differ; in this example, that husbands and wives differ in how often they vote "Yes." If the p-value is greater than the significance level, you fail to reject the null hypothesis: the data do not provide enough evidence of a difference, which is not the same as showing that the proportions are equal. The chi-square statistic measures how far the disagreement counts are from the balance expected under the null hypothesis, and the p-value is the probability of a statistic at least that large if the null hypothesis were true. The larger the chi-square statistic, the stronger the evidence against the null hypothesis.

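For this example, the chi-square statistic is (5 - 10)^2 / (5 + 10) = 1.667 with p ≈ 0.197, which is above 0.05, so we fail to reject the null hypothesis. The imbalance between the 5 couples in which only the husband voted "Yes" and the 10 couples in which only the wife voted "Yes" is not large enough to conclude that husbands and wives differ in their proportions of "Yes" votes.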
