
Introduction to Research Methods in Political Science:
The POWERMUTT* Project
(for use with SPSS)

*Politically-Oriented Web-Enhanced Research Methods for Undergraduates — Topics & Tools
Resources for introductory research methods courses in political science and related disciplines







This topic discusses the use of control variables in analyzing contingency tables.  The process of introducing one or more control variables into such analysis is sometimes called elaboration because it allows us to “elaborate,” or expand upon, the relationship between two variables by investigating how that relationship is influenced by other variables.

The fact that two variables in a table are related does not necessarily mean that one is a cause of the other, even if the relationship is statistically significant and therefore unlikely to be due to chance.  Broadly speaking, four patterns can result when a third variable is introduced into a relationship between two other variables: replication, explanation, interpretation, and specification.  (Since the examples we will use in this topic to illustrate the elaboration model employ real data, none of them fits any one pattern in pure form.)

To introduce a control variable into a relationship displayed in a contingency table, the original table is broken down into two or more subtables, one for each value of the control variable.  For example, if we control (as we will below) for region in examining the relationship between party identification and vote, we will have one subtable for each region.  For each subtable, as well as for the original table, we will want to test for statistical significance and for the strength of the relationships.  (Note: Because we are breaking one table down into two or more subtables, the number of cases in each subtable will be smaller than in the original table, and the relationships will tend to be less significant even when the degree of association is unchanged.  If we have too few cases in some categories of the control variable, introducing a control variable may have little effect on the strength of the relationship, but cause the relationship to become statistically insignificant.  If this happens, consider recoding the control variable into fewer categories.)
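
The note about sample size can be checked with a little arithmetic. The Python sketch below (Python is our choice for illustration; this topic itself uses SPSS) computes Pearson's chi-square and Cramér's V for two hypothetical 2 x 2 tables with identical percentages, one with half as many cases as the other. The counts are invented, and the helper `p_value_df1` is valid only for tables with one degree of freedom.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic for an r x c table (a list of rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def cramers_v(table):
    """Cramer's V: chi-square rescaled to run from 0 to 1, so it does not grow with n."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square(table) / (n * (k - 1)))

def p_value_df1(stat):
    """Upper-tail chi-square probability; valid only for 1 degree of freedom (2 x 2)."""
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical counts, invented for illustration (not ANES data).
full = [[60, 40], [40, 60]]
half = [[30, 20], [20, 30]]  # same percentages, half as many cases

for name, t in (("full", full), ("half", half)):
    chi2 = chi_square(t)
    print(name, chi2, round(cramers_v(t), 3), round(p_value_df1(chi2), 4))
```

Halving the cases halves chi-square (from 8.0 to 4.0) and raises the p-value from about .005 to about .046, while Cramér's V stays at 0.2: the degree of association is unchanged, but the evidence against chance is weaker.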

It is possible to control for two or more variables simultaneously.  For example, we could control for both region and religion.  This would result in a separate subtable for each combination of values of the control variables (Southern Protestants, Southern Catholics, etc.).  The problem with doing this, in addition to complexity, is that at least some of the subtables will likely have too few cases to permit reliable analysis.
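
A rough sketch of what controlling for two variables at once does to the data: each record is assigned to the subtable for its combination of control values, so the number of subtables is the product of the numbers of categories. The records, the field names (`region`, `religion`, `pid`, `vote`), and the helper `split_by_controls` are all hypothetical, written for illustration rather than taken from SPSS.

```python
from collections import defaultdict

def split_by_controls(records, controls):
    """Group records into one subtable per combination of control-variable values."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[c] for c in controls)
        groups[key].append(rec)
    return dict(groups)

# Hypothetical records, not drawn from any real survey.
records = [
    {"region": "South", "religion": "Protestant", "pid": "Rep", "vote": "Rep"},
    {"region": "South", "religion": "Protestant", "pid": "Dem", "vote": "Dem"},
    {"region": "South", "religion": "Catholic", "pid": "Dem", "vote": "Dem"},
    {"region": "West", "religion": "Catholic", "pid": "Rep", "vote": "Rep"},
    {"region": "West", "religion": "Protestant", "pid": "Ind", "vote": "Dem"},
]

by_region = split_by_controls(records, ["region"])
by_both = split_by_controls(records, ["region", "religion"])

# Report how many cases land in each combined subtable.
for key, recs in sorted(by_both.items()):
    print(key, len(recs))
```

With only five records, controlling for region alone leaves two subtables, while adding religion splits them into four, most holding a single case; real survey data run into the same problem on a larger scale.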

In each example used in this topic, we will first look at a table showing the relationship between the dependent and independent variables. We will then repeat the process, but break the table down by the values of the control variable. For a fuller understanding of how the three variables are related, it would also be helpful to examine separate crosstabs of the control variable with each of the other two. For reasons of space, we won't do this, but you are encouraged to try it on your own.

A Note Regarding Statistical Measures

In choosing measures of association and significance in conjunction with a crosstabulation using a control variable, what counts is the level of measurement of the independent and dependent variables, not that of the control variable.  For example, if you are crosstabulating two ordinal variables and using a nominal level control variable, choose Kendall's tau.
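
To illustrate this point, the sketch below computes Kendall's tau-b for an ordinal-by-ordinal table and applies it to each subtable in turn. The control variable only determines how cases are grouped, so its level of measurement never enters the calculation. The sketch assumes rows and columns are listed in rank order and uses the standard formula tau-b = (C - D) / sqrt((n0 - n1)(n0 - n2)), where C and D are the numbers of concordant and discordant pairs, n0 is the total number of pairs, and n1 and n2 are the pairs tied on the row and column variables. The subtables are hypothetical.

```python
import math

def kendall_tau_b(table):
    """Kendall's tau-b for an r x c table whose rows and columns are in rank order."""
    r, c = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(r):
        for j in range(c):
            # Compare each cell with every cell in a higher row.
            for k in range(i + 1, r):
                for l in range(c):
                    if l > j:
                        concordant += table[i][j] * table[k][l]
                    elif l < j:
                        discordant += table[i][j] * table[k][l]
    n = sum(map(sum, table))
    n0 = n * (n - 1) / 2
    n1 = sum(t * (t - 1) / 2 for t in map(sum, table))        # pairs tied on rows
    n2 = sum(t * (t - 1) / 2 for t in map(sum, zip(*table)))  # pairs tied on columns
    denom = math.sqrt((n0 - n1) * (n0 - n2))
    return (concordant - discordant) / denom if denom else None

# Hypothetical subtables keyed by a nominal control variable (e.g. region).
subtables = {
    "Northeast": [[10, 0], [0, 10]],
    "South": [[5, 5], [5, 5]],
    "West": [[10, 10], [0, 0]],  # row variable is constant here
}
for region, t in subtables.items():
    print(region, kendall_tau_b(t))
```

The function returns None when one variable is constant within a subtable, because tau-b is undefined there.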

Replication

The following table shows the relationship between voting in 2008 elections to the U.S. House of Representatives and party identification.  (The data are from the 2008 American National Election Study Subset, and are weighted using the “weight” variable.)  Not surprisingly, there is a very strong relationship between the two.


Crosstab of house vote by party ID
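
Weighting changes what goes into each cell: instead of counting cases, the analysis adds up each respondent's weight. A minimal Python sketch, assuming records stored as dictionaries; `pid` and `vote` are placeholder field names (only `weight` comes from the text), and the weights are invented.

```python
from collections import defaultdict

def weighted_crosstab(records, x, y, weight):
    """Add up the weight variable, instead of counting cases, in each (x, y) cell."""
    cells = defaultdict(float)
    for rec in records:
        cells[(rec[x], rec[y])] += rec[weight]
    return dict(cells)

def percent_within(cells):
    """Percentage of each cell within its category of the first (independent) variable."""
    totals = defaultdict(float)
    for (xv, _), w in cells.items():
        totals[xv] += w
    return {key: 100 * w / totals[key[0]] for key, w in cells.items()}

# Hypothetical records; "pid" and "vote" are placeholder names, "weight" follows the text.
records = [
    {"pid": "Democrat", "vote": "Democrat", "weight": 1.5},
    {"pid": "Democrat", "vote": "Republican", "weight": 0.5},
    {"pid": "Republican", "vote": "Republican", "weight": 1.0},
    {"pid": "Republican", "vote": "Republican", "weight": 0.8},
]

cells = weighted_crosstab(records, "pid", "vote", "weight")
print(percent_within(cells))
```

Unweighted, the two Democratic identifiers would split 50-50; weighted, the split is 75-25, which is why the weight variable has to be applied before the percentages are interpreted.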


The next table breaks this same relationship down by region.  Introducing a control for region has little effect on the relationship between the two variables (though the relationship is a little weaker in the West).  The overall pattern is “replicated” within each region of the country.


Crosstab of house vote by party ID, controlling for region


Explanation

In 1996, Thomas Friedman proposed the “Golden Arches Theory of Conflict Prevention,” noting that “no two countries that both have a McDonald's have ever fought a war against each other.”[1]  Friedman was not really suggesting that universal peace could be achieved simply by placing McDonald’s franchises in every country, but rather was arguing that economic development encourages both peace and the creation of establishments such as McDonald’s.  In other words, he was hypothesizing that the independent variable (the presence or absence of McDonald’s) and the dependent variable (war or peace) are spuriously related — that one does not cause the other, but that both are products of economic development, and that the control variable, economic development, “explains” their relationship. (Note, however, that, when war broke out between Serbia and NATO forces in 1999, McDonald’s outlets in Belgrade were among the casualties.)
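
The logic of a spurious relationship can be reproduced with made-up numbers. In the hypothetical counts below, economic development is the control variable. Within each level of development, the presence of McDonald's makes no difference to the proportion of cases at peace, yet pooling the two levels produces an apparent relationship (about 83 percent versus 57 percent at peace), because highly developed cases are both more likely to have McDonald's and more likely to be at peace. None of these numbers is real data.

```python
# Hypothetical counts, invented only to show the logic of a spurious relationship.
# strata[level of development][has McDonald's] = (cases at peace, cases at war)
strata = {
    "high": {"yes": (45, 5), "no": (9, 1)},
    "low": {"yes": (5, 5), "no": (25, 25)},
}

def peace_rate(peace, war):
    """Proportion of cases at peace."""
    return peace / (peace + war)

# Within each level of the control variable, McDonald's makes no difference.
for level, cells in strata.items():
    print(level, {has: peace_rate(*counts) for has, counts in cells.items()})

# Ignoring the control variable (adding the strata together) creates an apparent relationship.
pooled = {
    has: tuple(sum(strata[level][has][i] for level in strata) for i in (0, 1))
    for has in ("yes", "no")
}
print("pooled", {has: round(peace_rate(*counts), 3) for has, counts in pooled.items()})
```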

In the 2008 American National Election Study, respondents were asked a series of questions designed to measure political efficacy (the belief that one is able to have an impact on political events). From these questions, a scale has been created to classify respondents as "high," "medium," or "low" in efficacy. (See the codebook for the specific measures from which this scale has been constructed.) In the following table, this new variable has been crosstabulated with respondents' household income. Data are again weighted by the “weight” variable.


Crosstab of political efficacy by income


The results, again not surprisingly, show that the higher one's income, the higher one's level of political efficacy. One explanation for this pattern might be simply that "money talks," and so those with more money see themselves as having more political clout. Another possibility, however, is that education tends to lead both to higher income and, regardless of income level, to increased efficacy (for example, to the belief that the respondent understands politics and is able to act on that understanding). In that case, the relationship between efficacy and income might be spurious.

To test this, we can introduce a control for education.  If we do, we obtain these results: 


Crosstab of political efficacy by income, controlling for education



We can see that, while some differences remain, the relationship between efficacy and income within each category of education is much weaker (Kendall's tau-b falls from .161 to between .045 and .086), and none of these relationships is statistically significant.  The original relationship, in other words, is partly spurious: in substantial part (though not completely), it can be explained by education level.

Interpretation

The tables below (using data from senate.sav) show the relationship between members' gender and a measure of liberalism/conservatism in the voting records of members of the U.S. Senate. The measure was derived by taking a commonly used scale developed by political scientists Lewis and Poole and dividing members into two equally sized groups.  The results show that, by this measure, 80 percent of female senators, but fewer than half of male senators, were classified as liberals.


Crosstabulation of Voting Record in Senate by gender


If we control for members’ political party, however, we see that the relationship is substantially weakened. Regardless of gender, all but two Democrats were among the more liberal half of Senate members. Among Republicans, all members, male and female, scored in the more conservative half. In fact, no measure of association is calculated for Republicans because, by the measure we are using, conservatism is a constant, not a variable, among GOP members. In other words, while female members of the Senate were more likely to be Democrats than were males, there was little difference between females and males of the same party.


Crosstabulation of Voting Record in Senate by gender, controlling for party


In both explanation and interpretation, introducing a control variable reduces or eliminates the association between the independent and dependent variables.  The difference between explanation and interpretation has to do with the sequencing of the independent and control variables.  In the former case, the control variable is antecedent to (that is, comes before) the independent variable.  The independent and dependent variables are related because both are dependent on the control variable, not because either one is a cause of the other.  In the latter, the control variable is an intervening variable (that is, one that comes between the independent and dependent variables in a causal sequence).  The independent variable does have an effect on the dependent variable, but does so through the control variable.
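
Because the data alone cannot tell explanation from interpretation, the distinction rests on an assumed causal order. The small Python function below encodes the rule stated above; `reduction_type` and the variable names are hypothetical, and the orderings in the usage lines are assumptions about two of the examples in this topic, not findings.

```python
def reduction_type(causal_order, independent, dependent, control):
    """Name a reduced or vanished relationship from an assumed causal order.

    causal_order lists variables from earliest to latest; the data alone cannot
    establish it, so it is an assumption supplied by the analyst.
    """
    pos = {name: i for i, name in enumerate(causal_order)}
    if not pos[independent] < pos[dependent]:
        raise ValueError("independent variable must precede the dependent variable")
    if pos[control] < pos[independent]:
        return "explanation"      # antecedent: comes before the independent variable
    if pos[control] < pos[dependent]:
        return "interpretation"   # intervening: between independent and dependent
    raise ValueError("control variable must precede the dependent variable")

# Assumed orderings for the McDonald's and Senate examples.
print(reduction_type(["development", "mcdonalds", "peace"],
                     "mcdonalds", "peace", "development"))
print(reduction_type(["gender", "party", "voting_record"],
                     "gender", "voting_record", "party"))
```

Swapping the first two variables in either call would flip the label, even though the tables themselves would look exactly the same.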

Specification

Sometimes the relationship between an independent and a dependent variable will depend on the value of the control variable.  Consider, for example, the relationship between voting in 2008 elections to the U.S. House of Representatives and ideology.  The following table (with data taken from the 2008 American National Election Study) shows a strong relationship.


Crosstab of house vote by ideology


The relationship is, however, very different for different ethnic groups. Conservative African Americans were almost as likely as liberal and moderate African Americans to vote for Democratic candidates. On the other hand, among whites and, to a somewhat lesser degree, Latinos, ideology is a good predictor of how a respondent voted. In other words, one needs to specify ethnicity in order to understand the relationship between ideology and vote.  (Note: other racial/ethnic groups were very few in number in the sample, and so they were not included in the analysis.) 


Crosstab of house vote by ideology, controlling for race
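
The four patterns can be summarized as a rough rule of thumb that compares partial measures of association with the original one. The sketch below is only that: the tolerance of 0.05 is an arbitrary, hypothetical cutoff, and real judgments should also weigh significance tests and the percentages themselves. Given the tau-b values reported for the efficacy example (.161 overall, with partials ranging from .045 to .086, passed here as just those two endpoints), it returns "explanation or interpretation"; as discussed above, only the assumed causal order can decide between the two.

```python
def elaboration_pattern(original, partials, tol=0.05):
    """Rough label comparing partial associations with the original association.

    original: a measure (e.g. Kendall's tau-b) for the full table.
    partials: the same measure computed for each subtable.
    tol: a hypothetical cutoff chosen for illustration, not a published standard.
    """
    if max(partials) - min(partials) > 2 * tol:
        return "specification"                  # subtables differ markedly from each other
    if all(abs(p - original) <= tol for p in partials):
        return "replication"                    # every subtable resembles the original
    if all(abs(p) < abs(original) - tol for p in partials):
        return "explanation or interpretation"  # causal order decides which
    return "no clear pattern"

print(elaboration_pattern(0.161, [0.045, 0.086]))
```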


Key Concepts

antecedent variable
control variable
intervening variable

Exercises

For each of the following exercises, describe and interpret the results.  In each case, do the resulting patterns more closely resemble replication, explanation, interpretation, or specification?

Start SPSS.  For exercises 1 through 4, open anes08s.sav and the 2008 American National Election Study Subset codebook.  

1. Compare the relationship between income and party identification among black respondents with the same relationship among whites and among Latinos.  Make these same comparisons, but substitute education for income.   You will need to recode income into a smaller number of categories. Also, because of limitations in sample size, use select cases to limit your analysis to black, white and Latino respondents. (Be sure to change this back before proceeding to other exercises.) 

2.  Is the relationship between attitude toward government funding of abortions and attendance at religious services different for Protestants than it is for Catholics?  You will need to recode both your independent and dependent variables into fewer categories. Also, there aren't enough people of other religions, or of church-going atheists, in the sample to permit reliable analysis.  Use select cases to limit your analysis to Protestant and Catholic respondents, again being sure to change this back before proceeding to other exercises.

3.  Is the relationship between voting in the 2008 presidential election and attitude toward social security different for respondents in different age categories?  You will need to recode age before doing this analysis.

4. Does a person's level of education influence the strength of the relationship between ideology and party identification?   If so, why? You will need to recode ideology and education into fewer categories.

5. This exercise uses the anes04s.sav file. Open the file in SPSS, and open the American National Election Study 2004 Subset codebook.  Examine the codebook for measures of the gender of the respondent (gender), the gender of the interviewer (intgenpr), and the question of whether "a working mother can establish just as warm and secure a relationship with her children as a mother who does not work" (workmom).  Obtain frequency distributions for all three variables. Crosstabulate “workmom” by “gender” and by “intgenpr.”  Repeat the crosstabulation between "workmom" and "gender," controlling for "intgenpr." Do so again, but this time use "intgenpr" as the independent variable and "gender" as the control. Is each independent variable important?  Did the fact that interviewers were predominantly female influence the frequency distribution for the question on working mothers?
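
SPSS handles the recoding and case selection these exercises call for through its Recode and Select Cases procedures. For readers who want to see the logic outside SPSS, here is a rough Python analogue; the records, field names, and age cut points are hypothetical, not taken from anes08s.sav. Because the selection builds a new list, the unselected cases stay in `respondents`, so there is no filter to switch back off afterwards.

```python
def recode(value, upper_bounds, labels):
    """Collapse a numeric value into fewer categories.

    upper_bounds gives the inclusive top of each category except the last,
    so labels must have one more entry than upper_bounds.
    """
    for bound, label in zip(upper_bounds, labels):
        if value <= bound:
            return label
    return labels[-1]

# Hypothetical records; "race" and "age" are placeholders, not anes08s.sav names.
respondents = [
    {"race": "white", "age": 25},
    {"race": "black", "age": 47},
    {"race": "Latino", "age": 70},
    {"race": "other", "age": 33},
]

# Analogue of Select Cases: keep only groups with enough respondents.
selected = [r for r in respondents if r["race"] in {"white", "black", "Latino"}]

# Analogue of recoding age into a smaller number of categories (hypothetical cut points).
for r in selected:
    r["age_group"] = recode(r["age"], [29, 64], ["18-29", "30-64", "65 and over"])

print([(r["race"], r["age_group"]) for r in selected])
```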

For Further Study

Nelson, Elizabeth N., and Edward E. Nelson, “Introducing a Control Variable (Multivariate Analysis),” California Opinions on Women's Issues, 1985-1995. Accessed April 29, 2013.

Sanborn, John, "Multi-variate Analysis: The Elaboration Model," SW 3120: Data Analysis for Social Work Practice (Middle Tennessee State University) Accessed April 29, 2013.

[1] Thomas L. Friedman, “Foreign Affairs Big Mac I,” New York Times, December 8, 1996. Lexis-Nexis Academic Universe.



Last updated April 28, 2013.
© 2003-2013 John L. Korey. Licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.