Introduction to Research Methods in Political Science
VII. CONTINGENCY TABLE ANALYSIS
There’s a lot that a contingency
table can tell you, if you know the right questions to ask. How strong is the relationship shown in the table? What are the
odds that the relationship might have occurred just by chance? We'll take
up the second question first.
Doing empirical research involves
testing hypotheses suggesting that the value of one variable is
related to that of another variable. If we are
working with sample data, we may find that there is a relationship between two
variables in our sample, and we wish to know how confident we can be that the
relationship is not simply due to chance (or what we call “random sampling
error”) but instead reflects a relationship in the population from which the
sample was drawn. How do we go about doing this?
If we flip a balanced quarter,
the probability of it coming down heads is .5 (or p = .5). The
probability of it coming down heads twice in a row is .5 squared (p = .5² = .25). For ten heads in a row, p = .5¹⁰ = .0009765, or less
than one chance in a thousand. If we do get ten
heads in a row, we will probably begin to suspect that our growing familiarity
with the Father of Our Country isn’t just a coincidence, and that there is
something wrong with the coin. Notice what we are doing here. We
aren’t directly testing the idea that the coin is unbalanced. Instead, we start off with the working assumption that it is
balanced. If our tests show that this is very unlikely, we will reject
this assumption.
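The coin arithmetic can be checked with a few lines of Python. This is only a sketch; the function name prob_all_heads is ours for illustration and is not part of SPSS or POWERMUTT.

```python
def prob_all_heads(flips, p_heads=0.5):
    """Probability that every one of `flips` independent tosses comes up heads.

    Assumes a balanced coin by default (p_heads = 0.5), which is the
    working assumption we start from before looking at any tosses.
    """
    return p_heads ** flips


# One toss, two tosses, and ten tosses in a row.
for n in (1, 2, 10):
    print(n, prob_all_heads(n))
```

The smaller this probability becomes, the harder it is to keep believing the working assumption that the coin is balanced.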
Similarly, when we wish to test a
hypothesis stating that the value of one variable is related to that of another, we begin with the working assumption that the variables are not related. This working assumption is called the null hypothesis (H0, pronounced “H sub-naught”). We then employ techniques
(one of which is described below) that tell us the probability that the
relationship in our sample has occurred by chance. If that probability is
sufficiently low, we “reject the null hypothesis” and risk concluding that our
original hypothesis, oddly referred to by statisticians as the alternative
hypothesis (Ha, pronounced "H sub-a"), is
supported by the data. In doing so, we conclude that the
relationship is statistically significant, and make a statistical
inference about the population based on the data in the sample.
If, on the other hand, the probability (risk) that the relationship occurred by chance
is too great, we conclude that the relationship is not statistically
significant; we “fail to reject” the null hypothesis. This isn’t the same
thing as saying that the null hypothesis is true. It may simply be that
we don’t have enough data on which to base a reliable conclusion.
The language in the preceding
paragraph probably seems rather convoluted. You’ll get used to it.
How low does the probability that
the relationship occurred by chance have to be before we reject the null hypothesis? By
convention, a null hypothesis is not rejected unless
the odds of the relationship occurring by chance are less than one in twenty (p < .05). Of
course, we could be even more confident in rejecting the null hypothesis if the
risk were even smaller, and the odds were less than one in a hundred (p <
.01, or “significant at the .01 level”) or one in a thousand (p < .001, or
“significant at the .001 level”).
Tests for statistical
significance assume a simple random sample. While you will
rarely be able to work with pure simple random samples, carefully designed
studies like the American National Election Study or the General Social Survey
come close enough to make the use of such tests reasonable. Ivory,
however, doesn't come from a rat’s mouth. If you have a non-probability
based sample (such as those discussed in the Varieties of Data topic), tests for statistical significance won’t bail you out.
There is some
debate as to whether tests for statistical significance are necessary or even
appropriate when you are working, not with a sample, but with population data
(which would include the data files included in POWERMUTT for the Senate, the
American states, and the countries of the world). After all, any relationship you find
in your data, insofar as it is otherwise valid, necessarily applies to the
population.[1] In the examples used in POWERMUTT,
we'll calculate measures of statistical significance even when using population
data, but you should be aware that doing so is controversial.
Finally, the fact that a
relationship is statistically significant only means that you’ve concluded that
there is some relationship between two variables. It does not
necessarily mean that the relationship is a strong one. To
assess that, you need measures of association, an idea to which we will return
later in this topic.
There are a number of tests for
statistical significance that are used for various
specific purposes (t-tests, z-tests, F-ratios). Here we discuss
chi-square, a widely used measure of the statistical significance of a
relationship between two variables displayed in a crosstabulation.
Chi-square (χ²)
(There are actually several versions of chi-square.
The most common, and the one we'll be using, is Pearson's
chi-square.)
Chi-square should
be employed when one or both variables are nominal. If both
variables are ordinal or higher, other more powerful tests are appropriate.
Chi-square is
used to calculate the probability that a relationship found in a sample
between two variables is due to chance (random sampling error). It does
this by measuring the difference between the actual frequencies in each cell of
a table and the frequencies one would expect to find if there were no relationship between the variables in the population
from which the (random) sample has been drawn. The larger these
differences are, the less likely it is that they occurred by chance.
Sometimes, in addition to finding
out where people stand on an issue, it’s also important to know how important
(or “salient”) they think it is. In the following table, using data from
the 2008 American National Election Study, opinion on the importance of the
issue of controlling illegal immigration is broken down by region of the country
(with data weighted using the "weight" variable). While our
dependent variable (attitude toward the importance of the issue) is ordinal,
our independent variable (region) is only nominal.
The first number in each cell of
the table indicates the count,
also called the observed
frequency because it is the actual number of cases observed in the
sample for that cell. The second number in each
cell is the cell count as a percentage of the total number of cases in the
column. We can see that, in the sample, there are some regional
differences. People in the West are most likely to think that the issue
is very important, while people in the South are most likely to say that it is
not important at all. The question is, what are the odds that we would
find differences as large as these just by chance, that is, if no regional
differences existed in the general population from which the sample was taken
(in this case, American adults 18 and older)?
The next table is
presented to illustrate the process of calculating chi-square, and would
not normally be included in an actual research report. It is the same as
the first except that in each cell we have added the expected count, or expected
frequency, which represents the number of cases we would expect to find in the cell if there were no regional differences. We can see
from the row totals of either this or the preceding table that, for the entire
sample, 57.4% of all respondents thought that the issue was very important,
35.8% thought it somewhat important, and 6.8% thought it not important at
all. If we apply these percentages within each region, we will (except for rounding error) produce the expected
frequencies that make up the second numbers in each cell of the new
table. (In row 1, column 3, for example, 57.4% of 844 equals about
484.5. In other words, if the South were just like the rest of the
country, we would expect about 484.5 southern respondents to think that the
issue of controlling illegal immigration is very important. In fact, only
456 did so.)
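The expected frequencies for the South can be reproduced from the figures quoted above. The sketch below uses only the overall percentages and the southern total of 844 given in the text; the variable names are ours.

```python
# Overall share of each response, taken from the row totals quoted in the text.
overall_share = {
    "very important": 0.574,
    "somewhat important": 0.358,
    "not important at all": 0.068,
}

# Number of southern respondents (column total), as given in the text.
south_total = 844

# If the South were just like the rest of the country, each response would
# occur there in the same proportion as in the sample as a whole.
expected_south = {
    response: share * south_total for response, share in overall_share.items()
}

for response, count in expected_south.items():
    print(f"{response}: {count:.1f}")
```

Because the percentages are rounded, the results can differ slightly from the expected counts SPSS reports, which are computed from the unrounded row and column totals.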
In general, if we compare
observed and expected frequencies we will notice that in some cells there are
more cases than the null hypothesis would have led us to expect, while in other
cells there are fewer. Chi-square provides a summary measure of these
differences. In the calculation of chi-square, there are several steps
involved.[2] Differences between observed and
expected frequencies (called “residuals”) must be squared (otherwise, they
would always add up to zero), then “standardized” to take into account the fact
that some cells have larger expected frequencies than others.
For a more detailed explanation of how chi-square is calculated, visit http://davidmlane.com/hyperstat/chi_square.html.
For this table, chi-square = 17.458.
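The steps just described (find each expected frequency, square each residual, standardize it by the expected frequency, and add up the results) can be sketched in Python. The cell counts of the immigration table are not reproduced here, so the example runs on a small, entirely hypothetical 2 x 2 table; the function name is also ours.

```python
def pearson_chi_square(observed):
    """Pearson's chi-square for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi_sq = 0.0
    for i, row in enumerate(observed):
        for j, f_obs in enumerate(row):
            # Expected frequency if the two variables were unrelated.
            f_exp = row_totals[i] * col_totals[j] / n
            # Squared residual, standardized by the expected frequency.
            chi_sq += (f_obs - f_exp) ** 2 / f_exp
    return chi_sq


# Hypothetical counts, made up purely for illustration.
hypothetical_table = [
    [30, 20],
    [20, 30],
]
print(pearson_chi_square(hypothetical_table))  # 4.0
```

In this made-up table every expected frequency is 25 and every residual is 5 or -5, so each cell contributes 25/25 = 1 to the total.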
We next need to adjust for the
fact that some tables have more cells than others.
We do this by calculating the degrees of freedom (d.f.) for the table, which are
equal to the number of rows minus 1 times the number of columns minus 1.
In this case, since the table has three rows and four columns, d.f. = (3 – 1)(4 – 1) = 6.
Once we’ve calculated the value
of chi-square and determined the degrees of freedom, we can look up the
probability that the differences in the sample are due to
chance by referring to a table of “critical values of chi-square” found in the appendices of most statistics texts. Better yet, we can let the
computer figure it out for us. In this case, a chi-square of 17.458 in a
table with 6 degrees of freedom would occur by chance 8 times in a thousand
(which we would write as “p = .008”). The relationship is indeed
statistically significant.
(In SPSS, “Asymp. Sig. (2-sided) ” is equivalent to
“p.” If “Asymp. Sig.
(2-sided)” is shown as “.000,” this really means “<.0005,” which SPSS rounds
to the nearest thousandth.)
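If you would like to see where the .008 comes from without a printed table, the following Python sketch computes the degrees of freedom and the probability for the values reported above. It relies on a closed-form expression for the upper tail of the chi-square distribution that holds only when the degrees of freedom are even, which happens to be the case here; the function names are ours. (Statistical libraries such as SciPy offer general-purpose routines for this.)

```python
import math


def degrees_of_freedom(rows, cols):
    """(rows - 1) times (columns - 1)."""
    return (rows - 1) * (cols - 1)


def chi_square_p_value(chi_sq, df):
    """Probability of a chi-square at least this large arising by chance.

    Uses the closed form exp(-x/2) * sum((x/2)**k / k!) for k < df/2,
    which is valid only for positive, even degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even d.f.")
    half = chi_sq / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


df = degrees_of_freedom(3, 4)        # three rows, four columns
p = chi_square_p_value(17.458, df)   # chi-square reported in the text
print(df, round(p, 3))               # 6 0.008
```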
In this topic, we will deal with measures
of association between two variables
calculated in conjunction with a contingency table. Elsewhere, we discuss
measures of association used when comparing
means and in doing regression analysis.
As already noted, statistical
significance means only that we can confidently infer that there is some degree of relationship between our variables in the population from which the
sample was drawn. We would also like to know how strong the
relationship is in the sample. Assessing the strength
of a relationship is where measures of
association come in.
The best way to determine whether
a relationship in a table is strong or weak is to examine the table
itself. If the percentage differences among categories of the independent
variable seem important, they probably are. Just how big the differences
need to be to be considered important will vary with
the research questions you are asking. A difference of 10 percentage
points is not very dramatic, but in a two-candidate political campaign it could spell the difference between a comfortable
55 to 45 percent win and a decidedly uncomfortable defeat by the same margin.
Still, measures of association
between variables are a useful way to summarize the strength of a
relationship. This is especially true if you are running a large number
of crosstabulations and need a convenient way of sorting out the results to
determine which relationships are most important.
In general, measures of
association range in value from 0 (indicating no
relationship), to ±1 (indicating a perfect relationship). Some measures
appropriate for use with nominal data range from 0 to
a number approaching, but never reaching, 1. Measures appropriate with
nominal data are always positive (since direction has no meaning with nominal
data), while those appropriate for use with ordinal data or higher may be either positive or negative. The strength of the
relationship is indicated by the absolute value of the
measure, not its sign. An association of -.7 is much stronger than one of
+.2. Moreover, the sign of the relationship may simply be an artifact of
the way we have coded the data. For example, if ideology is coded on a
scale from 1 to 5, it is entirely arbitrary whether
the higher numbers are associated with liberalism or with conservatism.
Most measures of association are
“symmetric.” This means that the value of the measure is the same
whichever variable is considered to be the dependent
variable. Some measures are “asymmetric.” The measure will have one
value if the row variable is the dependent variable, and another if the column
variable is dependent.
Some, but not all, measures of
association have a Proportional Reduction in
Error (PRE) interpretation. Basically,
such measures indicate how much knowing the value of the independent
variable improves our ability to guess the value of the dependent
variable. More formally, PRE measures employ the following general
formula:
PRE = (E1 – E2) / E1
where E1 represents the
errors we will make guessing the value of the dependent variable if we do not
know the value of the independent variable, and E2 represents the
errors we will make guessing the value of the dependent variable if we do know
the value of the independent variable. If two variables are completely
unrelated, then E2 will be no less than E1 and the PRE
will be 0. If two variables are perfectly
related, then E2 will be 0, and the PRE
will be 1. A PRE of, for instance, .25 would indicate a 25 percent
reduction in error.
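As a quick illustration, the PRE calculation, (E1 – E2) / E1, takes one line of Python. The error counts in the example (100 and 75) are hypothetical, chosen only to reproduce the 25 percent reduction mentioned above.

```python
def proportional_reduction_in_error(e1, e2):
    """PRE = (E1 - E2) / E1.

    e1: errors made guessing the dependent variable without knowing
        the independent variable.
    e2: errors made when the independent variable is known.
    """
    if e1 == 0:
        raise ValueError("E1 is zero: there are no errors to reduce")
    return (e1 - e2) / e1


# Hypothetical counts: 100 errors without the independent variable, 75 with it.
print(proportional_reduction_in_error(100, 75))  # 0.25
```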
A non-PRE measure has no such
interpretation. All we can say about its value is that the further away
from zero (in either direction) it is, the stronger the relationship.
Some texts suggest various rules
of thumb for thresholds between weak, moderate, and strong relationships.
All such rules are fairly arbitrary. In general,
you can expect that measures of association will tend to be higher when higher
levels of measurement are used. In addition, individual data (such as
from an opinion survey) will tend to produce measures of association with more
modest values than those obtained from aggregate data (such as data from a
census, or from election returns by county). This is because individual
data will often contain a lot of “noise” variance (for example, even though
party identification is generally a good predictor of how people vote, you may
vote for a member of another party because the two of you went to the same high
school) that tends to be filtered out when data are aggregated.
The most important factor in
deciding which of the many available measures of association to use is the
level of measurement of your variables. In the remainder of this topic,
we will briefly describe some of the various measures of association produced
by the SPSS Crosstabs procedure for nominal and ordinal data. In order to
use any of the ordinal measures, both variables must be at least ordinal
level. If either or both are nominal, you will have to use a nominal
measure.
One measure of association that can be used when one or both variables are nominal level is lambda (λ). This measure
has, as we will see, some severe limitations, but is easy to compute, and is
useful for illustrating how PRE measures work.
In the American National Election
Study’s 2008 postelection survey, respondents were asked how they had voted in the contests for seats in the U.S. House of
Representatives. The next table compares the responses of Democrats,
Independents, and Republicans in order to test the hypothesis that the way
people vote is influenced by their party
identification. (Data are again weighted by the
"weight" variable. Votes for minor party candidates have been excluded.)
While party identification might be
considered an ordinal measure, voting choice is nominal, and so a
nominal measure of association is required. If you knew nothing about a
person other than that he or she was one of the respondents included in the
table, your best guess as to how the respondent voted would
be made by picking the response with the highest frequency (called the mode). In other words, you
would guess that the respondent voted for a Democratic candidate. You
would be correct 696 times, but would be in error 600 times. Let us call
this number of errors E1.
If you know the
person’s party identification, you will still guess that the respondent voted
for a Democratic candidate if he or she was a Democrat or an independent, but
for Republicans, your best guess would be that the respondent voted for a
Republican. In other words, you would guess the modal value of the
dependent variable within each category of the independent variable. You
would now make 302 (55+188+59) errors. Let us call this number of errors
E2.
Entering E1 and E2 into the general PRE formula, we obtain:
λ = (600-302)/600 = .497
(Lambda is an asymmetric measure.
Since SPSS does not know which variable we wish to treat as dependent, it
calculates the measure both ways. Be sure to use the measure
appropriate to your hypothesis. In this instance, we pick the middle number
(.497), since "housevote" is the dependent
variable. Ignore the "symmetric" lambda. Also ignore Goodman and Kruskal's tau, which we are not
covering.)
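The guessing procedure behind lambda can be sketched in Python. The table below is partly hypothetical: the error counts (55, 188, and 59) and the totals of 696 Democratic and 600 Republican votes come from the text, but the split of the remaining Democratic votes between Democrats and independents (400 and 237) is made up for illustration, as is the function name.

```python
def lambda_dependent_rows(table):
    """Lambda with the row variable treated as dependent.

    table: list of rows of counts; rows are categories of the dependent
    variable, columns are categories of the independent variable.
    """
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    # E1: errors from always guessing the overall modal category.
    e1 = n - max(row_totals)
    # E2: errors from guessing the modal category within each column.
    e2 = sum(sum(col) - max(col) for col in zip(*table))
    return (e1 - e2) / e1


# Columns: Democrats, Independents, Republicans.
# The 400/237 split is an assumption; the other figures follow from the text.
house_vote = [
    [400, 237, 59],   # voted for the Democratic candidate
    [55, 188, 357],   # voted for the Republican candidate
]
print(round(lambda_dependent_rows(house_vote), 3))  # 0.497
```

Any split of the 637 votes that leaves the Democratic candidate as the modal choice among both Democrats and independents would give the same result, since lambda depends only on the modal counts.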
Often lambda severely understates
the strength of a relationship. Consider the relationship discussed
earlier between attitude toward the importance of controlling illegal
immigration and region. Since the most common response
in each region was that the issue is "very important," we know,
without even having to calculate it, that the value of lambda for the
relationship is 0. Consider also the relationship
between attitude toward capital punishment and gender. The next table, again
using the 2008 American National Election Study Subset (and the same weight
variable), compares the attitudes toward capital punishment of men and
women.
There is a substantial difference
between the opinions of men and women on this issue. While most women,
like most men, favored capital punishment, women were less likely than men to
do so. However, since the modal (i.e., most common) choice of
both men and women was to strongly favor the death penalty, knowing a
respondent’s gender would not help you guess his or her position – in either
case "favor strongly" would be the best guess. The value of
lambda for this table is therefore 0. Lambda has, in other words, failed to capture the substantial
difference shown in the table.
The Goodman and Kruskal tau (τ) is generally similar to lambda, and
suffers from the same tendency to understate the strength of relationships,
though not to the same degree.
Another approach to measuring the
strength of a relationship with nominal data is to standardize chi-square so
that, regardless of the sample size, it ranges from 0 to a number approaching 1. Cramer’s
V, called phi (φ) in the case of 2 X 2 tables, and the
contingency coefficient (available in SPSS but not covered here) are both chi-square
based measures. (A disadvantage of the contingency coefficient is that
its maximum possible value depends on the number of cells in
the table, which makes it difficult to compare results for tables of different
size.) Remember that these chi-square based measures do not have a
PRE interpretation. All you can say in interpreting your results is that
the higher the value, the stronger the relationship. In this case,
Cramer's V is .131. This is a bit stronger than, say, .10 but a little
weaker than .15.
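The text does not give the formula for Cramer's V, so the sketch below assumes the usual textbook definition: the square root of chi-square divided by the product of the number of cases and the smaller of (rows – 1) and (columns – 1). The numbers in the example are hypothetical and are not taken from any table in this topic.

```python
import math


def cramers_v(chi_sq, n, rows, cols):
    """Cramer's V, assuming V = sqrt(chi_sq / (n * (min(rows, cols) - 1)))."""
    smaller_dimension = min(rows, cols) - 1
    if n <= 0 or smaller_dimension < 1:
        raise ValueError("need at least a 2 x 2 table with some cases")
    return math.sqrt(chi_sq / (n * smaller_dimension))


# Hypothetical values: chi-square of 4.0 from 100 cases in a 2 x 2 table.
# (In a 2 x 2 table this statistic is the one called phi.)
print(cramers_v(4.0, 100, 2, 2))  # 0.2
```

Dividing by the number of cases is what keeps the measure from growing simply because the sample is larger.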
A measure of association that can be used when both variables are ordinal level is gamma (γ). The basic notion
behind gamma is that, in a contingency table, if one case has a higher value
than another on one variable, it will have a higher
value on the other if there is a positive relationship between the variables,
and will have a lower value on the other if there is a negative relationship.
When one case has a higher value
than another case on both variables, the cases are said to form a concordant
pair. (Concordant literally means “singing together.”) When one case has a higher value than another on one variable, but a lower value on the other, the cases are
said to form a discordant
pair. The formula for gamma is:
γ = (C – D) / (C + D)
where C is the total number of concordant
pairs, and D is the total number of discordant pairs.
If all pairs are concordant, then
gamma will equal 1; if they are all discordant, it
will equal -1; if there are equal numbers of concordant and discordant pairs,
it will equal 0.
Consider the relationship between
opinion on the war in Iraq and party identification. You hypothesize that
Democrats will be more likely than Republicans to think that the war was not
worth the cost, with independents somewhere in between. In
other words, we are treating party identification as an ordinal variable. We are also treating attitude toward the war as an ordinal
variable, since there are presumably different degrees of support and
opposition even though our measure is dichotomous (with respondents forced to
choose between "worth it" and "not worth it"). The following table shows the relationship between these
two variables for respondents to the 2008 American National Election
Study. (Data are, once again, weighted by
"weight.")
Each of the 57 cases in row 1, column
1 forms a concordant pair with each case in the two cells below and to the
right of that cell. Similarly, each of the 151 cases in row 1, column 2
forms a concordant pair with each case in the one cell below and to the right
of it. The total number of concordant pairs is
calculated by taking each cell, multiplying the number of cases in the
cell with the total number of cases, if any, in cells below and to the right of
the cell, and then summing the results. Discordant pairs are calculated in a similar manner, except that each cell is
paired with cells below and to its left. Completing this
admittedly tedious process (were it done manually)
produces the following results:
Concordant pairs:
57 X (646+234) = 50,160
151 X (234) = 35,334
Total: 85,494
Discordant pairs:
151 X (639) = 96,489
294 X (639+646) = 377,790
Total: 474,279
γ = (85,494 - 474,279)/(85,494 + 474,279) = (-388,785)/(559,773) = -.695
The absolute value (that is,
ignoring the sign)[3] of the
coefficient has a PRE interpretation. It tells us, as a proportion of the
total number of pairs, how many more correct than incorrect guesses we make in
guessing whether a pair of cases is concordant or discordant if we know
which case has the "higher" value for the independent variable.
Our hypothesis leads us to guess that if person A is in a column to the left of
person B when it comes to party identification, he or she will also be more
likely to think that the war has not been worth the cost. In other words,
we guess that pairs will be discordant. There are, in fact, 388,785 more
discordant than concordant pairs, which is about 69.5 percent of the
total. (Of course, this assumes that you started out with the correct
hypothesis. If you had hypothesized that Republicans were more opposed to
the war than Democrats, you would have predicted that
pairs would be concordant, and would have made 388,785 more incorrect than
correct guesses.)
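The pair counting is tedious by hand but short in code. The Python sketch below uses the six cell counts that appear in the arithmetic above, arranged in the row and column positions implied by that arithmetic (row 1: 57, 151, 294; row 2: 639, 646, 234); the function names are ours.

```python
def pair_counts(table):
    """Return (concordant, discordant) pair counts for an ordinal table."""
    n_cols = len(table[0])
    concordant = discordant = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Pair this cell with every cell in a lower row.
            for lower_row in table[i + 1:]:
                for k in range(n_cols):
                    if k > j:      # below and to the right
                        concordant += count * lower_row[k]
                    elif k < j:    # below and to the left
                        discordant += count * lower_row[k]
    return concordant, discordant


def gamma(table):
    """Gamma = (C - D) / (C + D)."""
    c, d = pair_counts(table)
    return (c - d) / (c + d)


# Cell counts as they appear in the calculation in the text.
iraq_by_party = [
    [57, 151, 294],
    [639, 646, 234],
]
c, d = pair_counts(iraq_by_party)
print(c, d, round(gamma(iraq_by_party), 3))  # 85494 474279 -0.695
```

Cases in the same row or the same column as a given cell are skipped entirely, so tied pairs never enter the calculation.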
Just as lambda has some shortcomings,
so too does gamma. Notice that it ignores ties. The 57 cases in row
1, column 1, for example, are tied with each other on both variables, with the
other cases in column 1 on party identification, and with other cases in row 1
on attitude toward the war. Gamma simply ignores these. Other
measures have been devised which correct for ties. One
is Kendall’s tau (τ). There are two versions of this measure. Tau-b (pronounced “tau sub b”) is used when there are equal numbers of rows and columns in the table. Tau-c (“tau sub c”) is used when the numbers of rows and columns are not the
same. A similar measure, though probably less widely used than Kendall's
tau, is called Somers' D.
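To see how a correction for ties changes the picture, the sketch below computes tau-c for the same Iraq war table (2 rows by 3 columns, so tau-c rather than tau-b applies). The text gives no formula, so we assume the usual one: tau-c = 2m(C – D) / (n²(m – 1)), where n is the number of cases and m is the smaller of the number of rows and columns. The cell counts are the rounded figures shown in the text, so SPSS output based on the weighted data may differ slightly.

```python
def pair_counts(table):
    """Return (concordant, discordant) pair counts for an ordinal table."""
    n_cols = len(table[0])
    concordant = discordant = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            for lower_row in table[i + 1:]:
                for k in range(n_cols):
                    if k > j:
                        concordant += count * lower_row[k]
                    elif k < j:
                        discordant += count * lower_row[k]
    return concordant, discordant


def kendall_tau_c(table):
    """Tau-c, assuming the formula 2m(C - D) / (n^2 (m - 1))."""
    c, d = pair_counts(table)
    n = sum(sum(row) for row in table)
    m = min(len(table), len(table[0]))
    return 2 * m * (c - d) / (n ** 2 * (m - 1))


iraq_by_party = [
    [57, 151, 294],
    [639, 646, 234],
]
print(round(kendall_tau_c(iraq_by_party), 3))  # -0.381
```

The value is well below the gamma obtained for the same table because the denominator is based on all cases, including the many tied pairs that gamma leaves out.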
In picking the best measure of
association to use, first ask yourself what level of measurement you’re dealing
with. If one or both variables are nominal, use Cramer’s V unless your
instructor tells you otherwise. If both are ordinal (again, unless
directed differently by your instructor) use Kendall’s tau-b if the table contains equal numbers of rows and columns, tau-c otherwise.
(In SPSS, if you ask for Cramer’s
V, you automatically get the “Approx. Sig.” Notice that, if you ask for
both Cramer’s V and Chi-square, the “Approx. Sig.” for Cramer’s V appears to be
identical to the “Asymp. Sig.” for
the Pearson Chi-square. They are identical, so you really
don’t need to ask for chi-square.)
If you ask for Kendall’s tau, you
automatically get another measure of statistical significance, called the t
test. The "Approx. Sig." for this
test is the significance of t, that is, the probability (p) that the
relationship could have occurred by chance.
alternative hypothesis
chi-square
concordant pair
count
Cramer's V
degrees of freedom
discordant pair
expected frequency
gamma
Kendall's tau
lambda
measures of association
null hypothesis
observed frequency
Pearson's chi-square
Proportional Reduction in Error
statistical inference
statistically significant
tau-b
tau-c
Start SPSS. Repeat the
crosstabulation exercises in the More About Measurements topic but,
in addition, ask for appropriate measures of association. SPSS will
automatically calculate the statistical significance of the association.
Note: In some cases, you
may need to combine categories of one or more variables before running
crosstabulations. See recode and compute for more
information.
Which relationships are
statistically significant at the .05
level or better? Which are at least relatively strong?
Becker, Lee A., “CROSSTABS:
Measures for Nominal Data,” http://www.uccs.edu/~faculty/lbecker/spss80/ctabs1.htm.
Becker, Lee A., “CROSSTABS:
Measures for Ordinal Data,”
Creative Research Systems, “Significance in Statistics and Surveys,” The Survey System: Customize Your
Surveys with Our Packages.
Lane, David M., “Chapter 16:
Chi-Square,” HyperStat Textbook Online. http://davidmlane.com/hyperstat/chi_square.html.
[1] James Neill, "Why Use Effect Sizes Instead of Significance Testing in
Program Evaluation?" http://wilderdom.com/research/effectsizes.html. Last updated: September 11, 2008. Accessed February 25, 2013. For a different
perspective on this issue, see W. Phillips Shively, The Craft of Political
Research (6th edition). Upper
Saddle River, NJ: Prentice Hall, 2004: 160-161.
[2] The formula for computing chi-square is:
χ² = Σ[(fo – fe)² / fe], where:
fo = the observed frequency in each cell,
and
fe = the expected frequency in each cell.
Last updated April 28, 2013.
© 2003-2013 John L. Korey. Licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.