Introduction to Research Methods in Political Science:
IV. DISPLAYING CATEGORICAL DATA
A picture is said to be worth a thousand words. Tables and graphs, properly designed, can provide clear pictures of patterns contained in many thousands of pieces of information. In this topic, we will describe several ways of displaying information about categorical variables in tabular and graphic form. In later topics, ways of displaying information about continuous variables will be explained.
A frequency table (or frequency distribution) displays numbers and percentages for each value of a variable. It is useful for categorical variables (that is, those with values falling into a relatively small number of discrete categories, such as party identification, religious affiliation, or region of a country) rather than for continuous variables (such as age in years or gross domestic product in dollars).
The following frequency distribution shows, by region of the country, how many state legislatures are controlled by each major party. (In four states, each party controls one of the state's two houses, while in one state, Nebraska, the legislature is officially non-partisan.)
The first column in the table provides a label for each category of the variable. The second and third columns show, respectively, the number and percent of cases in each category for all cases. The fourth column shows the percent in each category after eliminating cases for which we do not have information (missing data). Since we know the party composition of every state legislature, the fourth column is identical to the third in this case. The last column shows the cumulative percentages as one goes from the first to the last category. Note that this last column makes sense only if the values of the variable can be meaningfully ranked. In other words, cumulative frequencies assume at least ordinal level measurement. The numbers in this column make no sense in this example, since it wouldn't be meaningful to say that "98 percent of state legislatures have split majorities or less."
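The columns described above can also be worked out by hand or in a few lines of code. The following Python sketch is only an illustration and is not part of the SPSS exercises; the function name `frequency_table` and the `missing` argument are our own assumptions, and the counts are those reported in table 1. The cumulative column is computed from the valid percentages, as described above.

```python
# Counts from table 1 (number of state legislatures by party control).
counts = {"Democrat": 19, "Republican": 26, "Split": 4, "Non-partisan": 1}
missing = 0  # we know the party composition of every state legislature


def frequency_table(counts, missing=0):
    """Return rows of (label, n, percent, valid percent, cumulative percent).

    Hypothetical helper written for this illustration only.
    """
    valid_n = sum(counts.values())   # cases with information
    total_n = valid_n + missing      # all cases, including missing data
    rows = []
    cumulative = 0.0
    for label, n in counts.items():
        percent = 100 * n / total_n        # percent of all cases
        valid_percent = 100 * n / valid_n  # percent after dropping missing data
        cumulative += valid_percent        # running total of valid percent
        rows.append((label, n, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows


table = frequency_table(counts, missing)
for row in table:
    print("{:<14}{:>4}{:>8.1f}{:>8.1f}{:>8.1f}".format(*row))
```

Because no cases are missing, the percent and valid percent columns come out the same. The cumulative figure of 98 on the "Split" row is the number that, as noted above, has no meaningful interpretation for a nominal variable.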
A contingency table (also called a crosstabulation, or crosstab for short) displays the relationship between one categorical variable and another. It is called a “contingency table” because it allows us to examine a hypothesis that the values of one variable are contingent (dependent) upon those of another.
The following crosstabulation shows the relationship between control of state legislatures and region of the country at the start of the 113th Congress (2013-2014):
Do not let all the trees get in the way of seeing the forest. In interpreting a crosstab, it is crucial to focus on the overall picture. In this case, the table shows that there are substantial regional differences in party strength. Don’t get bogged down in the details.
Table 1:
Party Control | # of States | Percent
Democrat | 19 | 38
Republican | 26 | 52
Split | 4 | 8
Non-partisan | 1 | 2
Totals | 50 | 100
Source: National Conference of State Legislatures, "2012 Live Election Night Coverage of State Legislative Races," http://www.ncsl.org. Accessed November 10, 2012.
Table 2:
Party Control | Northeast | Midwest | South | West
Democrat | 77.8% | 16.7% | 18.8% | 53.8%
Republican | 11.1 | 66.7 | 68.8 | 46.2
Split | 11.1 | 8.3 | 12.5 | 0.0
Non-partisan | 0.0 | 8.3 | 0.0 | 0.0
Totals | 100.0 | 100.0 | 100.0 | 100.0
N | 9 | 12 | 16 | 13
Source: National Conference of State Legislatures, "2012 Live Election Night Coverage of State Legislative Races," http://www.ncsl.org. Accessed November 10, 2012.
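The percentages in table 2 are column percentages: each cell is divided by the number of states in its region. The Python sketch below is an illustration only, not part of the SPSS exercises. The cell counts are an assumption on our part: they are not given in the source, but were back-calculated from the percentages and Ns in table 2 (for example, 77.8 percent of 9 states is 7 states). The function name `column_percentages` is likewise hypothetical.

```python
regions = ["Northeast", "Midwest", "South", "West"]

# Assumed cell counts, back-calculated from table 2's percentages and Ns.
# Each list follows the order of `regions`.
counts = {
    "Democrat":     [7, 2, 3, 7],
    "Republican":   [1, 8, 11, 6],
    "Split":        [1, 1, 2, 0],
    "Non-partisan": [0, 1, 0, 0],
}


def column_percentages(counts):
    """Percentage each cell within its column (here, within its region)."""
    # zip(*rows) turns the rows into columns so each region can be summed.
    column_totals = [sum(column) for column in zip(*counts.values())]
    percents = {
        label: [round(100 * n / total, 1)
                for n, total in zip(row, column_totals)]
        for label, row in counts.items()
    }
    return percents, column_totals


percents, column_totals = column_percentages(counts)
print("N by region:", dict(zip(regions, column_totals)))
for label, row in percents.items():
    print(f"{label:<14}" + "".join(f"{p:>8.1f}" for p in row))
```

Percentaging within regions is what lets us compare them directly even though the regions contain different numbers of states.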
A pie chart is a simple way to show the distribution of a variable that has a relatively small number of values, or categories. Figure 1 provides in graphic form information similar to what table 1 (above) presents in tabular form.
Similarly, figure 2 is analogous to table 2, and breaks the results down by region.
Finally, figure 4 shows a "clustered" bar chart, with results displayed for each region. It is similar to table 2 and figure 2.
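The idea behind a bar chart, in which the length of each bar is proportional to the number of cases in a category, can be shown even without graphics software. The following Python sketch is a rough, text-only stand-in and does not reproduce any of the figures; it uses the counts from table 1, and the function name `text_bar_chart` is our own invention.

```python
# Counts from table 1 (number of state legislatures by party control).
counts = {"Democrat": 19, "Republican": 26, "Split": 4, "Non-partisan": 1}


def text_bar_chart(counts, symbol="#"):
    """Return one line per category, with one symbol per case."""
    width = max(len(label) for label in counts)  # pad labels to equal width
    lines = []
    for label, n in counts.items():
        lines.append(f"{label:<{width}} | {symbol * n} {n}")
    return lines


chart = text_bar_chart(counts)
print("\n".join(chart))
```

A clustered version would simply repeat this for each region, placing the bars for each region side by side.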
bar chart
contingency table
crosstab
crosstabulation
frequency distribution
frequency table
pie chart
These exercises use the 2008 American National Election Study Subset. Open the codebook describing these data. Start SPSS and open the anes08s.sav file.
1. Prepare a frequency table, pie chart, and bar chart for an economic, social, or foreign policy issue of your choosing (see codebook). Crosstabulate this with several background variables (again, see codebook) that you think might influence a respondent's opinion on this issue. Cautions: 1) avoid background variables like age or income that have a large number of categories; 2) some categories of some background variables contain very few cases, and the results are likely to be unreliable. In another topic, we'll discuss how measures of statistical significance can help you better assess the reliability of findings. In yet another topic, we'll also show you how to modify variables to make them more manageable.
Convert your frequency and contingency tables into presentation-ready form.
2. In exercise 1 of the “Political Science as a Social Science” topic, you were asked to come up with hypotheses that might help explain party identification. Using "partyid3" as the dependent variable, construct contingency tables to test the following hypotheses, along with any others you can think of:
Energy Information Administration, “Energy Explained: Your Guide to Understanding Energy,” Official Energy Statistics From the Government. http://www.eia.doe.gov/pub/oil_gas/petroleum/analysis_publications/oil_market_basics/graphs_and_charts.htm.
Gostats.com, "Graphing and Types of Graphs," GoStats. http://gostats.com/resources/types-of-graphs.html.
Math League Multimedia, “Using Data and Statistics,” The Math League. http://www.mathleague.com/index.php?option=com_content&view=article&id=69.
Social Science Research and Instructional Council, "Links to Other Instructional Sites: Graphs," http://www.ssric.org/tr/links#graphs.
[1] On occasion, crosstabs are used, not to test hypotheses, but for descriptive purposes, and this general rule does not apply. For example, the 2008 American National Election Study Subset (see codebook) includes several variables thought to measure political efficacy (a person's belief that he or she can have an impact on politics). If these variables are valid measures of the same underlying concept, there should be a relationship between answers to one question and those to another, but we are not hypothesizing that either one depends on the other. In trying to see whether this is the case, we may decide to look, for each cell in the table, at the percent of the total table rather than of either the row or column.
[2] A more systematic method for assessing the reliability of percentages in a crosstab is discussed under the topic of contingency table analysis.
Last updated April 28, 2013.
© 2003-2013 John L. Korey. Licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.