Measures of association in crosstab tables

Copyright © Richard B. Darlington. All rights reserved.

How can you measure the association between two categorical variables like religion, occupation, or preference among several choices? In an extensive literature review, Goodman and Kruskal (1979) found several dozen measures for this purpose. As they point out, very few of these measures have a ratio-scale interpretation that would appeal to a typical user. That is, for most of these measures there is no simple and useful sense in which an association of, say, .6 is twice as high as an association of .3. But in this document we concentrate almost entirely on measures with such interpretations; of the 19 measures of association in the outline below, only phi lacks such an interpretation. Some of the measures discussed here range from -1 to +1, some range from 0 to 1, and some typically range from 0 to 1 but can be negative in exceptional circumstances.

This document mentions sampling variability and significance tests only in passing, focusing instead on the meaning of measures in a descriptive sense. For tables larger than 2 x 2, the familiar Pearson chi-square test is the standard test of the null hypothesis of no association.

Throughout we let r and c denote a table's number of rows and columns respectively. As is commonly done with chi-square tests for association, we let o denote the observed frequency in an individual cell, and let e denote the expected cell frequency as calculated by the familiar formula

e = row total x column total / N

N is the total sample size and summations are typically across cells, so N = SUM(o) = SUM(e), where SUM denotes summation.
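The expected-frequency formula above can be sketched in a few lines of Python. This is only an illustration: the function names and the 2 x 3 table are invented here, and the code assumes no row or column total is zero. The chi-square sum uses the usual Pearson form, SUM((o - e)^2 / e).

```python
def expected_frequencies(observed):
    """Return the table of e = row total x column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def pearson_chi_square(observed):
    """Sum (o - e)^2 / e across all cells (assumes every e is nonzero)."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


# Hypothetical 2 x 3 table, invented for illustration only.
table = [[20, 30, 50],
         [30, 30, 40]]
expected = expected_frequencies(table)
chi_square = pearson_chi_square(table)
print(expected)
print(chi_square)
```

Summing the entries of `expected` gives back N, in line with N = SUM(o) = SUM(e).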

In the outline below, each highlighted selection criterion is explained in the section "Explanations of the selection criteria used in the outline". The reader who wants an overview may wish to study all the selection criteria first. You may then want to study seven other properties of various measures, which are described in the section "Seven properties of various measures of association".

The document is organized as 7 files or Web "pages". This is the main page, but there is a separate introductory page for readers who want a more basic introduction to the whole topic. Each of the other 5 pages concerns a particular type of measure of association:

When taken in this order, the measures go broadly from more general measures applicable to any table, to more specialized measures applicable only to tables meeting certain conditions. Thus a reader who wants to read the document "from beginning to end" might well read the sections in this order.

The first measure of association discussed in this order is the Goodman-Kruskal lambda. Many other measures are introduced by comparing them to lambda. Thus readers who want to read sections in an order of their own choosing, but who are unfamiliar with lambda, may want to read first the introductory page, where lambda is discussed in detail.

An Outline for Choosing among 19 Measures of Association

Blank lines divide this outline into 5 sections. Measures for special circumstances generally come first in the outline, and more general measures come last. Thus the outline's 5 sections correspond, in reverse order, to the 5 sections listed above. However, the outline spans the 5 sections; a heading in one section may fall under a higher-level heading in the previous section. For instance, the 6 headings listed below are in several different sections of the outline, but lambda, in the final section of the table, falls under all of them. Note that the selection criteria in color are not necessarily more important than those not in color. The criteria in color are simply those that need more explanation, so hypertext links are provided to their explanations.
Categories on both variables are naturally ranked from low to high, and a measure of monotonic association is desired...Measures of this type range from -1 to +1.
.....Treat variables symmetrically
..........Best-known measures
...............Tables of any size .....Gamma
...............2 x 2 tables .....Yule Q
..........Recommended measure .....SAG
.....Treat variables asymmetrically
..........Tables of any size .....Somers D
..........2 x k tables .....RBC
Descriptions of the measures in this section.

Categories on one or both variables not naturally ranked,
or a measure of nonmonotonic association is desired

.....Columns have same categories as rows, and that's relevant
.....Measures of this type typically range from 0 to 1, though negative values are possible.
..........Treat variables symmetrically .....Kappa
..........Treat variables asymmetrically
...............Margin-bound measure ..... Lambda-d
...............Margin-free measure ..... Lambda-max-d
Descriptions of the measures in this section.

.....Columns don't have same categories as rows, or the sameness is irrelevant
.....Measures of this type range from 0 to 1.
..........Treat variables symmetrically
...............Recommended measures
....................Don't correct for differences in marginal totals ..... OGE
....................Correct for differences in marginal totals (2 x 2 only)..... CA
...............Best-known measures .....Cramer phi and phi2
Descriptions of the measures in this section.

..........Treat variables asymmetrically
...............Variables are independent and dependent variables
....................Table is 2 x 2, one column is "control" ..... RE
....................More general measures ..... Lambda-max, Gini D
Descriptions of the measures in this section.

...............Variables are predictor and criterion
....................Predictor predicts just one column for each case
.........................All guesses weighted equally ..... Lambda
.........................Guesses weighted unequally ..... Lambda-max, Gini D
....................Prediction is distributed across all columns
.........................Predictor merely ranks columns by confidence ..... MR
.........................Predictor assigns an exact probability to each column
..............................Simple measure with a trivial technical problem ..... MP
..............................More complex measure that's "clean" technically.....Uncertainty
Descriptions of all the measures in this section except Lambda-max and Gini D

Explanations of the selection criteria used in the outline

Ranked categories.. The categories of a variable may have a natural ranking, as violations, misdemeanors, and felonies are ranked on seriousness. Variables like occupation and religion have no such natural ranking. Even if the ranking exists for both variables, an analyst might occasionally prefer a measure of association that ignores the ranking, since the ranking is relevant primarily if one wants a measure of monotonic association--that is, a measure of the degree to which the two variables increase together.

Symmetry.. One might wish to treat the row and column variables asymmetrically if one is the independent variable and the other the dependent variable, or if one (the criterion) is to be predicted from the other (the predictor). If a measure treats the two variables asymmetrically, the value of the measure will typically change if the table is "flipped" so that each row becomes a column and each column becomes a row. If a measure treats rows and columns symmetrically, flipping the table will leave the measure unchanged. Such a measure might be reasonable if two variables, such as occupation and religion, are presumed to be affected by a set of other variables that may be unmeasured, but the two measured variables cannot be clearly classified as independent and dependent variables.

Same categories.. The column variable sometimes has the same categories as the row variable, as when husband's religion is tabulated against wife's religion, or the row and column variables are the ratings of two judges who used the same categories. Occasionally this sameness may exist but you will choose to ignore it. For instance, imagine a party guest trying to guess the occupations of the other guests. If the guesser labeled every schoolteacher a general office worker, every general office worker a lawyer, and every lawyer a schoolteacher, then it may be of interest to note that the guesser did distinguish perfectly among the three groups even though all the guesses were wrong. To do that, use a measure of association that ignores the fact that the categories of "guess" are the same as the categories of "actual profession".

Margin-free and margin-bound measures.. We will call a measure of association "margin-free" if it is unchanged when all cell frequencies in a given column are multiplied by an arbitrary constant. By such measures, the two 2 x 2 tables below would have the same association, since column 1 in the second table is simply twice column 1 in the first table, and the second columns are identical.



Measures with this property may be useful as measures of causation, where the sizes of treatment and control groups may be arbitrary and you want a measure that is independent of those sizes. For instance, if you report, "The treatment doubled the success rate, from 20% in the control group to 40% in the treatment group," you are reporting a measure of causal efficacy that is independent of group sizes.

A margin-free measure is also useful for measuring predictive power, if the relative sizes of the two criterion groups change from situation to situation and you want a measure that is independent of those sizes. For instance, if a sign in a lie-detection system were observed in 80% of liars and in only 20% of truth-tellers, the 60% difference between these rates is a useful measure of predictive accuracy that is unaffected by the fact that some populations may contain a higher proportion of liars than others.

But if the relative sizes of the criterion groups are known, then you may want a measure of predictive accuracy that is specific to those group sizes. Margin-bound measures have that property.
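The margin-free property can be checked numerically. The sketch below uses invented counts that match the lie-detection example (the sign appears in 80% of liars and 20% of truth-tellers). All function names are hypothetical. The "share of sign-showers who are liars" is included only as an example of a quantity that does depend on the column sizes; it is not one of the measures in the outline.

```python
def scale_column(table, col, factor):
    """Multiply every cell in one column by a constant."""
    return [[cell * factor if j == col else cell for j, cell in enumerate(row)]
            for row in table]


def rate_difference(table):
    """Row-1 proportion within column 1 minus row-1 proportion within column 2."""
    (a, b), (c, d) = table
    return a / (a + c) - b / (b + d)


def row1_share_from_column1(table):
    """Proportion of row-1 cases that fall in column 1 (depends on margins)."""
    (a, b), _ = table
    return a / (a + b)


# Rows: sign present, sign absent. Columns: liars, truth-tellers.
# Counts are hypothetical.
base = [[80, 20],
        [20, 80]]
doubled = scale_column(base, 0, 2)  # twice as many liars, same rates

print(rate_difference(base), rate_difference(doubled))
print(row1_share_from_column1(base), row1_share_from_column1(doubled))
```

The rate difference stays at .6 after the liar column is doubled, while the share of sign-showers who are liars rises from .8 to about .89.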

Margin totals.. Consider the following table showing the numbers of people who got each of two test items correct.


On the one hand, the two items are highly linked in the sense that nobody ever got item 2 correct after missing item 1. On the other hand, most of the sample got item 1 but missed item 2. Thus the two items have low association in one sense, but that lowness seems to be caused largely by the difference in the marginal totals: 90 people got item 1 correct while only 10 got item 2 correct. These two items have perfect association by measure CA, which corrects for differences in marginal totals, but have only low association by measure OGE, which does not. You might sometimes want to report both OGE and CA, since they measure different properties of the association between two variables.

Single vs. distributed guesses.. A predictor trying to predict category membership might do it under various conditions. The predictor might simply have to choose one category, or might get to rank the various categories by the likelihood of being the correct one, or might get to assign an exact probability to each category. For instance, suppose 100 people fall into categories A, B, C, D with frequencies 10, 20, 30, 40 respectively. If someone were picked at random and you simply had to guess which category he came from with no individual information about him, your best guess would be D since that is the largest category. But if you had to rank the 4 categories from low to high according to your confidence the person was in each one, you would rank them in the order A, B, C, D. And if you instead had to assign a probability to each category, reasonable probabilities would be .1, .2, .3, .4 respectively, since those are the proportions of the four categories. Lambda, lambda-max and the Gini D all apply when the guesser picks one category, MR applies when he or she ranks the categories, and MP and Uncertainty apply when he or she picks actual probabilities.
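The three kinds of guess in the A, B, C, D example can be written out directly. This is a small sketch using the frequencies from the text; the variable names are chosen here for illustration.

```python
# Frequencies of the four categories, as in the example above.
frequencies = {"A": 10, "B": 20, "C": 30, "D": 40}
total = sum(frequencies.values())

# Single guess: pick the largest category.
single_guess = max(frequencies, key=frequencies.get)

# Ranked guess: order the categories from low to high confidence.
ranking = sorted(frequencies, key=frequencies.get)

# Probability guess: use each category's proportion of the total.
probabilities = {name: count / total for name, count in frequencies.items()}

print(single_guess)
print(ranking)
print(probabilities)
```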

Weighted and unweighted guesses.. In assessing the accuracy of a prediction, you might want to use a measure that is a function simply of the number of correct guesses, or you might want to weight correct "long shots" more heavily than other guesses. For instance, suppose someone is trying to guess the occupations of other people. You would probably be more impressed at the guesser's accuracy if he or she were correct on a rare profession like actuary than on a more common profession like lawyer. Lambda weights all guesses equally, while Lambda-max and Gini D give more weight to correct long shots.

Seven properties of various measures of association

Except for phi, all measures in the outline have some sort of ratio-scale utility interpretation. With all such measures except those requiring ordered categories, you can imagine a game in which you win a certain amount for certain kinds of cases and lose a certain amount for other kinds of cases, and the measure of association equals the total or mean winnings. For instance, suppose you must use a case's row membership to guess its column membership, and you win $1 for each case guessed correctly. Lambda is closely related to your total winnings in this imaginary game.

All measures requiring ordered categories also have utility interpretations, but they are based on pairs of cases rather than single cases. They are discussed next.

Concordance interpretations apply only when the row and column variables both have a natural order, and apply to all such measures in the outline. For such measures we can identify any pair of cases as concordant, discordant, or neither. A pair is concordant if the case higher on the row variable is also higher on the column variable, while a pair is discordant if the case higher on the row variable is lower on the column variable. If a pair is tied on either the row or column variable, it is neither concordant nor discordant. Then all the ordered-variable measures in the outline are in some way proportional to (Con - Dis), where Con is the number of concordant pairs and Dis is the number of discordant pairs. This gives these measures a ratio-scale interpretation in terms of the proportions of pairs that are concordant and discordant. All these measures have utility interpretations because one can imagine a game in which you win $1 for each concordant pair and lose $1 for each discordant pair. Each measure of association can be described as your average winnings in such a game; as explained in the discussions of those measures, they differ in the pairs they count.
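Counting concordant and discordant pairs from a table can be sketched as follows. The code assumes rows and columns are both listed from low to high, and the example table is invented. The `gamma` function uses the standard Goodman-Kruskal definition, (Con - Dis) / (Con + Dis), which averages the winnings over untied pairs only; the function names are hypothetical.

```python
def concordant_discordant(table):
    """Count concordant and discordant pairs in an ordered r x c table."""
    con = dis = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair each case in cell (i, j) with cases in strictly lower rows
            # of the table, i.e. cases higher on the row variable.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:      # also higher on the column variable
                        con += table[i][j] * table[k][m]
                    elif m < j:    # lower on the column variable
                        dis += table[i][j] * table[k][m]
                    # m == j is a tie on the column variable: counted as neither
    return con, dis


def gamma(table):
    """Goodman-Kruskal gamma; assumes at least one untied pair."""
    con, dis = concordant_discordant(table)
    return (con - dis) / (con + dis)


# Hypothetical 2 x 2 table with ordered rows and columns.
table = [[20, 10],
         [5, 15]]
con, dis = concordant_discordant(table)
print(con, dis, gamma(table))
```

In a 2 x 2 table with cells a, b, c, d this value equals (ad - bc) / (ad + bc), the Yule Q.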

Frequency interpretations are utility interpretations that apply to measures using unordered categories. Such measures identify a target set of cells, and define observed, null, and max frequencies for that set, where the null frequency is the frequency expected in the target set under no association, and the max frequency is the largest possible frequency in the target set. Then the measure of association is defined as

association = (observed freq - null freq)/(max freq - null freq)

where "freq" denotes frequency and all frequencies are for the target set. Thus the measure of association equals the amount that the observed frequency in the target set exceeds the null frequency for that set, expressed as a proportion of the largest amount by which the observed frequency could possibly exceed the null.

For instance, the Cohen kappa measures the degree of row-column agreement when row and column variables have the same categories, as when two judges each classify each case into one of several categories and you want to measure the judges' agreement across cases. The target set for kappa includes the cells in the upper-left-to-lower-right diagonal, since those are the cells indicating agreement. The null frequency for each cell is calculated by the same formula for expected frequencies used in computing chi-square values, and the null frequency for the target set is the sum of these values for the diagonal cells. The max frequency for the target set is simply N. Thus kappa equals the amount that the observed frequency in the diagonal exceeds the null frequency for those cells, expressed as a proportion of the largest amount by which the observed frequency could possibly exceed the null. Frequency interpretations apply to lambda, OGE, CA, lambda-d, and kappa.
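The kappa calculation just described follows the frequency formula step by step. Here is a minimal sketch, assuming a square table whose rows and columns list the same categories in the same order. The function name and the two-judge table are invented for illustration.

```python
def kappa(table):
    """Cohen kappa as (observed - null) / (max - null) for the diagonal cells."""
    size = len(table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Observed frequency in the target set: cases on the diagonal.
    observed = sum(table[i][i] for i in range(size))
    # Null frequency: expected frequencies e summed over the diagonal cells.
    null = sum(row_totals[i] * col_totals[i] / n for i in range(size))
    # Max frequency: every case on the diagonal.
    max_freq = n

    return (observed - null) / (max_freq - null)


# Hypothetical ratings by two judges using the same two categories.
ratings = [[40, 10],
           [10, 40]]
print(kappa(ratings))
```

For this table the observed diagonal frequency is 80, the null frequency is 50, and the max is 100, so kappa is 30/50 = .6.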

Weighted frequency interpretations use the same basic formula used by frequency interpretations--but the observed, null, and max frequencies in the formula are weighted rather than unweighted sums of cell frequencies. To see why this may be reasonable, suppose someone is trying to guess the occupations of other people. You would probably be more impressed at the guesser's accuracy if he or she were correct on a rare profession like actuary than on a more common profession like lawyer. Several measures of association (MR, Lambda-max, Gini D, and Lambda-max-d) take this into account by weighting each case in inverse proportion to its column total. Thus all columns count equally despite differing column frequencies, so cases in sparse columns get more weight. Except for this, these measures are like measures with simple frequency interpretations.

Difference proportionality is one particular type of frequency interpretation. Let us say that a 2 x 2 frequency table has double diagonal symmetry (DDS) if its upper left entry equals its lower right entry, and its lower left equals its upper right. The three 2 x 2 tables below all have DDS.

7 3     8 2     9 1
3 7     2 8     1 9

DDS does not appear in bold because we use it only to introduce difference proportionality. A measure of association will be said to have difference proportionality if, when applied to a 2 x 2 table with DDS, it equals the difference between the two within-row or within-column proportions. Thus a measure with difference proportionality would equal respectively .4, .6, and .8 in the three tables just shown. A measure will be said to have square proportionality if, in such tables, it equals the square of these differences. Thus a measure with square proportionality would equal respectively .16, .36, and .64 in these three tables. Many of the measures of association in this document have either difference or square proportionality.
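The values quoted above can be reproduced with a short check. The sketch assumes the three DDS tables are 7 3 / 3 7, 8 2 / 2 8, and 9 1 / 1 9, which is how the digits of the tables above read and which gives the stated .4, .6, and .8. The function names are hypothetical.

```python
def has_dds(table):
    """True if a 2 x 2 table has double diagonal symmetry."""
    (a, b), (c, d) = table
    return a == d and b == c


def proportion_difference(table):
    """Difference between the two within-row proportions for column 1."""
    (a, b), (c, d) = table
    return a / (a + b) - c / (c + d)


# The three DDS tables, as read from the text (an assumption).
tables = [
    [[7, 3], [3, 7]],
    [[8, 2], [2, 8]],
    [[9, 1], [1, 9]],
]

for t in tables:
    diff = proportion_difference(t)
    # Rounded only to hide floating-point noise in the printout.
    print(has_dds(t), round(diff, 2), round(diff ** 2, 2))
```

A measure with difference proportionality would match the second printed value for each table, and a measure with square proportionality would match the third.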

Measures with unique zero are zero only if the row and column frequencies are completely independent, as in the table


In this table, the second and third columns are exactly twice and three times the first, producing complete independence. Unique zero seems to us to be an essential property for measures of simple causation, since any pattern other than complete independence implies some sort of causation. (Recall that here we are temporarily ignoring sampling error and seeking measures that would at least have the desired properties in infinitely large populations.)
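Complete independence of this kind can be verified by comparing each observed frequency with its expected frequency. The table below is hypothetical, built so that the second and third columns are exactly twice and three times the first. The helper name is invented.

```python
def expected_frequencies(observed):
    """Return the table of e = row total x column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


# Hypothetical table: column 2 = 2 x column 1, column 3 = 3 x column 1.
independent = [[5, 10, 15],
               [20, 40, 60]]
expected = expected_frequencies(independent)

# Under complete independence every o equals its e, so every deviation is 0.
deviations = [[o - e for o, e in zip(o_row, e_row)]
              for o_row, e_row in zip(independent, expected)]
print(deviations)
```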

Some measures of prediction are useful without unique zero. For instance, consider the table


Suppose you had to guess a case's column from its row, and you wanted to maximize the number of correct guesses. Then your best guess is column 1, whether the case is in row 1 or row 2. Thus the row variable is of no use in predicting column membership, despite the large difference between the two rows. Some measures of association, such as the Goodman-Kruskal lambda, are therefore zero for this table. Such measures lack unique zero.
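A numerical illustration of a measure without unique zero follows. It uses the standard Goodman-Kruskal lambda for predicting column from row, written in the frequency form (observed - null) / (max - null). Both tables are hypothetical, the function name is invented, and the code assumes the cases are not all in one column.

```python
def goodman_kruskal_lambda(table):
    """Lambda for guessing a case's column from its row."""
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(col_totals)
    # Observed: correct guesses when you pick the largest cell in each row.
    observed = sum(max(row) for row in table)
    # Null: correct guesses when you always pick the largest column.
    null = max(col_totals)
    # Max: every case guessed correctly.
    return (observed - null) / (n - null)


# Rows differ sharply, yet column 1 is the best guess in both rows.
no_help = [[90, 10],
           [60, 40]]
# Here the best guess changes with the row.
some_help = [[90, 10],
             [40, 60]]

print(goodman_kruskal_lambda(no_help))
print(goodman_kruskal_lambda(some_help))
```

The first table is far from independent (90% versus 60% in column 1), but lambda is 0 because knowing the row never changes the best guess.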

Some equivalences among measures

r = number of rows, c = number of columns

Lambda-max = lambda when column totals are equal.

If the table is square and the largest cell in each column is the diagonal cell, then Lambda-d = lambda and lambda-max-d = lambda-max.

Lambda-max reduces to the Gini D when c = 2.

Gamma reduces to the Yule Q when r = c = 2.

MP = phi2 when c = 2.

Somers D reduces to the rank-biserial r when c = 2.

Lambda-max = |Somers D| in 2 x 2 tables.



Goodman, Leo A. and Kruskal, William H. (1979). Measures of association for cross classifications. New York, Springer-Verlag.

Theil, Henri (1972). Statistical decomposition analysis. Amsterdam, North Holland.