How can you measure the association between two categorical variables like religion, occupation, or preference among several choices? In an extensive literature review, Goodman and Kruskal (1979) found several dozen measures for this purpose. As they point out, very few of these measures have a ratio-scale interpretation that would appeal to a typical user. That is, for most of these measures there is no simple and useful sense in which an association of, say, .6 is twice as high as an association of .3. But in this document we concentrate almost entirely on measures with such interpretations; of the 19 measures of association in the outline below, only phi lacks such an interpretation. Some of the measures discussed here range from -1 to +1, some range from 0 to 1, and some typically range from 0 to 1 but can be negative in exceptional circumstances.

This document mentions sampling variability and significance tests only in passing, focusing instead on the meaning of measures in a descriptive sense. For tables larger than 2 x 2, the familiar Pearson chi-square test is the standard test of the null hypothesis of no association.

Throughout we let *r* and *c* denote a table's number of rows and columns respectively. As is commonly done with chi-square tests for association, we let *o* denote the observed frequency in an individual cell, and *e* the expected cell frequency as calculated by the familiar formula

*e* = (row total x column total) / *N*

where *N* is the total sample size. Summations are typically across cells, so *N* = SUM(*o*) = SUM(*e*), where SUM denotes summation.
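As a minimal sketch, the expected-frequency formula can be coded as follows (`expected_frequencies` is a hypothetical helper, not part of the original document):

```python
def expected_frequencies(table):
    """Expected cell frequencies e = row total * column total / N."""
    n = sum(sum(row) for row in table)            # N, the total sample size
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

observed = [[10, 30], [20, 40]]                   # observed frequencies o
expected = expected_frequencies(observed)         # [[12.0, 28.0], [18.0, 42.0]]
```

Note that the expected frequencies sum to *N*, as the identity SUM(*o*) = SUM(*e*) requires.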

The outline below organizes the measures by a series of selection criteria. Each of those criteria is explained later in this page; the reader who wants an overview may wish to study all the selection criteria first, along with seven other properties of various measures that are also explained below.

The document is organized as 7 files or Web "pages". This is the main page, but there is a separate introductory page for readers who want a more basic introduction to the whole topic. Each of the other 5 pages concerns a particular type of measure of association:

- Asymmetric margin-bound measures applicable to any table
- Asymmetric margin-free measures applicable to any table
- Symmetric measures applicable to any table
- Measures requiring that rows and columns have the same categories
- Measures for ordered categories

The first measure of association discussed in this order is the Goodman-Kruskal lambda. Many other measures are introduced by comparing them to lambda. Thus readers who want to read sections in an order of their own choosing, but who are unfamiliar with lambda, may want to read the introductory page first, where lambda is discussed in detail.

- Categories on one or both variables not naturally ranked...
  - Columns don't have same categories as rows...
    - Treat variables asymmetrically
      - Variables are predictor and criterion
        - Predictor predicts just one column for each case
          - All guesses weighted equally

Categories on both variables are naturally ranked from low to high, and a measure of monotonic association is desired. Measures of this type range from -1 to +1.

- Treat variables symmetrically
  - Best-known measures
    - Tables of any size .....
    - 2 x 2 tables .....
  - Recommended measure .....
- Treat variables asymmetrically
  - Tables of any size .....
  - 2 x k tables .....

Descriptions of the measures in this section.

Categories on one or both variables not naturally ranked, *or* a measure of nonmonotonic association is desired

- Columns have same categories as rows, and that's relevant. Measures of this type typically range from 0 to 1, though negative values are possible.
  - Treat variables symmetrically ..... **Kappa**
  - Treat variables asymmetrically
    - Margin-bound measure ..... **Lambda-d**
    - Margin-free measure ..... **Lambda-max-d**

  Descriptions of the measures in this section.

- Columns don't have same categories as rows, or the sameness is irrelevant. Measures of this type range from 0 to 1.
  - Treat variables symmetrically
    - Recommended measures
      - Don't correct for differences in marginal totals ..... **OGE**
      - Correct for differences in marginal totals (2 x 2 only) ..... **CA**
    - Best-known measures ..... **Cramer phi and phi^{2}**

    Descriptions of the measures in this section.

  - Treat variables asymmetrically
    - Variables are independent and dependent variables
      - Table is 2 x 2, one column is "control" ..... **RE**
      - More general measures ..... **Lambda-max, Gini D**

      Descriptions of the measures in this section.

    - Variables are predictor and criterion
      - Predictor predicts just one column for each case
        - All guesses weighted equally ..... **Lambda**
        - Guesses weighted unequally ..... **Lambda-max, Gini D**
      - Prediction is distributed across all columns
        - Predictor merely ranks columns by confidence ..... **MR**
        - Predictor assigns an exact probability to each column
          - Simple measure with a trivial technical problem ..... **MP**
          - More complex measure that's "clean" technically ..... **Uncertainty**

      Descriptions of all the measures in this section except Lambda-max and Gini D.

**Symmetry**..
One might wish to treat the row and column variables asymmetrically if one is
the independent variable and the other the dependent variable, or if one (the
criterion) is to be predicted from the other (the predictor). If a measure treats
the two variables asymmetrically, the value of the measure will typically
change if the table is "flipped" so that each row becomes a column and each
column becomes a row. If a measure treats rows and columns symmetrically, flipping the
table
will leave the measure unchanged. Such a measure might be reasonable if two
variables, such as occupation and religion, are presumed to be affected by a set of
other variables that may be unmeasured, but the two measured variables cannot
be clearly classified as independent and dependent variables.

**Same categories**..
The column variable sometimes has the same categories as the row variable, as
when husband's religion is tabulated against wife's religion, or the row and
column variables are the ratings of two judges who used the same categories.
Occasionally this sameness may exist but you will choose to ignore it. For
instance, imagine a party guest trying to guess the occupations of the other
guests. If the guesser labeled every schoolteacher a general office worker,
every general office worker a lawyer, and every lawyer a schoolteacher, then it
may be of interest to note that the guesser did distinguish perfectly among the
three groups even though all the guesses were wrong. To do that, use a
measure of association that ignores the fact that the categories of "guess" are
the same as the categories of "actual profession".

**Margin-free and margin-bound measures**..
We will call a measure of association "margin-free" if it is unchanged when all
cell frequencies in a given column are multiplied by an arbitrary constant. By
such measures, the two 2 x 2 tables below would have the same association,
since column 1 in the second table is simply twice column 1 in the first table,
and the second columns are identical.

20 | 30 |
10 | 50 |

40 | 30 |
20 | 50 |

Measures with this property may be useful as measures of causation, where the sizes of treatment and control groups may be arbitrary and you want a measure that is independent of those sizes. For instance, if you report, "The treatment doubled the success rate, from 20% in the control group to 40% in the treatment group," you are reporting a measure of causal efficacy that is independent of group sizes.
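The margin-free property can be checked numerically with the odds ratio, a classic margin-free measure used here purely as an illustration (it is not one of the measures in the outline):

```python
def odds_ratio(table):
    """Odds ratio ad/bc for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

t1 = [[20, 30], [10, 50]]   # first table above
t2 = [[40, 30], [20, 50]]   # same table with column 1 doubled
# odds_ratio(t1) == odds_ratio(t2): scaling a column multiplies
# the numerator and denominator by the same constant.
```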

A margin-free measure is also useful for measuring predictive power, if the relative sizes of the two criterion groups change from situation to situation and you want a measure that is independent of those sizes. For instance, if a sign in a lie-detection system were observed in 80% of liars and in only 20% of truth-tellers, the 60% difference between these rates is a useful measure of predictive accuracy that is unaffected by the fact that some populations may contain a higher proportion of liars than others.

But if the relative sizes of the criterion groups are known, then you may want a measure of predictive accuracy that is specific to those group sizes. Margin-bound measures have that property.

**Margin totals**..
Consider the following table showing the numbers of people who got each of two test items correct.

            Item 1
            +     -
Item 2  +  10     0
        -  80    10
On the one hand, the two items are highly linked in the sense that nobody ever got item 2 correct after missing item 1. On the other hand, most of the sample got item 1 but missed item 2. Thus the two items have low association in one sense, but that lowness seems to be caused largely by the difference in the marginal totals: 90 people got item 1 correct while only 10 got item 2 correct. These two items have perfect association by measure CA, which corrects for differences in marginal totals, but have only low association by measure OGE, which does not. You might sometimes want to report both OGE and CA, since they measure different properties of the association between two variables.

**Single vs. distributed guesses**..
A predictor trying to predict category membership might do it under various
conditions. The predictor might simply have to choose one category, or might
get to rank the various categories by the likelihood of being the correct one, or
might get to assign an exact probability to each category. For instance,
suppose 100 people fall into categories A, B, C, D with frequencies 10, 20,
30, 40 respectively. If someone were picked at random and you simply had to
guess which category he came from with no individual information about him,
your best guess would be D since that is the largest category. But if you had to
rank the 4 categories from low to high according to your confidence the person
was in each one, you would rank them in the order A, B, C, D. And if you
instead had to assign a probability to each category, reasonable probabilities
would be .1, .2, .3, .4 respectively, since those are the proportions of the four
categories. Lambda, Lambda-max, and the Gini D all apply when the guesser
picks one category, MR applies when he or she ranks the categories, and MP and
Uncertainty apply when he or she assigns actual probabilities.
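The three prediction conditions can be sketched directly with the frequencies from the example (hypothetical code, not from the original document):

```python
# Category frequencies from the example above.
counts = {"A": 10, "B": 20, "C": 30, "D": 40}
n = sum(counts.values())

best_single = max(counts, key=counts.get)      # single guess: largest category, "D"
ranked = sorted(counts, key=counts.get)        # ranked low to high: A, B, C, D
probs = {k: v / n for k, v in counts.items()}  # exact probabilities: .1, .2, .3, .4
```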

**Weighted and unweighted guesses**..
In assessing the accuracy of a prediction, you might want to use a measure that
is a function simply of the number of correct guesses, or you might want to
weight correct "long shots" more heavily than other guesses. For instance,
suppose someone is trying to guess the occupations of other people. You
would probably be more impressed at the guesser's accuracy if he or she were correct
on a rare profession like actuary than on a more common profession like
lawyer. Lambda weights all guesses equally, while Lambda-max and Gini D give
more weight to correct long shots.

All measures requiring ordered categories also have utility interpretations, but they are based on pairs of cases rather than single cases. They are discussed next.

**Concordance** interpretations apply only when the row and column variables
both have a natural order, and apply to all such measures in the outline. For
such measures we can identify any pair of cases as concordant, discordant, or
neither. A pair is concordant if the case higher on the row variable is also
higher on the column variable, while a pair is discordant if the case higher on
the row variable is lower on the column variable. If a pair is tied on either the
row or column variable, it is neither concordant nor discordant. Then all the
ordered-variable measures in the outline are in some way proportional to (Con
- Dis), where Con is the number of concordant pairs and Dis is the number of
discordant pairs. This gives these measures a ratio-scale interpretation in terms
of the proportions of pairs that are concordant and discordant. All these
measures have utility interpretations because one can imagine a game in which
you win $1 for each concordant pair and lose $1 for each discordant pair.
Each measure of association can be described as your average winnings in such
a game; as explained in the discussions of those measures, they differ in the
pairs they count.
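Con and Dis can be counted from a contingency table by comparing every pair of cells; as one example of a measure proportional to (Con - Dis), the Goodman-Kruskal gamma is simply (Con - Dis)/(Con + Dis). A sketch (hypothetical helper, illustrative table):

```python
def con_dis(table):
    """Count concordant and discordant pairs for ordered rows and columns."""
    r, c = len(table), len(table[0])
    con = dis = 0
    for i in range(r):
        for j in range(c):
            for k in range(r):
                for m in range(c):
                    if i < k and j < m:      # higher on both variables: concordant
                        con += table[i][j] * table[k][m]
                    elif i < k and j > m:    # higher on rows, lower on columns: discordant
                        dis += table[i][j] * table[k][m]
    return con, dis

con, dis = con_dis([[10, 5], [5, 10]])       # 100 concordant, 25 discordant pairs
gamma = (con - dis) / (con + dis)            # Goodman-Kruskal gamma = 0.6
```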

**Frequency interpretations** are utility interpretations that apply to measures
using unordered categories. Such measures identify a *target* set of cells, and
define *observed*, *null*, and *max* frequencies for that set,
where the null
frequency is the frequency expected in the target set under no association, and
the max frequency is the largest possible frequency in the target set. Then the
measure of association is defined as

association = (observed freq - null freq)/(max freq - null freq)

where "freq" denotes frequency and all frequencies are for the target set. Thus the measure of association equals the amount that the observed frequency in the target set exceeds the null frequency for that set, expressed as a proportion of the largest amount by which the observed frequency could possibly exceed the null.

For instance, the Cohen kappa measures the degree of row-column
agreement when row and column variables have the same categories, as when
two judges each classify each case into one of several categories and you want
to measure the agreement between the judges. The target set for kappa
includes the cells in the upper-left-to-lower-right diagonal, since those are the
cells indicating agreement. The null frequency for each cell is calculated by
the same formula for expected frequencies used in computing chi-square
values, and the null frequency for the target set is the sum of these values for
the diagonal cells. The max frequency for the target set is simply *N*. Thus
kappa equals the amount that the observed frequency in the diagonal exceeds
the null frequency for those cells, expressed as a proportion of the largest
amount by which the observed frequency could possibly exceed the null.
Frequency interpretations apply to lambda, OGE, CA, lambda-d, and kappa.
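The recipe for kappa translates directly into code; the sketch below (hypothetical helper, illustrative table) follows the verbal description above and agrees with the usual defining formula for kappa:

```python
def kappa(table):
    """Cohen's kappa via the frequency interpretation: (obs - null)/(max - null)."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    obs = sum(table[i][i] for i in range(len(table)))   # observed diagonal frequency
    null = sum(row_totals[i] * col_totals[i] / n        # expected diagonal frequency,
               for i in range(len(table)))              # by the chi-square formula
    return (obs - null) / (n - null)                    # max frequency is N

k = kappa([[20, 5], [10, 15]])                          # 0.4
```

Perfect agreement (all cases on the diagonal) gives kappa = 1, as the formula requires.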

**Weighted frequency interpretations**..
These use the same basic formula as frequency interpretations, but the
observed, null, and max frequencies in the formula are weighted rather than
unweighted sums of cell frequencies. To see why this may be
reasonable, suppose someone is trying to guess the occupations of other
people. You would probably be more impressed at the guesser's accuracy if
he or she were correct on a rare profession like actuary than on a more common
profession like lawyer. Several measures of association (MR, Lambda-max,
Gini D, and Lambda-max-d) take this into account by weighting each case in
inverse proportion to its column total. Thus all columns count equally despite
differing column frequencies, so cases in sparse columns get more weight.
Except for this, these measures are like measures with simple frequency
interpretations.

**Difference proportionality** is one particular type of frequency interpretation.
Let us say that a 2 x 2 frequency table has double diagonal symmetry (DDS)
if its upper left entry equals its lower right entry, and its lower left equals its
upper right. The three 2 x 2 tables below all have DDS.

7 | 3 |     8 | 2 |     9 | 1 |
3 | 7 |     2 | 8 |     1 | 9 |

DDS does not appear in bold because we use it only to introduce difference proportionality. A measure of association will be said to have difference proportionality if, when applied to a 2 x 2 table with DDS, it equals the difference between the two within-row or within-column proportions. Thus a measure with difference proportionality would equal respectively .4, .6, and .8 in the three tables just shown. A measure will be said to have square proportionality if, in such tables, it equals the square of these differences. Thus a measure with square proportionality would equal respectively .16, .36, and .64 in these three tables. Many of the measures of association in this document have either difference or square proportionality.
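For instance, the ordinary 2 x 2 phi coefficient has difference proportionality on DDS tables, and phi^{2} accordingly has square proportionality; a numerical check on the three DDS tables above (hypothetical helper):

```python
from math import sqrt

def phi(table):
    """Phi coefficient (ad - bc)/sqrt(r1*r2*c1*c2) for a 2 x 2 table."""
    (a, b), (c, d) = table
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

dds_tables = [[[7, 3], [3, 7]], [[8, 2], [2, 8]], [[9, 1], [1, 9]]]
phis = [phi(t) for t in dds_tables]          # 0.4, 0.6, 0.8, the within-column differences
```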

Measures with **unique zero** are zero only if the row and column frequencies
are completely independent, as in the table

2 | 4 | 6 |

3 | 6 | 9 |

5 | 10 | 15 |

In this table, the second and third columns are exactly twice and three times the first, producing complete independence. Unique zero seems to us to be an essential property for measures of simple causation, since any pattern other than complete independence implies some sort of causation. (Recall that here we are temporarily ignoring sampling error and seeking measures that would at least have the desired properties in infinitely large populations.)
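The independence can be verified with the expected-frequency formula from the introduction; in this table *o* = *e* in every cell (hypothetical helper):

```python
def expected(table):
    """Expected frequencies e = row total * column total / N."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return [[rt * ct / n for ct in cols] for rt in rows]

table = [[2, 4, 6], [3, 6, 9], [5, 10, 15]]
# Every observed frequency equals its expected frequency, so o = e throughout.
independent = expected(table) == [[float(x) for x in row] for row in table]
```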

Some measures of prediction are useful without unique zero. For instance, consider the table

90 | 10 |

51 | 49 |

Suppose you had to guess a case's column from its row, and you wanted to maximize the number of correct guesses. Then your best guess is column 1, whether the case is in row 1 or row 2. Thus the row variable is of no use in predicting column membership, despite the large difference between the two rows. Some measures of association, such as the Goodman-Kruskal lambda, are therefore zero for this table. Such measures lack unique zero.
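Using the standard definition of lambda (the proportional reduction in prediction errors when the row is known), this can be checked directly (hypothetical helper):

```python
def gk_lambda(table):
    """Goodman-Kruskal lambda for predicting column membership from row."""
    n = sum(map(sum, table))
    knowing_row = sum(max(row) for row in table)   # correct guesses using the row
    col_totals = [sum(c) for c in zip(*table)]
    ignoring_row = max(col_totals)                 # correct guesses ignoring the row
    return (knowing_row - ignoring_row) / (n - ignoring_row)

lam = gk_lambda([[90, 10], [51, 49]])              # 0.0: the rows don't help
```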

- Lambda-max = lambda when column totals are equal.
- If the table is square and the largest cell in each column is the diagonal cell, then Lambda-d = lambda and Lambda-max-d = lambda-max.
- Lambda-max reduces to the Gini *D* when *c* = 2.
- Gamma reduces to the Yule *Q* when *r* = *c* = 2.
- MP = phi^{2} when *c* = 2.
- Somers *D* reduces to the rank-biserial *r* when *c* = 2.
- Lambda-max = |Somers *D*| in 2 x 2 tables.

**References**

Goodman, Leo A. and Kruskal, William H. (1979). *Measures of association for cross classifications.* New York: Springer-Verlag.

Theil, Henri (1972). *Statistical decomposition analysis.* Amsterdam: North-Holland.