Pearson’s (Linear) Correlation Coefficient Reference Page

**Purpose:** Pearson’s correlation coefficient (*r*) is a numerical measure of the strength of a linear association between two variables.

**Conditions:**

1) Both variables must be quantitative, in the sense that for each variable, it makes sense to do arithmetic with the measurements.

2) YOU MUST HAVE A REASON TO BELIEVE THAT THERE IS A __LINEAR__ ASSOCIATION BETWEEN THE TWO VARIABLES **BEFORE** YOU COMPUTE *r*. IF THE ASSOCIATION IS NONLINEAR, *r* IS NOT A VALID MEASURE OF ITS STRENGTH.

3) Strictly speaking, both variables should be random; indeed, the interpretation of *r* is safest when the two variables are “jointly normal” (whatever that means). However, many people use *r* even when the values of one of the variables are not random. We discourage this practice; the coefficient of determination is a better tool to use in that situation.

4) This coefficient should be accompanied by a scatterplot whenever possible.

There are additional technical considerations; unfortunately, they are too technical for an introductory course. Consequently, many people who use *r* do so without the proper safeguards, which is irresponsible and potentially hazardous.

**How to Compute It:** We recommend using software such
as SPSS (or a calculator, though we are less enthusiastic about this option).
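
For readers who want to see what the software is doing, here is a minimal Python sketch that computes *r* from its usual textbook definition (the sum of products of deviations divided by the square root of the product of the sums of squared deviations). It is an illustration only, not a replacement for the SPSS instructions; the function name `pearson_r` and the data are made up for this example.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's r for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical measurements on five individuals (made up for illustration)
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
print(round(r, 3))  # prints 0.775
```

As the conditions above require, a value like this should be read alongside a scatterplot of the same data.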

**Interpretation:** The value of *r* is always between -1 and 1; the closer *r* is to either of these values, the stronger the linear association between the variables. If *r* is close to 0, the linear association is weak. If *r* is zero, there is no linear association between the variables. If *r* is positive, so is the association; likewise for negative *r*. The value of *r* does not distinguish between explanatory and response variables.
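
The properties in this paragraph can be checked on small data sets. The sketch below is a hedged illustration with made-up numbers and a hypothetical helper `pearson_r`: it shows that swapping the two variables leaves *r* unchanged, that a perfectly decreasing straight line gives *r* = -1, and that *r* can be exactly zero even when the variables are strongly (but not linearly) related.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# r does not distinguish explanatory from response: swapping changes nothing
r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)

# A perfectly decreasing straight line gives r = -1
r_negative = pearson_r(x, [10, 8, 6, 4, 2])

# y = x**2 on values symmetric about zero: a strong association, but not a linear one
x_sym = [-2, -1, 0, 1, 2]
y_sq = [v ** 2 for v in x_sym]
r_parabola = pearson_r(x_sym, y_sq)

print(r_xy == r_yx)   # True
print(r_negative)     # -1.0
print(r_parabola)     # 0.0
```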

**Warnings:**

1) Pearson’s correlation coefficient measures the __strength__ of __linear__ associations __only__. There may be times when you see the strength of a __nonlinear__ association described by something called *r*; if those responsible are using their statistical tools correctly, their *r* is **not** a direct measure of the strength of that nonlinear association.

2) A low value of *r* does not mean that a linear association does not exist, only that __if__ there is a linear association present, it is weak. Likewise, a high value of *r* does not mean that a linear association is present, only that __if__ one is present, it is strong. Thus, as a descriptive statistic, *r* must **not** be used to infer the __nature__ of an association, only to measure the __strength__ of a __linear__ association.

3) Note carefully that *r* only describes this strength to the extent of the actual measured values of the two variables included in its calculation. It cannot measure the strength of a linear association for values of the variables outside the ranges of the measurements made on individuals in the sample.

4) *r* is quite sensitive to outliers, influential points, and skewing, and if the data include unusual points that fit the linear pattern, these can cause *r* to be higher than it should be. Also, it is possible for a decidedly nonlinear association to yield a high value of *r*; this is due largely to bias caused by the nonlinear nature of the association.

5) Use of *r* with averages is unwise, as any averaging process hides not only the variation in the individual measurements, but the nature of the distribution of those measurements and therefore the nature of the association (if any) between the variables whose averages are used in the calculation. Typically, *r* is lower for individual measurements and higher for averages made *from those very same measurements*.

6) Moreover, *r* more or less ignores any lurking variables.

7) Indeed, the value of *r* does not depend on whether either variable is seen as the “cause” (explanatory variable) or the “effect” (response)! **So *r* is utterly incapable of establishing that one thing causes another.** Hence, **above all**, a value of *r* close to 1 or -1 must **not** be construed as "proof" of a cause-and-effect relationship between the variables. However, it may be used carefully as part of a larger program for establishing causality.
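
Warnings 4 and 5 are easy to see numerically. The sketch below uses made-up data and a hypothetical helper `pearson_r`; it is an illustration under those assumptions, not a general proof. It shows a clearly curved relationship producing a high *r*, a single extreme point inflating *r*, and group averages producing a much higher *r* than the individual measurements they were computed from.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Warning 4a: a decidedly nonlinear association (y = x**2) with a high r
x_curve = list(range(1, 11))
y_curve = [v ** 2 for v in x_curve]
r_curve = pearson_r(x_curve, y_curve)

# Warning 4b: one extreme point can inflate r (all values made up)
x_base = [1, 2, 3, 4, 5]
y_base = [3, 1, 4, 2, 3]
r_without = pearson_r(x_base, y_base)
r_with = pearson_r(x_base + [20], y_base + [20])

# Warning 5: individuals versus averages of those same individuals.
# Three hypothetical groups, three individuals each, as (x, y) pairs.
groups = {
    "A": [(1, 1), (1, 5), (1, 3)],
    "B": [(2, 2), (2, 6), (2, 4)],
    "C": [(3, 3), (3, 7), (3, 5)],
}
individuals = [pair for members in groups.values() for pair in members]
r_individuals = pearson_r([p[0] for p in individuals],
                          [p[1] for p in individuals])

mean_x = [sum(p[0] for p in m) / len(m) for m in groups.values()]
mean_y = [sum(p[1] for p in m) / len(m) for m in groups.values()]
r_averages = pearson_r(mean_x, mean_y)

print(f"curved data:         r = {r_curve:.3f}")
print(f"without the outlier: r = {r_without:.3f}")
print(f"with the outlier:    r = {r_with:.3f}")
print(f"individuals:         r = {r_individuals:.3f}")
print(f"group averages:      r = {r_averages:.3f}")
```

In each case a scatterplot of the data would reveal the problem immediately, which is why the conditions above ask for one whenever possible.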

**SPSS instructions for computing Pearson’s correlation
coefficient:**
Click here.

**Examples of Pearson’s correlation coefficient, with
scatterplots:**
Click here.

**Related topics:**

__The coefficient of determination__

Testing claims about Pearson’s correlation coefficient

BYU-Idaho brownd@byui.edu

232 Ricks Building 208-496-1839 voice

Rexburg, ID 83460 208-496-2005 fax

**Please do not call me at home.**