Pearson’s (Linear) Correlation Coefficient Reference Page
Purpose: Pearson’s correlation coefficient (r) is a numerical measure of the strength of a linear association between two variables.
1) Both variables must be quantitative, in the sense that for each variable, it makes sense to do arithmetic with the measurements.
2) YOU MUST HAVE A REASON TO BELIEVE THAT THERE IS A LINEAR ASSOCIATION BETWEEN THE TWO VARIABLES BEFORE YOU COMPUTE r. IF THE ASSOCIATION IS NONLINEAR, r CAN BE MISLEADING: IT MAY BE HIGH ENOUGH TO LEAD YOU TO BELIEVE, INCORRECTLY, THAT THE ASSOCIATION IS LINEAR.
3) Strictly speaking, both variables should be random; indeed, the interpretation of r is safest when the two variables are “jointly normal” (a technical condition we do not define here). However, many people use r even when the values of one of the variables are not random. We discourage this practice; the coefficient of determination is a better tool to use in that situation.
4) This coefficient should be accompanied by a scatterplot whenever possible.
There are additional technical considerations; unfortunately, they are too technical for an introductory course. Consequently, many people who use r do so without the proper safeguards, which is irresponsible and potentially hazardous.
How to Compute It: We recommend using software such as SPSS (or a calculator, though we are less enthusiastic about this option).
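For readers who want to see what the software is doing, the following is a minimal sketch in Python of the usual textbook computation: the sum of the products of the deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name pearson_r and the sample data (hours studied and exam scores for five students) are hypothetical and chosen only for illustration; this is not a substitute for SPSS.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired measurements."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two pairs of measurements")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Deviations of each measurement from its variable's mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    sxy = sum(a * b for a, b in zip(dx, dy))  # sum of cross-products
    sxx = sum(a * a for a in dx)              # sum of squares for x
    syy = sum(b * b for b in dy)              # sum of squares for y

    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]

r = pearson_r(hours, score)
print(round(r, 3))
```

The function refuses to return a value when either variable is constant, because the denominator is then zero and r is undefined.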
Interpretation: The value of r is always between -1 and 1; the closer r is to either of these values, the stronger the linear association between the variables. If r is close to 0, the linear association is weak. If r is zero, there is no linear association between the variables. If r is positive, so is the association; likewise for negative r. The value of r does not distinguish between explanatory and response variables.
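The properties just described can be checked on small made-up data sets. The sketch below assumes a hypothetical helper, pearson_r, that computes the coefficient from its textbook definition; all of the data are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient (textbook definition)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5, 6]
rising = [2 * v + 1 for v in x]    # points exactly on a line, positive slope
falling = [10 - 3 * v for v in x]  # points exactly on a line, negative slope
scattered = [4, 6, 3, 7, 4, 6]     # invented values with only a weak trend

r_rising = pearson_r(x, rising)        # at the upper limit, 1
r_falling = pearson_r(x, falling)      # at the lower limit, -1
r_scattered = pearson_r(x, scattered)  # positive but fairly close to 0

# Swapping the roles of the two variables leaves the value unchanged.
r_swapped = pearson_r(scattered, x)

print(r_rising, r_falling, round(r_scattered, 3), round(r_swapped, 3))
```

The last comparison is the point of the final sentence above: nothing in the calculation treats one variable as explanatory and the other as the response.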
1) Pearson’s correlation coefficient measures the strength of linear associations only. There may be times when you see the strength of a nonlinear association described by something called r; if those responsible are using their statistical tools correctly, their r is not a direct measure of the strength of that nonlinear association.
2) A low value of r does not mean that a linear association does not exist, only that if there is a linear association present, it is weak. Likewise, a high value of r does not mean that a linear association is present, only that if one is present, it is strong. Thus, as a descriptive statistic, r must not be used to infer the nature of an association, only to measure the strength of a linear association.
3) Note carefully that r only describes this strength to the extent of the actual measured values of the two variables included in its calculation. It cannot measure the strength of a linear association for values of the variables outside the ranges of the measurements made on individuals in the sample.
4) r is quite sensitive to outliers, influential points, and skewness; if the data include unusual points that fit the linear pattern, these can cause r to be higher than it should be. Also, it is possible for a decidedly nonlinear association to yield a high value of r; this happens when a curved pattern is still mostly increasing (or mostly decreasing) over the measured range, so that a straight line captures much of it.
5) Use of r with averages is unwise, as any averaging process hides not only the variation in the individual measurements, but the nature of the distribution of those measurements and therefore the nature of the association (if any) between the variables whose averages are used in the calculation. Typically, r is lower for individual measurements and higher for averages made from those very same measurements.
6) Moreover, r takes no account of any lurking variables.
7) Indeed, the value of r does not depend on whether either variable is seen as the “cause” (explanatory variable) or the “effect” (response)! So r is utterly incapable of establishing that one thing causes another. Hence, above all, a value of r close to 1 or -1 must not be construed as “proof” of a cause-and-effect relationship between the variables. However, it may be used carefully as part of a larger program for establishing causality.
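Several of the warnings above can be seen on small invented data sets. The following is a minimal sketch in Python; the helper pearson_r and every data value are hypothetical, constructed only to make each effect visible, and real data will rarely behave this cleanly.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson's correlation coefficient (textbook definition)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# A perfect but nonlinear (U-shaped) association can give r = 0.
x_sym = [-3, -2, -1, 0, 1, 2, 3]
r_parabola = pearson_r(x_sym, [v * v for v in x_sym])

# A decidedly curved association can still give a high r
# when it is increasing over the whole measured range.
x_pos = list(range(1, 11))
r_curve = pearson_r(x_pos, [v * v for v in x_pos])

# One unusual point that extends the pattern can inflate r.
x_weak = [1, 2, 3, 4, 5, 6]
y_weak = [4, 6, 3, 7, 4, 6]
r_weak = pearson_r(x_weak, y_weak)
r_outlier = pearson_r(x_weak + [30], y_weak + [40])

# Averages hide individual variation, so r for averages is higher.
x_ind = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
y_ind = [1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12]
r_individual = pearson_r(x_ind, y_ind)

groups = defaultdict(list)
for xv, yv in zip(x_ind, y_ind):
    groups[xv].append(yv)           # collect the y values for each x value
x_avg = sorted(groups)
y_avg = [sum(groups[g]) / len(groups[g]) for g in x_avg]
r_average = pearson_r(x_avg, y_avg)

print("U-shaped:", round(r_parabola, 3))
print("curved, increasing:", round(r_curve, 3))
print("without / with unusual point:", round(r_weak, 3), round(r_outlier, 3))
print("individuals / averages:", round(r_individual, 3), round(r_average, 3))
```

In every case a scatterplot of the same data would reveal at a glance what the single number conceals.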
The coefficient of determination
Testing claims about Pearson’s correlation coefficient
Simple linear regression
David E. Brown
232 Ricks Building
Rexburg, ID 83460
Voice: 208-496-1839
Fax: 208-496-2005
Please do not call me at home.