Coefficient of Determination Reference Page

**Purpose:** The coefficient of determination, $r^2$, measures the predictive ability of a least-squares regression line.

**Conditions:** The conditions under which $r^2$ may be used are similar to those under which least-squares linear regression may be used. To wit:

1) Measurements must be pairs, each response value in the data set being accompanied by the corresponding value of the explanatory variable.

2) Both variables are quantitative, in that it makes sense to do arithmetic with measurements from each variable.

3) There is some excuse for randomness in the data. (Preferably, both variables should be random. Using a simple random sample ought to be sufficient for our class.)

4) A scatterplot suggests that…

(a) …a linear relationship exists between the variables (or, better yet, there are sound theoretical reasons to believe that the variables may be linearly associated).

(b) …any unusual points are under control (not too many, not too far away, preferably within the linear pattern; no influential points).

In advanced classes, the above conditions may be modified so as to be more accurate.

**How to Compute It:** We recommend using software such as SPSS (or a graphing calculator, though we are less enthusiastic about this option). However, we note here that if you have Pearson's linear correlation coefficient, $r$, then you can calculate the coefficient of determination as $r^2$ (that is, as $r$ times itself).

It doesn't make much sense to use $r^2$ without a regression line, as $r^2$ is a measure of the predictive ability of the regression line's formula. You also ought to have a scatterplot of the data.
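For readers without SPSS at hand, the shortcut of squaring the correlation coefficient can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `pearson_r` and the data values are made up for this example and are not part of the course materials.

```python
import math

def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient for paired measurements."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data (explanatory, response), invented for illustration
explanatory = [1, 2, 3, 4, 5]
response = [2, 4, 5, 4, 5]

r = pearson_r(explanatory, response)
r_squared = r * r  # the coefficient of determination is r times itself

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")  # r = 0.7746, r^2 = 0.6000
```

The number alone is not an analysis; it should still be read alongside the regression line and a scatterplot of the same data.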

**Interpretation:**

$r^2$ is literally the proportion of the variation in response accounted for or explained by the regression line. In other words, the percentage of the variation in the response variable that is explained by the linear association with the explanatory variable is $100 \times r^2$. Therefore, $r^2$ is always between 0 and 1; the closer $r^2$ is to 1, the better a predictor the line is.
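The "proportion of variation explained" reading can be checked directly. The sketch below fits a least-squares line, then compares the variation left over around the line with the total variation in the responses. The function names and the data are assumptions made for this illustration only.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def proportion_explained(xs, ys):
    """1 - (unexplained variation) / (total variation) for the fitted line."""
    slope, intercept = least_squares_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    total = sum((y - mean_y) ** 2 for y in ys)
    unexplained = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - unexplained / total

# Hypothetical paired data, invented for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_squared = proportion_explained(xs, ys)
print(f"{100 * r_squared:.1f}% of the variation in the response is explained")  # 60.0%
```

For a simple least-squares line, this proportion agrees with the square of Pearson's correlation coefficient for the same data.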

**Warnings:**

1) $r^2$ describes linear associations __only__. Thus, $r^2$ must not be used to describe any other kind or any unknown kind of association. A HIGH VALUE OF $r^2$ IS NOT, BY ITSELF, SUFFICIENT EVIDENCE TO CONCLUDE THAT THERE IS A LINEAR ASSOCIATION. Either there must be sound theoretical reasons to believe in a linear association, or there must be "independent" evidence, such as a scatterplot showing a clear, non-horizontal linear association, with consistent variation in responses throughout the range of the explanatory variable.

2) $r^2$ is quite sensitive to outliers, influential points, and skewing in the responses.

3) $r^2$ says __nothing__ about the association (if any) between the variables beyond the range of the data used to compute $r^2$ in the first place. $r^2$ can only describe the association for values found in the original data.

4) If your data are averages, use of $r^2$ is unwise, as any averaging process hides the nature of the distributions in the measurements, as well as the true nature of the association (if any) between the variables.

5) $r^2$ more or less ignores any lurking variables.

6) If the association between variables is not linear, $r^2$ is __GUARANTEED TO BE TOO HIGH.__ Hence, the fact that $r^2$ is high does not (by itself) imply that an association is linear.

7) The value of $r^2$ does not change if the explanatory and response variables trade roles. Therefore, $r^2$ is incapable of establishing the existence of a cause-effect relationship. However, $r^2$ may be used carefully as part of a larger program for establishing causality.

8) If there is more than one explanatory variable, more than one response variable, or if the type of regression is not linear, then people with sophomore-level statistical education are ill-prepared to face the attendant challenges. (In other words, leave it to people with upper-division or, better yet, graduate-level education in statistics.)

10) Even if there is a linear association between variables, and even if the coefficient of determination is relatively high, there can be a large amount of variation in y-values, so *individual* predictions based on the regression line can be very inaccurate. (This is true in spite of the fact that $r^2$ measures the predictive ability of the line!)
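Several of the warnings above can be seen numerically. The sketch below uses invented data and a hypothetical helper, `r_squared`, to illustrate three of them: clearly curved data can still produce a high value, swapping the roles of the variables leaves the value unchanged, and a single unusual point can change the value drastically.

```python
def r_squared(xs, ys):
    """Square of Pearson's r for paired data (simple linear regression only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# A high value does not imply linearity:
# these points lie exactly on the curve y = x^2, not on a line.
curved_x = list(range(1, 11))
curved_y = [x ** 2 for x in curved_x]
r2_curved = r_squared(curved_x, curved_y)
print(f"curved data:  r^2 = {r2_curved:.3f}")   # about 0.95

# Trading the roles of the variables does not change the value.
r2_swapped = r_squared(curved_y, curved_x)
print(f"roles traded: r^2 = {r2_swapped:.3f}")

# Sensitivity to unusual points:
# five points exactly on y = 2x, then the same data plus one outlier.
line_x = [1, 2, 3, 4, 5]
line_y = [2 * x for x in line_x]
r2_line = r_squared(line_x, line_y)
r2_outlier = r_squared(line_x + [6], line_y + [0])
print(f"perfect line: r^2 = {r2_line:.3f}")
print(f"with outlier: r^2 = {r2_outlier:.3f}")  # about 0.02
```

None of this replaces a scatterplot: in each case, plotting the points would reveal the curve or the outlier immediately.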

**SPSS instructions for calculating regression lines and the coefficient of determination:** Click here.

**Examples of regression lines with the coefficient of determination and scatterplots:**

(NOT YET AVAILABLE)

**Related topics:**

Pearson's linear correlation coefficient

Testing claims about Pearson's linear correlation coefficient

Simple least-squares linear regression

Testing claims about simple least-squares linear regression

Conducting a complete linear regression analysis

**Statistics Reference Pages page:**
Click here

BYU-Idaho brownd@byui.edu

232H Ricks Building 208-496-1839 voice

Rexburg, ID 83460 208-496-2005 fax

**Please do not call me at home.**