# Using Excel 2010: Linear Regression Analysis

**What is Linear Regression Analysis?**

A linear regression is a statistical tool used to determine whether, and how strongly, two (or more) variables are linearly related.

**Pre-Requisites**

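Before you can perform a linear regression with Excel, you need to make sure the “Analysis ToolPak” add-in is installed and enabled. In Excel 2010, go to “File” > “Options” > “Add-Ins”, choose “Excel Add-ins” in the “Manage” box, click “Go”, and tick “Analysis ToolPak”.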

**Purpose**

Suppose you want to determine whether a person’s salary is a function of his or her education level (measured in years).

**Maths Equation**

The general form of the relationship is:

$$Y_i = a + bX_i + \text{error}_i$$

where:

- $Y_i$ = value of Y (salary) for observation $i$
- $a$ = average value of Y (salary) when X (education) is zero
- $b$ = average change in Y (salary) given a one-unit increase in X (education), i.e. the average increase in salary for each additional year of education
- $X_i$ = value of X (education) for observation $i$
- $\text{error}_i$ = portion of Y (salary) that is unrelated to X (education), i.e. due to other factors (age, years on the job, etc.)
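Excel's Regression tool estimates $a$ and $b$ by ordinary least squares. For a single X variable the estimates have a closed form: $b = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) \,/\, \sum_i (X_i - \bar{X})^2$ and $a = \bar{Y} - b\bar{X}$. The following is a minimal Python sketch of those two formulas; the function name `fit_simple_ols` and the numbers are made up for illustration and are not the article's dataset.

```python
def fit_simple_ols(x, y):
    """Return (a, b) for y = a + b*x, estimated by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared X deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation; the slope is undefined")
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the fitted line passes through (x_bar, y_bar)
    return a, b


# Hypothetical data (not the article's): these points lie exactly on y = 10 + 2x.
education = [1, 2, 3, 4]
salary = [12, 14, 16, 18]
a, b = fit_simple_ols(education, salary)
print(a, b)  # 10.0 2.0
```

Because the made-up points have no error term, the sketch recovers the line exactly; with real data the same formulas give the best-fitting line in the least-squares sense.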

**Collection of Raw Data**

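You start by collecting a list of observations and recording them in your spreadsheet. For ease of computation, it helps to put the dependent variable (Y, salary) in the left column and the independent variable (X, education) in the right column. In this example, salary is in column B and education is in column C, with a label at the top of each column in row 1.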

**Steps**

On the “Data” tab, click “Data Analysis” (in the Analysis group), and the Data Analysis window will appear.

Scroll down until you see the “Regression” tool, click on it, then click OK.

The Regression window will appear.

Click inside the box labeled “Input Y Range:”

Click on cell B1, hold the left mouse button down, and drag to highlight cells B1 through B14.

Next, click inside the box labeled “Input X Range:”, then click on cell C1, hold the left mouse button down, and drag to highlight cells C1 through C14.

Since we have labels at the top of each data column (and included those cells in the ranges above), tick the “Labels” checkbox. Then tick the “Line Fit Plots” checkbox.

Now click OK. Excel will perform the linear regression and put the output on a new worksheet.
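The output sheet lists “R Square”, “Adjusted R Square”, “Standard Error” and “Observations” under Regression Statistics, followed by the coefficients. The sketch below reproduces those numbers for a single X variable using the standard textbook definitions. It is an illustration, not Excel's internal code, and `regression_summary` and the data are hypothetical.

```python
import math


def regression_summary(x, y):
    """Mimic a few statistics from Excel's Regression output (one X variable)."""
    n = len(x)
    k = 1  # number of explanatory variables (education only)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope (the X variable's coefficient)
    a = y_bar - b * x_bar  # intercept
    predicted = [a + b * xi for xi in x]
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)                   # total sum of squares
    r_square = 1 - sse / sst
    adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
    std_error = math.sqrt(sse / (n - k - 1))  # standard error of the regression
    return {
        "Intercept": a,
        "Slope": b,
        "R Square": r_square,
        "Adjusted R Square": adj_r_square,
        "Standard Error": std_error,
        "Observations": n,
    }


# Hypothetical data, not the article's dataset.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
for name, value in regression_summary(x, y).items():
    print(f"{name}: {round(value, 4)}")
# Intercept: 2.2, Slope: 0.6, R Square: 0.6,
# Adjusted R Square: 0.4667, Standard Error: 0.8944, Observations: 5
```

Typing the same five pairs into columns B and C should give matching values on Excel's output sheet, which is a quick way to check your understanding of each statistic.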

**Interpretation**

Cell B17 contains the intercept, i.e. the value of $a$ in the equation $Y_i = a + bX_i + \text{error}_i$.

Cell B18 contains the slope, i.e. the value of $b$ in the equation $Y_i = a + bX_i + \text{error}_i$.

So, our regression equation is: Salary = 12,226 + 1,833 × Education

We interpret it as follows: on average, a person’s salary is $12,226 plus $1,833 for each year of education he or she has.
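To make the interpretation concrete, here is a small sketch that plugs education levels into the fitted equation. The coefficients are the ones reported in cells B17 and B18. The education levels of 12 and 16 years are chosen only for illustration, and the result is an average (predicted) salary, not a guarantee for any individual.

```python
# Coefficients from the regression output above (as reported, i.e. rounded).
INTERCEPT = 12_226  # a, cell B17
SLOPE = 1_833       # b, cell B18


def predicted_salary(years_of_education):
    """Average salary predicted by Salary = a + b * Education."""
    return INTERCEPT + SLOPE * years_of_education


for years in (12, 16):
    print(years, predicted_salary(years))
# 12 34222
# 16 41554
```

Each extra year adds exactly $1,833 to the prediction, which is what the slope means. Predictions for education levels far outside the range in the data are extrapolations and should be treated with caution.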

We should also look at the “Adjusted R Square” statistic in cell B6 to gauge how strong the relationship between salary and education is. In this case its value is 0.60, which indicates that education explains roughly 60% of the variation in salary across the people in the sample (after adjusting for the number of explanatory variables). The remaining variation, roughly 40%, is due to other factors.
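“Adjusted R Square” is the plain “R Square” (cell B5) corrected for the number of explanatory variables. With $n$ observations and $k$ explanatory variables, the standard definition is shown below. Assuming this example has $n = 13$ observations (rows 2 through 14, with labels in row 1) and $k = 1$, it becomes:

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} = 1 - (1 - R^2)\,\frac{12}{11}, \qquad R^2 = 1 - \frac{\sum_i \text{error}_i^2}{\sum_i (Y_i - \bar{Y})^2}$$

Because $12/11 > 1$, the adjusted value is always slightly below the plain R Square here.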

We can also look at the line fit plot, which Excel places on the output sheet, to get a visual feel for how “linear” the relationship is.

The green squares show the predicted salaries, which all lie on the straight line $Y_i = a + bX_i$.

The light blue diamonds show the actual salaries: $Y_i = a + bX_i + \text{error}_i$.

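The error term accounts for the fact that part of salary is due to other factors not included in our model. For each observation, it is the vertical gap between the actual salary (diamond) and the predicted salary (square).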
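The estimated error terms (residuals) are simply actual minus predicted values. The sketch below computes them for a small hypothetical dataset, not the article's. For this dataset, $a = 2.2$ and $b = 0.6$ are the least-squares estimates. The `residuals` helper is a name introduced here for illustration.

```python
def residuals(x, y, a, b):
    """Estimated error terms: actual Y minus the predicted value a + b*X."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical data (not the article's); a = 2.2 and b = 0.6 are its least-squares fit.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
res = residuals(x, y, a=2.2, b=0.6)
print([round(r, 2) for r in res])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
# With an intercept in the model, least-squares residuals sum to zero.
print(abs(sum(res)) < 1e-9)        # True
```

Positive residuals are people earning more than the line predicts and negative ones are people earning less. Those gaps are what the other factors (age, years on the job, etc.) would have to explain.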