2. What Is Linear Regression? The Art of Drawing the “Best” Line
To understand linear regression, first imagine a simple chart called a scatter plot.
- On the horizontal axis (the x-axis), we plot the monthly return of the entire market (using the S&P 500 index as a stand-in).
- On the vertical axis (the y-axis), we plot the monthly return of a single stock, like General Electric (GE).
Each dot on the chart represents one month of data, showing both the market’s return and GE’s return for that month. Plot many months and you get a cloud of dots. For a stock like GE plotted against the S&P 500, the cloud typically slopes upward from left to right, a sign of positive correlation: when the market does well, GE tends to do well, too.
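To make the scatter plot concrete, here is a minimal Python sketch that pairs up monthly returns (one tuple per dot) and measures how tightly they trend together using the Pearson correlation coefficient. The six months of returns are invented purely for illustration and are not actual GE or S&P 500 data; the helper name `pearson_correlation` is likewise hypothetical.

```python
# Hypothetical monthly returns in percent, invented for illustration only.
# These are NOT real S&P 500 or GE figures.
market = [1.2, -0.8, 2.5, 0.3, -1.9, 3.1]  # x-axis: market return
stock = [1.8, -1.5, 3.0, 0.1, -2.6, 4.2]   # y-axis: stock return


def pearson_correlation(xs, ys):
    """Pearson correlation: +1 is a perfect upward line, 0 is no linear trend."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-movement of the two series around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each series around its own mean.
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


points = list(zip(market, stock))  # each (x, y) tuple is one dot on the plot
r = pearson_correlation(market, stock)
print(f"{len(points)} months, correlation = {r:.3f}")
```

With these made-up numbers the correlation comes out close to +1, which corresponds to a cloud of dots that drifts steadily upward from left to right.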
Linear regression is the process of drawing a single straight “line of best fit” through this cloud of dots. The line doesn’t pass through every point; instead, it summarizes the general relationship, or trend, between the market’s return and the stock’s return.
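So out of the infinitely many lines we could draw, what makes this one the “best”? Imagine each dot is attached to the line by a vertical rubber band. Each band represents an “error,” or residual: the vertical distance from an actual data point to the line’s prediction. The “line of best fit” is the position and tilt that make the sum of the squared stretches of all the bands as small as possible, a criterion known as least squares. Squaring means a band stretched twice as far counts four times as much, so points far from the line pull on it especially hard. In the analogy, this is the line where the bands store the least total energy, since the energy stored in an ideal elastic band grows with the square of its stretch.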
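The least-squares idea can be sketched in a few lines of Python. This is a minimal illustration, not a production fitting routine: it reuses the same invented returns (not real GE or S&P 500 data), computes the slope and intercept with the standard closed-form least-squares formulas, and then checks that nudging the line’s tilt or position always increases the total squared stretch. The helpers `fit_least_squares` and `sum_squared_errors` are hypothetical names chosen for this example.

```python
# Hypothetical monthly returns in percent, invented for illustration only.
market = [1.2, -0.8, 2.5, 0.3, -1.9, 3.1]  # x: market return
stock = [1.8, -1.5, 3.0, 0.1, -2.6, 4.2]   # y: stock return


def sum_squared_errors(slope, intercept, xs, ys):
    """Total 'rubber band energy': sum of squared vertical gaps to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def fit_least_squares(xs, ys):
    """Closed-form ordinary least squares for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept


slope, intercept = fit_least_squares(market, stock)
best = sum_squared_errors(slope, intercept, market, stock)

# Any other line (here, small changes to tilt or position) stretches the bands more.
for d_slope, d_intercept in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    other = sum_squared_errors(slope + d_slope, intercept + d_intercept, market, stock)
    assert other > best

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, SSE = {best:.3f}")
```

Because the sum of squared errors is a bowl-shaped (convex) function of slope and intercept, the closed-form solution sits at the single lowest point, so every perturbation in the loop raises the total.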
Now that we have our line, the most important piece of information it gives us isn’t the line itself, but its slope.
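For reference, the least-squares line and its slope have a standard closed form. Writing $x_i$ for the market’s return and $y_i$ for the stock’s return in month $i$, with $\bar{x}$ and $\bar{y}$ their averages:

$$
\hat{y} = \hat{\alpha} + \hat{\beta}\,x,
\qquad
\hat{\beta} = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^2} = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)},
\qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}
$$

The covariance-over-variance form holds as long as both use the same normalization ($n$ or $n-1$), since that factor cancels. Read in plain terms: a month in which the market’s return is one percentage point higher is associated, on average, with a stock return about $\hat{\beta}$ percentage points higher.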