Regression analysis is a statistical method used for modeling the relationship between a dependent variable (Y) and one or more independent variables (X). In this example, we’ll use simple linear regression, but you can extend the logic to multiple regression or other advanced models as needed.
Overview of the Code Structure
- Preparation of the Data: First, ensure the data is available in a worksheet. The independent variables (X) will be in columns, and the dependent variable (Y) will be in another column.
- Perform Linear Regression: We’ll use Excel’s built-in
LINESTfunction for linear regression, which returns the slope and intercept of the regression line. - Prediction: After performing regression, we’ll use the equation of the line to predict values of Y for given X values.
- Visualization: We’ll also create a scatter plot to visualize the data points and the fitted regression line.
Step-by-Step VBA Code for Linear Regression
Sub AdvancedDataAnalysis_LinearRegression()
' Step 1: Declare variables for range references
Dim ws As Worksheet
Dim XRange As Range, YRange As Range
Dim RegressionResults As Variant
Dim Slope As Double, Intercept As Double
Dim PredictedY As Double
Dim DataRow As Long
Dim ChartObj As ChartObject
' Step 2: Set up worksheet reference (adjust to your worksheet name)
Set ws = ThisWorkbook.Sheets("Sheet1")
' Step 3: Define the range for independent (X) and dependent (Y) variables
Set XRange = ws.Range("A2:A100") ' Independent variable (X) in column A
Set YRange = ws.Range("B2:B100") ' Dependent variable (Y) in column B
' Step 4: Perform Linear Regression using LINEST function
' The LINEST function returns an array with regression parameters (slope, intercept, etc.)
RegressionResults = Application.WorksheetFunction.LinEst(YRange, XRange, True, True)
' Extract the Slope and Intercept from the regression results array
Slope = RegressionResults(1, 1) ' Slope of the regression line
Intercept = RegressionResults(1, 2) ' Intercept of the regression line
' Step 5: Output the regression parameters to the sheet
ws.Range("D1").Value = "Slope"
ws.Range("D2").Value = Slope
ws.Range("E1").Value = "Intercept"
ws.Range("E2").Value = Intercept
' Step 6: Predict Y values using the regression equation (Y = mX + b)
' Loop through each value in the X column and calculate the corresponding Y
For DataRow = 2 To XRange.Rows.Count
PredictedY = (Slope * XRange.Cells(DataRow, 1).Value) + Intercept
ws.Cells(DataRow, 3).Value = PredictedY ' Place predicted Y in column C
Next DataRow
' Step 7: Create a scatter plot with the original data and the regression line
Set ChartObj = ws.ChartObjects.Add(Left:=100, Width:=375, Top:=75, Height:=225)
With ChartObj.Chart
.ChartType = xlXYScatterLines ' Scatter plot with lines (regression line)
.SetSourceData Source:=ws.Range("A2:B100") ' Use original data
.SeriesCollection.NewSeries
.SeriesCollection(2).XValues = XRange
.SeriesCollection(2).Values = ws.Range("C2:C100") ' Predicted Y values (Regression Line)
.HasTitle = True
.ChartTitle.Text = "Regression Analysis: Y vs X"
.Axes(xlCategory, xlPrimary).HasTitle = True
.Axes(xlCategory, xlPrimary).AxisTitle.Text = "X (Independent Variable)"
.Axes(xlValue, xlPrimary).HasTitle = True
.Axes(xlValue, xlPrimary).AxisTitle.Text = "Y (Dependent Variable)"
End With
' Optional: Add a trendline to the scatter plot for better visualization
With ChartObj.Chart.SeriesCollection(1).Trendlines.Add
.Type = xlLinear
.Name = "Regression Trendline"
.DisplayEquation = True
.DisplayRSquared = True
End With
MsgBox "Linear Regression Analysis Completed!"
End Sub
Explanation of the Code:
- Variable Declaration:
ws: This refers to the worksheet where the data is located.XRangeandYRange: These represent the ranges for the independent (X) and dependent (Y) variables.RegressionResults: An array that stores the results of theLINESTfunction (it will return slope, intercept, and other statistics).SlopeandIntercept: These store the values for the slope and intercept of the regression equation.PredictedY: This variable is used to store the predicted Y value for each X.
- Performing Linear Regression:
- The
LinEstfunction is used to compute the linear regression parameters. The function returns a 2D array where:RegressionResults(1, 1)gives the slope of the regression line.RegressionResults(1, 2)gives the intercept.- Other results (e.g., R-squared value) can also be extracted from this array.
- The
- Prediction:
- After computing the slope and intercept, we loop through each value in the X range and use the regression equation
Y = mX + bto predict the corresponding Y value. These predicted values are placed in column C.
- After computing the slope and intercept, we loop through each value in the X range and use the regression equation
- Creating a Scatter Plot:
- A scatter plot is generated using the
ChartObjects.Addmethod, displaying the original data points and the fitted regression line (using the predicted values in column C). - Additionally, a linear trendline is added to the scatter plot to visualize the regression line clearly, and the equation of the line along with R-squared value is displayed.
- A scatter plot is generated using the
Resulting Output:
- The slope and intercept of the regression line will be displayed in cells
D2andE2. - The predicted Y values (calculated using the regression equation) will appear in column C.
- A scatter plot with the regression line will be created for easy visualization.
- A trendline with the regression equation and R-squared value will be displayed on the chart.
Conclusion:
This Excel VBA code demonstrates how to perform a simple linear regression analysis. You can modify it to fit more complex data models, such as multiple regression, by adjusting the input ranges and incorporating more variables.
You could also adapt this to other advanced data analysis models, like time series forecasting, clustering, or classification, using more advanced algorithms or external libraries that work with VBA. However, for truly complex models, consider integrating Excel with other tools like Python, R, or specialized statistical software.