Here’s the same VBA code explained in English for automating the process of detecting data anomalies based on statistical thresholds (like 3 standard deviations from the mean).
- Structure of the Data
Suppose the data is in column A starting from row 2 (A2
), and we want to detect anomalies in this data range.
- VBA Code for Detecting Anomalies:
Sub DetectAnomalies()
Dim ws As Worksheet
Dim lastRow As Long
Dim dataRange As Range
Dim value As Double
Dim mean As Double
Dim stdev As Double
Dim thresholdUpper As Double
Dim thresholdLower As Double
Dim cell As Range
' Reference to the active sheet
Set ws = ThisWorkbook.Sheets("Sheet1")
' Define the data range (column A)
lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
Set dataRange = ws.Range("A2:A" & lastRow)
' Calculate the mean and standard deviation of the data
mean = Application.WorksheetFunction.Average(dataRange)
stdev = Application.WorksheetFunction.StDev(dataRange)
' Define the anomaly thresholds: 3 standard deviations above or below the mean
thresholdUpper = mean + 3 * stdev
thresholdLower = mean - 3 * stdev
' Loop through each cell in the data range to detect anomalies
For Each cell In dataRange
value = cell.Value
' Check if the value is an anomaly (above or below the thresholds)
If value > thresholdUpper Or value < thresholdLower Then
' Highlight the cell as an anomaly (e.g., in red)
cell.Interior.Color = RGB(255, 0, 0) ' red
cell.Offset(0, 1).Value = "Anomaly" ' Mark the cell next to the value
Else
' If the value is not an anomaly, no change
cell.Interior.ColorIndex = -4142 ' Remove color
cell.Offset(0, 1).Value = ""
End If
Next cell
' Show a message when the process is complete
MsgBox "Anomaly detection complete!", vbInformation
End Sub
Code Explanation:
- Variable Definitions:
- ws: Reference to the active worksheet where the data is stored.
- lastRow: Variable that stores the last row with data in column A.
- dataRange: The range of cells containing the numerical data to analyze (A2).
-
- mean: The mean (average) of the values in the data range.
- stdev: The standard deviation of the values in the data range.
- thresholdUpper and thresholdLower: The upper and lower anomaly thresholds (mean ± 3 standard deviations).
- Calculating the Mean and Standard Deviation:
- The code calculates the average of the data range using Application.WorksheetFunction.Average.
- It also calculates the standard deviation using Application.WorksheetFunction.StDev.
- Anomaly Detection:
- The code loops through each cell in the dataRange.
- For each value, it checks if the value is greater than the thresholdUpper or less than the thresholdLower (i.e., outside 3 standard deviations from the mean).
- If the value is an anomaly, the cell is highlighted in red and the text « Anomaly » is added in the adjacent column (column B).
- If the value is not an anomaly, any color formatting is removed, and the adjacent column (B) is cleared.
- Completion Message:
- Once the anomaly detection process is complete, a message box is displayed to inform the user that the process is finished.
- How to Use the Code?
- Open the VBA Editor:
- Press Alt + F11 to open the VBA editor.
- In the editor, go to Insert > Module to insert a new module.
- Copy and Paste the Code:
- Copy the code above and paste it into the module you just created.
- Run the Code:
- Press F5 to run the code and detect anomalies in the data in column A of the active sheet.
- Customization:
- Anomaly Thresholds: You can adjust the threshold (currently set as 3 standard deviations) by modifying the thresholdUpper and thresholdLower values.
- Data Range: If your data is not in column A or extends across multiple columns, you can modify the dataRange.
- Anomaly Formatting: You can change the highlight color or apply other actions, such as sending an email or saving the results to a different sheet.
This code provides a good starting point for automating anomaly detection in data using VBA. You can tailor it to fit different datasets or anomaly criteria.