This code will loop through the cells in a specified range, search for URLs in the text, and then extract and list them.
Excel VBA Code for Extracting URL Links from Text
Overview
This VBA macro will loop through a specified range of cells (e.g., Column A), search for text that looks like a URL (starting with « http:// » or « https:// »), and extract these URLs into a new column (e.g., Column B). It uses regular expressions (RegExp) to identify the URLs within the text.
Step-by-Step Explanation
- Regex Setup: A regular expression (RegExp) pattern is used to identify URLs that start with http:// or https:// and are followed by valid domain names and paths.
- Looping through Cells: The code loops through each cell in the specified range and applies the regular expression to find any matching URL patterns.
- Extracting URLs: When a match is found, it extracts the URL and stores it in a new column (or any other place you want).
- Handling Multiple URLs: If a cell contains multiple URLs, all URLs will be extracted and placed in the new column.
Excel VBA Code
Sub ExtractURLsFromText()
' Declare necessary variables
Dim rng As Range
Dim cell As Range
Dim regex As Object
Dim matches As Object
Dim match As Variant
Dim urlPattern As String
Dim outputCol As Long
Dim currentRow As Long
' Set the range to process (change "A1:A10" to your desired range)
Set rng = Range("A1:A10")
' Set the column where extracted URLs will be placed (column B in this case)
outputCol = 2
currentRow = 1
' Initialize regular expression object
Set regex = CreateObject("VBScript.RegExp")
' Define the pattern for a URL (http or https)
urlPattern = "https?://[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)+(/[a-zA-Z0-9-._?=&%]*)?"
' Set the regular expression pattern
regex.IgnoreCase = True
regex.Global = True
regex.Pattern = urlPattern
' Loop through each cell in the specified range
For Each cell In rng
' Ensure the cell contains text
If Not IsEmpty(cell.Value) Then
' Find all matches for the URL pattern in the cell text
Set matches = regex.Execute(cell.Value)
' If URLs are found, process them
If matches.Count > 0 Then
' Loop through all the matches and write them to the output column
For Each match In matches
' Output each match to the adjacent column (column B)
Cells(currentRow, outputCol).Value = match.Value
currentRow = currentRow + 1 ' Move to the next row
Next match
End If
End If
Next cell
' Inform the user that the process is complete
MsgBox "URL extraction completed.", vbInformation
End Sub
Explanation of the Code
- Set the Range (Set rng = Range(« A1:A10 »)): This sets the range in which we want to search for URLs. In this example, it’s A1:A10, but you can modify it according to your needs.
- Create Regular Expression Object (Set regex = CreateObject(« VBScript.RegExp »)): This line initializes the regular expression object that will be used to search for URLs in the text.
- URL Pattern (urlPattern = « https?://[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)+(/[a-zA-Z0-9-._?=&%]*)? »): This pattern is used to match any URL starting with http:// or https://. The regular expression is flexible enough to capture:
- Domain names like example.com
- Subdirectories and query parameters like /path/to/file?query=1
Breakdown:
-
- https?:// matches http:// or https://
- [a-zA-Z0-9-]+ matches the domain name part.
- (\.[a-zA-Z0-9-]+)+ matches the domain extension (.com, .org, etc.).
- (/[a-zA-Z0-9-._?=&%]*)? matches optional paths, directories, and parameters that might follow the domain.
- Loop through the Cells: The loop For Each cell In rng goes through each cell in the specified range. If the cell is not empty, it uses the regex object to search for matches.
- Extracting Matches:
- Set matches = regex.Execute(cell.Value) runs the regular expression on the cell’s text.
- If matches.Count > 0 checks if any matches (URLs) are found.
- For each match found, Cells(currentRow, outputCol).Value = match.Value writes the URL to the adjacent column (outputCol is set to 2, meaning column B).
- currentRow = currentRow + 1 ensures each URL is placed on a new row.
- Message Box: After the loop finishes, a message box informs the user that the URL extraction process is complete.
Possible Customizations
- Change the Range: You can modify the range of cells by adjusting Set rng = Range(« A1:A10 ») to any desired range.
- Extracting Multiple URLs: The code already handles multiple URLs per cell, placing each URL in a separate row. If a cell contains several URLs, they will be extracted one by one.
- Change Output Column: You can change the outputCol value to place the URLs in a different column. For example, changing outputCol = 2 to outputCol = 3 will place the URLs in Column C.
How to Use the Code
- Open your Excel workbook.
- Press Alt + F11 to open the Visual Basic for Applications (VBA) editor.
- Go to Insert > Module to add a new module.
- Paste the provided code into the module.
- Press F5 or go to Run > Run Sub/UserForm to execute the code.
This code is useful if you are working with a dataset that contains text with embedded URLs, and you need to extract them for further processing. Let me know if you have any questions or need further clarification!