Finance

Charts

Statistics

Macros

Search

Extract URL Links from Text with Excel VBA

This code will loop through the cells in a specified range, search for URLs in the text, and then extract and list them.

Excel VBA Code for Extracting URL Links from Text

Overview

This VBA macro will loop through a specified range of cells (e.g., Column A), search for text that looks like a URL (starting with « http:// » or « https:// »), and extract these URLs into a new column (e.g., Column B). It uses regular expressions (RegExp) to identify the URLs within the text.

Step-by-Step Explanation

  1. Regex Setup: A regular expression (RegExp) pattern is used to identify URLs that start with http:// or https:// and are followed by valid domain names and paths.
  2. Looping through Cells: The code loops through each cell in the specified range and applies the regular expression to find any matching URL patterns.
  3. Extracting URLs: When a match is found, it extracts the URL and stores it in a new column (or any other place you want).
  4. Handling Multiple URLs: If a cell contains multiple URLs, all URLs will be extracted and placed in the new column.

Excel VBA Code

Sub ExtractURLsFromText()
    ' Declare necessary variables
    Dim rng As Range
    Dim cell As Range
    Dim regex As Object
    Dim matches As Object
    Dim match As Variant
    Dim urlPattern As String
    Dim outputCol As Long
    Dim currentRow As Long   
    ' Set the range to process (change "A1:A10" to your desired range)
    Set rng = Range("A1:A10")
    ' Set the column where extracted URLs will be placed (column B in this case)
    outputCol = 2
    currentRow = 1   
    ' Initialize regular expression object
    Set regex = CreateObject("VBScript.RegExp")   
    ' Define the pattern for a URL (http or https)
    urlPattern = "https?://[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)+(/[a-zA-Z0-9-._?=&%]*)?"   
    ' Set the regular expression pattern
    regex.IgnoreCase = True
    regex.Global = True
    regex.Pattern = urlPattern   
    ' Loop through each cell in the specified range
    For Each cell In rng
        ' Ensure the cell contains text
        If Not IsEmpty(cell.Value) Then
            ' Find all matches for the URL pattern in the cell text
            Set matches = regex.Execute(cell.Value)
            ' If URLs are found, process them
            If matches.Count > 0 Then
                ' Loop through all the matches and write them to the output column
                For Each match In matches
                    ' Output each match to the adjacent column (column B)
                    Cells(currentRow, outputCol).Value = match.Value
                    currentRow = currentRow + 1 ' Move to the next row
                Next match
            End If
        End If
    Next cell   
    ' Inform the user that the process is complete
    MsgBox "URL extraction completed.", vbInformation
End Sub

Explanation of the Code

  1. Set the Range (Set rng = Range(« A1:A10 »)): This sets the range in which we want to search for URLs. In this example, it’s A1:A10, but you can modify it according to your needs.
  2. Create Regular Expression Object (Set regex = CreateObject(« VBScript.RegExp »)): This line initializes the regular expression object that will be used to search for URLs in the text.
  3. URL Pattern (urlPattern = « https?://[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)+(/[a-zA-Z0-9-._?=&%]*)? »): This pattern is used to match any URL starting with http:// or https://. The regular expression is flexible enough to capture:
    • Domain names like example.com
    • Subdirectories and query parameters like /path/to/file?query=1

Breakdown:

    • https?:// matches http:// or https://
    • [a-zA-Z0-9-]+ matches the domain name part.
    • (\.[a-zA-Z0-9-]+)+ matches the domain extension (.com, .org, etc.).
    • (/[a-zA-Z0-9-._?=&%]*)? matches optional paths, directories, and parameters that might follow the domain.
  1. Loop through the Cells: The loop For Each cell In rng goes through each cell in the specified range. If the cell is not empty, it uses the regex object to search for matches.
  2. Extracting Matches:
    • Set matches = regex.Execute(cell.Value) runs the regular expression on the cell’s text.
    • If matches.Count > 0 checks if any matches (URLs) are found.
    • For each match found, Cells(currentRow, outputCol).Value = match.Value writes the URL to the adjacent column (outputCol is set to 2, meaning column B).
    • currentRow = currentRow + 1 ensures each URL is placed on a new row.
  3. Message Box: After the loop finishes, a message box informs the user that the URL extraction process is complete.

Possible Customizations

  • Change the Range: You can modify the range of cells by adjusting Set rng = Range(« A1:A10 ») to any desired range.
  • Extracting Multiple URLs: The code already handles multiple URLs per cell, placing each URL in a separate row. If a cell contains several URLs, they will be extracted one by one.
  • Change Output Column: You can change the outputCol value to place the URLs in a different column. For example, changing outputCol = 2 to outputCol = 3 will place the URLs in Column C.

How to Use the Code

  1. Open your Excel workbook.
  2. Press Alt + F11 to open the Visual Basic for Applications (VBA) editor.
  3. Go to Insert > Module to add a new module.
  4. Paste the provided code into the module.
  5. Press F5 or go to Run > Run Sub/UserForm to execute the code.

This code is useful if you are working with a dataset that contains text with embedded URLs, and you need to extract them for further processing. Let me know if you have any questions or need further clarification!

0 0 votes
Évaluation de l'article
S’abonner
Notification pour
guest
0 Commentaires
Le plus ancien
Le plus récent Le plus populaire
Online comments
Show all comments
Facebook
Twitter
LinkedIn
WhatsApp
Email
Print
0
We’d love to hear your thoughts — please leave a commentx