Extract Links from HTML by Tag and Class

2024-12-07 16:27:36

Code introduction

This function extracts links from HTML content based on a specified tag and CSS class. By default it looks for <a> elements with the class "link". It uses the BeautifulSoup library to parse only the elements of the given tag, keeps those that carry the given class, and returns the values of their href attributes.

Technology Stack: BeautifulSoup

Code Type: Function

Code Difficulty: Intermediate
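The two filtering steps described above can be tried in isolation on a small input. The HTML below is an invented example, not taken from the original page. It shows that parse_only keeps everything nested inside a matching tag, so a search by class alone can also return descendants that are not links.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Invented input: an <a class="link"> inside a <p>, plus an <a> without the
# class that wraps a <span class="link">
html = (
    '<p>Intro <a class="link" href="/docs">Docs</a></p>'
    '<a href="/home"><span class="link">Home</span></a>'
)

# Parse only <a> elements. The <p> wrapper is discarded, but everything
# nested inside a kept <a> (such as the <span>) is retained.
soup = BeautifulSoup(html, 'html.parser', parse_only=SoupStrainer('a'))

# Filtering by class alone also matches the nested <span>, which has no href
by_class_only = soup.find_all(class_='link')
print([el.name for el in by_class_only])            # ['a', 'span']
print([el.get('href') for el in by_class_only])     # ['/docs', None]

# Naming the tag as well keeps only the <a class="link"> element
by_tag_and_class = soup.find_all('a', class_='link')
print([el.get('href') for el in by_tag_and_class])  # ['/docs']
```

Passing the tag name to find_all in addition to class_ is what keeps non-link descendants out of the result.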

from bs4 import BeautifulSoup, SoupStrainer


def extract_links_from_html(html_content, tag='a', class_='link'):
    """Return the href values of <tag> elements that carry the CSS class class_."""
    # Build the tree from the specified tag only; other elements are skipped
    tag_filter = SoupStrainer(tag)
    soup = BeautifulSoup(html_content, 'html.parser', parse_only=tag_filter)

    # Match on the tag as well as the class, so that nested descendants of a
    # different type that happen to share the class are not returned
    links = soup.find_all(tag, class_=class_)

    # Return the href attributes, skipping matches that have no href
    return [link['href'] for link in links if link.has_attr('href')]
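A different technique, not part of the original snippet, expresses the tag, the class, and the presence of an href in one CSS selector passed to BeautifulSoup's select() method (which is backed by the soupsieve package). The function name extract_links_with_select is hypothetical, and the sketch assumes the tag and class names are plain identifiers that need no CSS escaping.

```python
from bs4 import BeautifulSoup


def extract_links_with_select(html_content, tag='a', class_='link'):
    """Hypothetical variant: find <tag class="..."> links with a CSS selector."""
    soup = BeautifulSoup(html_content, 'html.parser')
    # e.g. 'a.link[href]': <a> elements whose class list contains "link"
    # and that have an href attribute (assumes tag/class need no escaping)
    selector = f'{tag}.{class_}[href]'
    return [el['href'] for el in soup.select(selector)]


html = (
    '<a class="link nav" href="/a">A</a>'
    '<a class="other" href="/b">B</a>'
    '<a class="link">No href</a>'
)
print(extract_links_with_select(html))  # ['/a']
```

Unlike the SoupStrainer version, this parses the whole document, trading some parsing work for a more expressive query.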

Tags: BeautifulSoup