Extracting Links from HTML with BeautifulSoup

2024-12-07 16:29:31

Code introduction

This function uses the BeautifulSoup library to extract links from the given HTML content. You can choose which tag to search for and pass a dictionary of attributes that matching tags must have. The function returns the href value of every matching tag that has one.

Technology Stack: Beautiful Soup, HTML parsing

Code Type: Function

Code Difficulty: Intermediate
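The filtering is done by BeautifulSoup's find_all, and its attrs dictionary accepts more than plain strings. The sketch below shows three ways to match. It assumes bs4 is installed, and the sample HTML is made up for illustration.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample document, used only for this illustration.
html = """
<a href="https://example.com/docs" class="nav">Docs</a>
<a href="/about" class="nav footer">About</a>
<a name="top">Anchor without href</a>
<a href="mailto:team@example.com">Mail</a>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string matches any single value of a multi-valued attribute such as class.
nav = soup.find_all("a", attrs={"class": "nav"})

# A compiled regex is tested with .search() against the attribute value.
absolute = soup.find_all("a", attrs={"href": re.compile(r"^https?://")})

# True matches any tag that has the attribute at all, whatever its value.
with_href = soup.find_all("a", attrs={"href": True})

print([a["href"] for a in nav])       # ['https://example.com/docs', '/about']
print([a["href"] for a in absolute])  # ['https://example.com/docs']
print(len(with_href))                 # 3
```

Any of these dictionaries can be passed as the attributes argument of the function.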

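from bs4 import BeautifulSoup


def extract_links_from_html(html_content, tag='a', attributes=None):
    """
    Extract the href values of matching tags from HTML content using BeautifulSoup.

    :param html_content: A string containing the HTML content.
    :param tag: The HTML tag to search for (default is 'a').
    :param attributes: A dictionary of attributes that matching tags must have (default is None, meaning no filter).
    :return: A list of href values in document order; tags without an href are skipped.
    """
    # Use Python's built-in parser, so no extra dependency such as lxml is needed.
    soup = BeautifulSoup(html_content, 'html.parser')
    # find_all expects a dict for attrs, so fall back to {} when no filter is given.
    links = soup.find_all(tag, attrs=attributes or {})
    # Keep only the tags that actually carry an href attribute.
    return [link['href'] for link in links if link.has_attr('href')]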
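The function always reads href. Passing a tag such as img or script therefore returns an empty list, because those elements store their URL in src. One possible generalization adds a hypothetical link_attr parameter, which is not part of the original function, to name the attribute to collect. The sketch assumes bs4 is installed and uses made-up sample HTML.

```python
from bs4 import BeautifulSoup


def extract_attribute_values(html_content, tag="a", attributes=None, link_attr="href"):
    """Hypothetical variant: collect the value of link_attr from every matching tag."""
    soup = BeautifulSoup(html_content, "html.parser")
    # attrs={} means "no attribute filter"; avoid passing None to find_all.
    matches = soup.find_all(tag, attrs=attributes or {})
    # Skip elements that lack the requested attribute.
    return [el[link_attr] for el in matches if el.has_attr(link_attr)]


# Made-up sample document, for illustration only.
page = """
<a href="/home">Home</a>
<a href="/files/report.pdf" class="download">Report</a>
<img src="/img/logo.png" alt="Logo">
<img alt="No source">
"""

print(extract_attribute_values(page))                                  # ['/home', '/files/report.pdf']
print(extract_attribute_values(page, attributes={"class": "download"}))  # ['/files/report.pdf']
print(extract_attribute_values(page, tag="img", link_attr="src"))      # ['/img/logo.png']
```

Because link_attr defaults to "href", anchors are handled exactly as in the original function, so existing calls would not need to change.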