This function uses the BeautifulSoup library to extract links from a string of HTML. It takes the HTML content, the tag to search for (default 'a'), and an optional class name, and returns a list of the href values of the matching elements, in document order.
Technology Stack: Python, BeautifulSoup (bs4)
Code Type: Function
Code Difficulty: Intermediate
from bs4 import BeautifulSoup, SoupStrainer


def extract_links_from_html(html_content, tag='a', class_name=None):
    """
    Extract the href values of all matching elements in an HTML string.

    :param html_content: The HTML content from which to extract links.
    :param tag: The tag to search for (default is 'a').
    :param class_name: Optional CSS class; only elements with this class are kept.
    :return: A list of href values, in document order.
    """
    # Parse only elements named `tag`; everything else is skipped,
    # which keeps the parse tree small for large documents
    soup = BeautifulSoup(html_content, 'html.parser', parse_only=SoupStrainer(tag))
    # Search for `tag` explicitly so nested children (e.g. a <span> inside an <a>)
    # are not returned, and optionally narrow the match by class name
    if class_name:
        links = soup.find_all(tag, class_=class_name)
    else:
        links = soup.find_all(tag)
    # Keep only elements that actually carry an href attribute
    return [link.get('href') for link in links if link.get('href') is not None]
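If bs4 is not installed, roughly the same filtering (tag name, optional class, href present) can be sketched with the standard library's html.parser.HTMLParser. This is a swapped-in technique, not the BeautifulSoup-based function; the names _LinkCollector and extract_links_stdlib are hypothetical, and the class check only approximates how BeautifulSoup's class_ filter matches a single class inside a multi-valued class attribute.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from start tags named `tag`."""

    def __init__(self, tag: str = "a", class_name: Optional[str] = None) -> None:
        super().__init__()
        self.tag = tag.lower()  # HTMLParser reports tag names in lowercase
        self.class_name = class_name
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also reach this method,
        # because the default handle_startendtag calls handle_starttag
        if tag != self.tag:
            return
        attr_map = dict(attrs)
        if self.class_name is not None:
            # class is a whitespace-separated list; keep the element if any
            # single token equals class_name (an approximation of bs4's class_)
            classes = (attr_map.get("class") or "").split()
            if self.class_name not in classes:
                return
        href = attr_map.get("href")
        if href is not None:
            self.hrefs.append(href)


def extract_links_stdlib(html_content: str, tag: str = "a",
                         class_name: Optional[str] = None) -> List[str]:
    """Hypothetical stdlib-only counterpart of extract_links_from_html."""
    parser = _LinkCollector(tag, class_name)
    parser.feed(html_content)
    parser.close()
    return parser.hrefs


sample = (
    '<p><a href="/home" class="nav main">Home</a>'
    '<a class="nav">No href</a>'
    '<a href="https://example.com/docs">Docs</a></p>'
)
print(extract_links_stdlib(sample))                    # ['/home', 'https://example.com/docs']
print(extract_links_stdlib(sample, class_name="nav"))  # ['/home']
```

The anchor without an href is dropped in both calls, mirroring the "is not None" check in the BeautifulSoup version; the BeautifulSoup function remains the more robust choice for malformed real-world HTML.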