Extracting <h1> Titles from Web Pages with BeautifulSoup

Code introduction


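This function uses the requests library to fetch the HTML at a given URL, parses it with Beautiful Soup, and returns the text of every <h1> element as a list of strings. The optional parser argument selects the Beautiful Soup parser and defaults to Python's built-in html.parser.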


Technology Stack : Beautiful Soup, requests

Code Type : Function

Code Difficulty : Intermediate


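import requests
from bs4 import BeautifulSoup


def extract_titles(url, parser='html.parser'):
    """Return the text of every <h1> element found at the given URL."""
    # Send a GET request to the URL; the 10-second timeout is an arbitrary
    # choice that keeps the call from waiting indefinitely on a slow server
    response = requests.get(url, timeout=10)

    # Raise requests.HTTPError on 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()

    # Parse the raw bytes so BeautifulSoup can detect the document encoding
    soup = BeautifulSoup(response.content, parser)

    # Find every <h1> element in the HTML document
    titles = soup.find_all('h1')

    # Extract and return the text from each title
    return [title.get_text() for title in titles]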
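
The parsing step can be tried without any network access. The sketch below is a minimal illustration, not part of the original function: the helper name extract_titles_from_html and the SAMPLE_HTML markup are hypothetical, introduced here only for the example. It also collapses runs of whitespace in each title, which the original function does not do.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
SAMPLE_HTML = """
<html>
  <body>
    <h1>  Main <em>Title</em> </h1>
    <p>Some body text.</p>
    <h1>Second Title</h1>
    <h2>Not an h1</h2>
  </body>
</html>
"""


def extract_titles_from_html(html, parser='html.parser'):
    """Return the whitespace-normalized text of every <h1> in an HTML string."""
    soup = BeautifulSoup(html, parser)
    # get_text() concatenates nested text nodes (e.g. text inside <em>);
    # split() followed by join() collapses runs of whitespace into single spaces.
    return [' '.join(h1.get_text().split()) for h1 in soup.find_all('h1')]


if __name__ == '__main__':
    print(extract_titles_from_html(SAMPLE_HTML))  # ['Main Title', 'Second Title']
```

Keeping the parsing separate from the HTTP request makes the extraction logic easy to check against fixed input, and the same helper could be applied to response.content from a live request.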