Extract <h1> Titles from Web Pages

Code introduction


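This function fetches the page at a given URL and returns the text content of every <h1> element it contains. An <h1> usually holds the page's main visible heading, which is distinct from the <title> element in the document head.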


Technology Stack : BeautifulSoup, urllib.request

Code Type : Function

Code Difficulty : Intermediate


def extract_titles(url, parser='html.parser'):
    """Return the text of every <h1> element found at the given URL."""
    from bs4 import BeautifulSoup
    from urllib.request import urlopen

    # Fetch the raw page bytes; the with-block closes the connection even on error
    with urlopen(url) as response:
        html_content = response.read()

    # Parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(html_content, parser)

    # Collect every <h1> element on the page
    titles = soup.find_all('h1')

    # Return the text of each heading as a list of strings
    return [title.get_text() for title in titles]
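
To try the same extraction step without a network request, the sketch below parses an in-memory HTML string. The helper name extract_titles_from_html and the sample markup are assumptions made for illustration and are not part of the original function. It also uses SoupStrainer, a BeautifulSoup class that restricts parsing to matching elements, as one possible way to avoid building a tree for the whole page.

```python
from bs4 import BeautifulSoup, SoupStrainer


def extract_titles_from_html(html, parser="html.parser"):
    """Hypothetical helper: return <h1> text from an HTML string (no network)."""
    # SoupStrainer limits parsing to <h1> elements, so the rest of the page is skipped
    only_h1 = SoupStrainer("h1")
    soup = BeautifulSoup(html, parser, parse_only=only_h1)
    return [h1.get_text() for h1 in soup.find_all("h1")]


# Illustrative sample page (made up for this example)
sample_html = """
<html>
  <head><title>Browser tab title</title></head>
  <body>
    <h1>Main heading</h1>
    <p>Some text.</p>
    <h1>Second <em>heading</em></h1>
  </body>
</html>
"""

print(extract_titles_from_html(sample_html))
# prints ['Main heading', 'Second heading']
```

Note that the <title> text is not returned, since only <h1> elements are collected. The parse_only argument is honored by the html.parser and lxml parsers but not by html5lib, so this sketch assumes one of the former.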