Extract Text from HTML by Tag Name


Code introduction


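This function parses an HTML string with lxml, finds every element with the specified tag name, and returns a list containing the whitespace-stripped text of each matching element, including text nested in its child elements.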


Technology Stack : lxml

Code Type : HTML parsing

Code Difficulty : Intermediate


                
                    
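from lxml import etree


def extract_text_from_html(html_content, tag_name):
    """Return the stripped text of every <tag_name> element in html_content."""
    parser = etree.HTMLParser()
    tree = etree.fromstring(html_content, parser)
    # Match the tag anywhere in the document.
    elements = tree.xpath(f'//{tag_name}')
    # Plain etree elements have no text_content() method (that belongs to
    # lxml.html), so join itertext() to collect nested text as well.
    return [''.join(element.itertext()).strip() for element in elements]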
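
As a variant of the function above, the sketch below parses with lxml.html rather than lxml.etree, because text_content() is a method of lxml.html elements. The function name extract_text_with_lxml_html and the sample document are made up for illustration and are not part of the original snippet.

```python
from lxml import html  # third-party package: pip install lxml


def extract_text_with_lxml_html(html_content, tag_name):
    """Hypothetical variant: stripped text of every <tag_name> element."""
    # document_fromstring() always returns the <html> root, and its nodes are
    # HtmlElement objects, which provide text_content().
    tree = html.document_fromstring(html_content)
    return [el.text_content().strip() for el in tree.xpath(f"//{tag_name}")]


# Made-up sample document, for illustration only.
sample = """
<html><body>
  <h1>Title</h1>
  <p>First <b>bold</b> paragraph.</p>
  <p>  Second paragraph.  </p>
</body></html>
"""

paragraphs = extract_text_with_lxml_html(sample, "p")
headings = extract_text_with_lxml_html(sample, "h1")
missing = extract_text_with_lxml_html(sample, "table")

print(paragraphs)  # ['First bold paragraph.', 'Second paragraph.']
print(headings)    # ['Title']
print(missing)     # []
```

Text inside nested tags such as <b> is included in the parent's text, and a tag that does not occur yields an empty list. Because tag_name is interpolated directly into the XPath expression, it should come from trusted code rather than from user input.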
              