Flair-Based Text Classification and Document Summarization

Code introduction


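This function uses the Flair library to run a pre-trained sentiment classifier over each sentence of the input document. It then builds an extractive summary by joining the sentences whose predictions have the highest confidence scores. Classifier confidence serves here as a stand-in for sentence importance.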


Technology Stack : Flair, TextClassifier, Sentence, Sentence Splitting

Code Type : Text Classification and Summary

Code Difficulty : Intermediate


def flair_random_document_summary(document, num_sentences=3):
    import re
    from flair.data import Sentence
    from flair.models import TextClassifier

    # Split the document into sentences with a naive regex on ., ! and ?
    # (an approximation: abbreviations such as "e.g." are split as well)
    parts = re.split(r'(?<=[.!?])\s+', document.strip())
    sentences = [Sentence(p) for p in parts if p]

    # Load Flair's pre-trained English sentiment classifier
    classifier = TextClassifier.load('en-sentiment')

    # Predict a sentiment label (with a confidence score) for each sentence
    classifier.predict(sentences)

    # Keep only sentences that received a label, then rank them by confidence
    labeled = [s for s in sentences if s.labels]
    top_sentences = sorted(labeled, key=lambda s: s.labels[0].score, reverse=True)[:num_sentences]

    # Join the selected sentences, highest confidence first
    summary = ' '.join(s.to_original_text() for s in top_sentences)

    return summary
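
The selection step can be tried without downloading a Flair model. The following is a minimal sketch, assuming a hypothetical helper `select_top_sentences` and made-up confidence values (neither is part of Flair). It ranks sentences by score in the same way as the function above, but as a variant it restores document order before joining, which tends to make the summary read more naturally.

```python
def select_top_sentences(scored_sentences, num_sentences=3):
    """Pick the highest-scoring sentences, returned in document order.

    scored_sentences: list of (text, score) pairs in document order.
    This helper is hypothetical and only illustrates the ranking step.
    """
    # Attach each sentence's position so the order can be restored later
    indexed = list(enumerate(scored_sentences))
    # Rank by score, highest first, and keep the top num_sentences
    ranked = sorted(indexed, key=lambda item: item[1][1], reverse=True)[:num_sentences]
    # Restore the original document order before joining
    ranked.sort(key=lambda item: item[0])
    return ' '.join(text for _, (text, _) in ranked)


# Made-up scores standing in for classifier confidences
example = [
    ("The launch went well.", 0.91),
    ("Some details are pending.", 0.62),
    ("Users loved the new design.", 0.99),
    ("A follow-up is planned.", 0.75),
]
print(select_top_sentences(example, num_sentences=2))
```

With real data, the `(text, score)` pairs would come from each sentence's text and the score of its first predicted label.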