Explaining Feature Importance with eli5 and CountVectorizer

Code introduction


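This function uses the eli5 library to display feature weights for a bag-of-words text model. A CountVectorizer only converts text into token counts and has no weights of its own, so the function fits a linear classifier (LogisticRegression) on those counts and explains that classifier instead. It vectorizes the text data, fits the classifier on the counts and labels, retrieves the weights with eli5.explain_weights, and renders them as a plain-text table with eli5.format_as_text. Note that eli5.show_weights is a display helper for notebooks, not an estimator, so it cannot be a step in a scikit-learn pipeline.


Technology Stack : eli5, sklearn.feature_extraction.text.CountVectorizer, sklearn.linear_model.LogisticRegression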

Code Type : Function

Code Difficulty : Intermediate


                
                    
import eli5
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def random_eli5_function(text_data, labels):
    """
    Fit a bag-of-words linear classifier and return eli5's table of its
    feature weights as plain text.
    """
    # CountVectorizer only turns text into token counts; it has no weights itself
    vectorizer = CountVectorizer()
    features = vectorizer.fit_transform(text_data)

    # Fit a linear model on the counts; its coefficients are the feature weights
    classifier = LogisticRegression()
    classifier.fit(features, labels)

    # Explain the weights, passing the vectorizer so eli5 can show token names
    explanation = eli5.explain_weights(classifier, vec=vectorizer)

    # Render the explanation as a plain-text table
    return eli5.format_as_text(explanation)

# Example usage (the labels are illustrative; the original example had none):
text_data = ["This is the first document.", "This document is the second document."]
labels = [0, 1]
print(random_eli5_function(text_data, labels))