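This function uses the eli5 library to display feature weights for text vectorized with a CountVectorizer. It builds a pipeline around the CountVectorizer, fits the pipeline to the text data, retrieves the feature weights with eli5.explain_weights, and formats the result as a table. A CountVectorizer only counts tokens and has no weights of its own, so the pipeline must end in a fitted estimator, such as a linear classifier, for eli5 to explain.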
Technology Stack : eli5, sklearn.feature_extraction.text.CountVectorizer, sklearn.pipeline.make_pipeline
Code Type : Function
Code Difficulty : Intermediate
import eli5
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def random_eli5_function(text_data, labels):
    """
    Fit a linear classifier on CountVectorizer features and return its
    feature weights as a text table produced by eli5.
    """
    # CountVectorizer has no weights of its own, so pair it with a classifier.
    # LogisticRegression is an assumption here; another linear model would also work.
    pipeline = make_pipeline(CountVectorizer(), LogisticRegression())
    # Fit the pipeline to the text data and labels
    pipeline.fit(text_data, labels)
    # make_pipeline names each step after its lowercased class name
    vectorizer = pipeline.named_steps["countvectorizer"]
    classifier = pipeline.named_steps["logisticregression"]
    # Explain the classifier's weights, using the vectorizer for feature names
    explanation = eli5.explain_weights(classifier, vec=vectorizer)
    # Format the explanation as a plain-text table
    return eli5.format_as_text(explanation)


# Example usage (the labels are illustrative):
text_data = ["This is the first document.", "This document is the second document."]
labels = [0, 1]
print(random_eli5_function(text_data, labels))
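
To make the vectorization step concrete, here is a minimal standard-library sketch of the token counts that a bag-of-words vectorizer such as CountVectorizer produces for the two example documents. It assumes lowercasing and tokens of two or more word characters, which mirrors CountVectorizer's documented defaults. The helper name count_tokens is hypothetical, and this is an illustration rather than scikit-learn's implementation.

```python
import re
from collections import Counter

# Assumption: tokens are runs of two or more word characters, lowercased.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def count_tokens(documents):
    """Return a sorted vocabulary and one row of token counts per document."""
    counts = [Counter(TOKEN_PATTERN.findall(doc.lower())) for doc in documents]
    # The vocabulary is every distinct token, in alphabetical order.
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for tokens that a document does not contain.
    rows = [[c[token] for token in vocabulary] for c in counts]
    return vocabulary, rows


text_data = ["This is the first document.", "This document is the second document."]
vocabulary, rows = count_tokens(text_data)
print(vocabulary)
for row in rows:
    print(row)
```

Each row is a document and each column a vocabulary token. A linear model fitted on such a matrix learns one weight per column, and those per-token weights are what a feature-weight table reports.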