Random Classification Dataset Feature Importance with PermutationImportance

Code introduction


This function generates a random classification dataset, trains a random forest classifier on it, and then uses PermutationImportance from the eli5 library to compute permutation-based feature importance scores.


Technology Stack: numpy, scikit-learn, eli5

Code Type: Function

Code Difficulty: Intermediate


from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from eli5.sklearn import PermutationImportance

def random_classification_feature_importance():
    # Generate a random classification dataset (100 samples, 20 features)
    X, y = make_classification(n_samples=100, n_features=20, random_state=42)

    # Train a random forest classifier
    clf = RandomForestClassifier(n_estimators=10, random_state=42)
    clf.fit(X, y)

    # Permute each feature and measure the resulting drop in model score
    perm = PermutationImportance(clf, random_state=42).fit(X, y)

    # Return one importance score per feature (mean score decrease)
    return perm.feature_importances_
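

Below is a minimal usage sketch. The feature labels and the sorted printout are illustrative additions that are not part of the original snippet; they simply rank the returned scores from most to least important.

# Example usage (an illustrative sketch, not part of the original function)
if __name__ == "__main__":
    importances = random_classification_feature_importance()

    # Rank the 20 generated features by their permutation importance score
    ranked = sorted(enumerate(importances), key=lambda item: item[1], reverse=True)
    for feature_index, score in ranked:
        print(f"feature_{feature_index}: {score:.4f}")

If the fitted PermutationImportance object were returned instead of just the score array, eli5.show_weights could also render the same results as a table in a Jupyter notebook.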