Interpreting Heart Disease Risk with SHAP and Random Forest


Code introduction


This function uses the SHAP library to interpret a random forest model's predictions of heart disease risk. It loads a heart disease dataset, trains a random forest classifier, and uses the SHAP library to compute SHAP values for the model. The risk prediction for a given age and cholesterol level comes from the model itself; the SHAP values explain that prediction by showing how much each feature pushed it up or down.
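SHAP values are additive: for a single patient $x$, the model's output for the disease class splits into a base value plus one contribution per feature. For a model with $M$ features:

$$
f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(x)
$$

where $f(x)$ is the predicted probability of heart disease, $\phi_0$ is the model's expected output over the training data (`explainer.expected_value` in the shap library), and $\phi_i(x)$ is the SHAP value of feature $i$. As a hypothetical illustration with only age and cholesterol, $\phi_0 = 0.45$, $\phi_{\text{age}} = +0.10$ and $\phi_{\text{chol}} = -0.05$ give $f(x) = 0.45 + 0.10 - 0.05 = 0.50$.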


Technology Stack : numpy, shap, sklearn

Code Type : Machine learning prediction function

Code Difficulty : Intermediate


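def predict_heart_disease(age, cholesterol, csv_path="heart.csv"):
    import numpy as np
    import shap
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Load the data. scikit-learn ships no load_heart() loader, so this assumes a
    # hypothetical CSV file with a header row that includes "age", "chol" and
    # "target" columns (target: 1 = heart disease, 0 = no heart disease).
    data = np.genfromtxt(csv_path, delimiter=",", names=True)

    # Use only the two features the function receives, so that the query point
    # has the same shape as the training data.
    X = np.column_stack([data["age"], data["chol"]])
    y = data["target"].astype(int)

    # Split the dataset
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train the model
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    # Create the SHAP explainer
    explainer = shap.TreeExplainer(model)

    # Compute SHAP values for the patient being scored
    x = np.array([[age, cholesterol]], dtype=float)
    shap_values = explainer.shap_values(x)
    # Older shap releases return a list with one array per class; newer ones return
    # one array of shape (n_samples, n_features, n_classes). Keep class 1 (disease).
    if isinstance(shap_values, list):
        contributions = shap_values[1][0]
    else:
        contributions = shap_values[0, :, 1]

    # The model predicts the risk; SHAP explains how each feature moved it
    return {
        "prediction": int(model.predict(x)[0]),
        "risk": float(model.predict_proba(x)[0, 1]),
        "shap": {"age": float(contributions[0]), "cholesterol": float(contributions[1])},
        "test_accuracy": float(model.score(X_test, y_test)),
    }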