Feature importance for a Keras regression model

h7appiyu  posted on 2024-01-08  in  Other

I am trying to extract feature importances or a saliency map from my Keras regression model:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import shap
from sklearn.inspection import permutation_importance

# Load data (modify as needed)
s_data = np.array([
     [1.0, 0.0, np.nan, 0.0],
     [2.0, 0.0, np.nan, 2.0],
     [0.0, 1.0, 2.0, 0.0]
 ])

# Load phenotype labels as a NumPy array (modify as needed)
labels = np.array([
    [7.],
    [9.],
    [2.]
])

s_data[np.isnan(s_data)] = -1

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(s_data, labels, test_size=0.2, random_state=42)

# Standardize the input features 
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define a deep learning model
model = keras.Sequential([
    layers.Input(shape=(X_train.shape[1],)),
    layers.Dense(128, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='linear')  
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')  

# Train the model
history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2)

I tried using the eli5 library:

import eli5
from eli5.sklearn import PermutationImportance

perm = PermutationImportance(model, random_state=1).fit(X_train,y_train)
eli5.show_weights(perm, feature_names = X_train.columns.tolist())

# Evaluate the model on the test set
test_loss = model.evaluate(X_test, y_test)
print(f'Test Loss: {test_loss}')


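But it raises this error:
TypeError: If no scoring is specified, the estimator passed should have a 'score' method. The estimator <keras.engine.sequential.Sequential object at 0x000002B1ABA41448> does not.
I also tried writing my own permutation feature importance function: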

n_permutations = 100  # Number of permutations to perform
feature_importances = np.zeros(X_test.shape[1])
print (feature_importances.max())

for _ in range(n_permutations):
    shuffled_X_test = X_test.copy()
    np.random.shuffle(shuffled_X_test)  # Permute the feature values
    shuffled_loss = model.evaluate(shuffled_X_test, y_test, verbose=0)
    importance = test_loss - shuffled_loss
    feature_importances += importance

# Normalize feature importances
feature_importances /= n_permutations
# Print feature importances
print("Feature Importances:")
for i, importance in enumerate(feature_importances):
    print(f"SNP{i+1}: {importance:.4f}")


But this gives me the same value for every feature.
How can I extract the features that are important to the regression model?

jecbmhm3

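With a permutation test, you take only one feature at a time and shuffle it. Your implementation shuffles all features together on every iteration and then broadcasts the single result to every feature, which is why all the values are identical. Here is an untested modification of your code showing how I would change it to permute one feature at a time: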

n_permutations = 100  # Number of permutations to perform per feature
feature_importances = np.zeros(X_test.shape[1])
print(feature_importances.max())

# Baseline loss on the unshuffled test set (the eli5 snippet fails before computing it)
test_loss = model.evaluate(X_test, y_test, verbose=0)

for feature_index in range(len(feature_importances)):
    print('Running permutations for feature:', feature_index)
    for _ in range(n_permutations):
        shuffled_X_test = X_test.copy()

        # Shuffle only the current feature's column, in place
        # Keep other features the same
        np.random.shuffle(shuffled_X_test[:, feature_index])
        shuffled_loss = model.evaluate(shuffled_X_test, y_test, verbose=0)
        # With this sign, a more negative value means shuffling hurt the model more
        importance = test_loss - shuffled_loss
        feature_importances[feature_index] += importance
    # Average all the permutation tests for the current feature
    feature_importances[feature_index] /= n_permutations

# Print feature importances
print("Feature Importances:")
for i, importance in enumerate(feature_importances):
    print(f"SNP{i+1}: {importance:.4f}")
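The one-feature-at-a-time idea can also be checked without Keras. What follows is a minimal sketch under stated assumptions, not the answerer's code: `permutation_importance_np`, `predict_fn` and `toy_predict` are hypothetical names, and a fixed linear function stands in for the trained model. Here importance is defined as shuffled loss minus baseline loss, so larger positive values mean a more important feature.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error on flattened arrays."""
    return float(np.mean((np.ravel(y_true) - np.ravel(y_pred)) ** 2))

def permutation_importance_np(predict_fn, X, y, n_repeats=20, seed=0):
    """Average increase in MSE when a single column of X is permuted."""
    rng = np.random.default_rng(seed)
    baseline = mse(y, predict_fn(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Permute only column j; every other column is left untouched
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(mse(y, predict_fn(X_perm)) - baseline)
        importances[j] = np.mean(increases)
    return importances

# Toy data (assumption): the target depends strongly on feature 0,
# weakly on feature 2, and not at all on feature 1.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 3))
weights = np.array([3.0, 0.0, 0.5])
y_demo = X_demo @ weights

def toy_predict(X):
    # Stand-in for a trained model's predict function
    return X @ weights

demo_importances = permutation_importance_np(toy_predict, X_demo, y_demo)
for i, value in enumerate(demo_importances):
    print(f"feature {i}: {value:.4f}")
```

With the model from the question, `predict_fn` could plausibly be `lambda X: model.predict(X, verbose=0)`. Note that a test set with only one row cannot show any effect, because permuting a single value leaves it unchanged.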

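As for the TypeError in the question: the message says the estimator needs a `score` method only when no scoring is specified. One possible workaround, sketched here with scikit-learn's `permutation_importance` (which the question already imports), is to pass a scoring callable with the signature `(estimator, X, y)`. `ToyRegressor` and `neg_mse_scorer` are hypothetical names; the toy class merely mimics an object that has `fit` and `predict` but no `score`, and whether a particular Keras version can be passed in directly is an assumption to verify.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_squared_error

class ToyRegressor:
    """Hypothetical stand-in for a trained model: fit/predict, but no score()."""
    def __init__(self, weights):
        self.weights = np.asarray(weights, dtype=float)

    def fit(self, X, y):
        # Already "trained"; present only because scikit-learn expects a fit method
        return self

    def predict(self, X):
        return X @ self.weights

def neg_mse_scorer(estimator, X, y):
    """Scorer callable: higher is better, so return the negative MSE."""
    y_pred = np.ravel(estimator.predict(X))
    return -mean_squared_error(np.ravel(y), y_pred)

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
toy_model = ToyRegressor([2.0, 0.0, -1.0])
y_toy = toy_model.predict(X_toy)

result = permutation_importance(
    toy_model, X_toy, y_toy,
    scoring=neg_mse_scorer, n_repeats=10, random_state=0,
)
for i, (mean, std) in enumerate(zip(result.importances_mean, result.importances_std)):
    print(f"feature {i}: {mean:.4f} +/- {std:.4f}")
```

Because the scorer returns a negative MSE, `importances_mean` is the average drop in score after shuffling, so larger values again indicate more important features.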
