I'm trying to extract feature importances or a saliency map from my Keras regression model:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import shap
from sklearn.inspection import permutation_importance
# Load data (modify as needed)
s_data = np.array([
    [1.0, 0.0, np.nan, 0.0],
    [2.0, 0.0, np.nan, 2.0],
    [0.0, 1.0, 2.0, 0.0]
])
# Load phenotype labels as a NumPy array (modify as needed)
labels = np.array([
    [7.],
    [9.],
    [2.]
])
s_data[np.isnan(s_data)] = -1
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(s_data, labels, test_size=0.2, random_state=42)
# Standardize the input features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Define a deep learning model
model = keras.Sequential([
    layers.Input(shape=(X_train.shape[1],)),
    layers.Dense(128, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='linear')
])
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
# Train the model
history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2)
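The question also asks about a saliency map. For a differentiable model this is the gradient of the prediction with respect to each input feature; in TensorFlow it is normally computed exactly with `tf.GradientTape`. The sketch below swaps in a different technique, a central finite-difference approximation of that gradient, so it only needs NumPy and a prediction callable (for the Keras model, something like `lambda X: model.predict(X, verbose=0)`). `finite_difference_saliency` is a hypothetical helper, and the linear function is a stand-in for the trained network, chosen so the expected gradient is known.

```python
import numpy as np


def finite_difference_saliency(predict_fn, X, eps=1e-3):
    """Approximate |d prediction / d x_j| for every sample and feature.

    predict_fn: any callable mapping an (n, d) array to n predictions
    (hypothetical interface; a Keras model.predict wrapper fits it).
    Returns an (n, d) array of absolute gradient estimates.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    saliency = np.empty((n, d))
    for j in range(d):
        step = np.zeros(d)
        step[j] = eps
        # Central difference: (f(x + eps*e_j) - f(x - eps*e_j)) / (2*eps)
        up = np.ravel(predict_fn(X + step))
        down = np.ravel(predict_fn(X - step))
        saliency[:, j] = np.abs(up - down) / (2 * eps)
    return saliency


# Stand-in "model": a known linear function of 4 features, returning
# Keras-style (n, 1) predictions.
weights = np.array([3.0, 0.0, -1.0, 0.5])
predict_fn = lambda X: (X @ weights).reshape(-1, 1)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 4))
per_sample = finite_difference_saliency(predict_fn, X_demo)
global_saliency = per_sample.mean(axis=0)  # average saliency over samples
print(global_saliency)
```

For a linear stand-in the estimate simply recovers the absolute weights; for a ReLU network the per-sample values differ from row to row, which is why they are averaged into one score per feature.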
I tried using the eli5 library:
import eli5
from eli5.sklearn import PermutationImportance
perm = PermutationImportance(model, random_state=1).fit(X_train,y_train)
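# X_train is a NumPy array after StandardScaler, so it has no .columns attribute
eli5.show_weights(perm, feature_names=[f"SNP{i+1}" for i in range(X_train.shape[1])])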
# Evaluate the model on the test set
test_loss = model.evaluate(X_test, y_test)
print(f'Test Loss: {test_loss}')
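But it raised this error:
TypeError: If no scoring is specified, the estimator passed should have a 'score' method. The estimator <keras.engine.sequential.Sequential object at 0x000002B1ABA41448> does not.
I also tried writing my own permutation feature importance function: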
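The error message itself points at the fix: a Keras model has `fit` and `predict` but no `score` method, so the permutation routine needs an explicit scorer. A minimal sketch using scikit-learn's `permutation_importance` (already imported in the question) with a callable scorer of signature `(estimator, X, y)` follows. `neg_mse_scorer` and `KerasLikeRegressor` are hypothetical names; the class only mimics a trained Keras model (a `fit` method and `(n, 1)`-shaped predictions) so the example runs without TensorFlow. With the real model you would pass `model`, `X_test` and `y_test` instead. eli5's `PermutationImportance` should likewise accept a `scoring=` argument, though that is not exercised here.

```python
import numpy as np
from sklearn.inspection import permutation_importance


def neg_mse_scorer(estimator, X, y):
    """Callable scorer in the (estimator, X, y) form scikit-learn accepts.

    scikit-learn treats higher scores as better, so return the
    negative mean squared error.
    """
    pred = np.ravel(estimator.predict(X))
    return -np.mean((np.ravel(y) - pred) ** 2)


class KerasLikeRegressor:
    """Hypothetical stand-in for a trained Keras model: it has fit() and a
    predict() returning shape (n, 1), but no score() method."""

    def __init__(self, weights):
        self.weights = np.asarray(weights, dtype=float)

    def fit(self, X, y):
        return self  # pretend the model is already trained

    def predict(self, X):
        return (np.asarray(X) @ self.weights).reshape(-1, 1)


rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
true_w = np.array([3.0, 0.0, 1.0, 0.0])  # SNP2 and SNP4 are unused
y = X @ true_w

model = KerasLikeRegressor(true_w)
result = permutation_importance(model, X, y, scoring=neg_mse_scorer,
                                n_repeats=10, random_state=0)
for i, (mean, std) in enumerate(zip(result.importances_mean,
                                    result.importances_std)):
    print(f"SNP{i+1}: {mean:.4f} +/- {std:.4f}")
```

`importances_mean` is the drop in score when a column is shuffled, so features the model relies on get large positive values and unused ones get roughly zero.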
n_permutations = 100  # Number of permutations to perform
feature_importances = np.zeros(X_test.shape[1])
print(feature_importances.max())
for _ in range(n_permutations):
    shuffled_X_test = X_test.copy()
    np.random.shuffle(shuffled_X_test)  # Permute the feature values
    shuffled_loss = model.evaluate(shuffled_X_test, y_test, verbose=0)
    importance = test_loss - shuffled_loss
    feature_importances += importance
# Normalize feature importances
feature_importances /= n_permutations
# Print feature importances
print("Feature Importances:")
for i, importance in enumerate(feature_importances):
    print(f"SNP{i+1}: {importance:.4f}")
But this gives me the same value for every feature.
How can I extract the features that are important to my regression model?
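Answer: with permutation testing, you shuffle only one feature at a time. Your implementation shuffles all features together on every iteration (np.random.shuffle on a 2-D array permutes whole rows), and the single resulting loss difference is then broadcast to every entry of feature_importances, which is why every feature gets the same value. Below is an untested modification of your code that shows how I would change it to permute one feature at a time: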
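The following is a minimal sketch of the one-feature-at-a-time approach described in the answer. It is not necessarily the answerer's exact code. `permutation_feature_importance` and `predict_fn` are hypothetical names: for the Keras model you would pass something like `lambda X: model.predict(X, verbose=0)` together with `X_test` and `y_test`. The sketch computes the mean squared error directly from predictions, which matches the `mean_squared_error` loss the model was compiled with. Importance is defined as permuted loss minus baseline loss, so a feature the model depends on scores positive. The question's loop used the reverse sign (`test_loss - shuffled_loss`).

```python
import numpy as np


def permutation_feature_importance(predict_fn, X, y, n_permutations=100, seed=0):
    """Shuffle ONE column at a time and measure how much the mean squared
    error rises above the unshuffled baseline."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.ravel(y)

    def mse(X_eval):
        return np.mean((y - np.ravel(predict_fn(X_eval))) ** 2)

    baseline = mse(X)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_permutations):
            X_perm = X.copy()
            # Shuffle only column j; every other feature keeps its row order.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            # Positive value: the loss got worse once feature j was scrambled.
            importances[j] += mse(X_perm) - baseline
    return importances / n_permutations


# Stand-in "model": a known linear function in which SNP2 and SNP4
# (columns 1 and 3) have zero weight.
weights = np.array([2.0, 0.0, 0.5, 0.0])
predict_fn = lambda X: (X @ weights).reshape(-1, 1)

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 4))
y_demo = X_demo @ weights
importances = permutation_feature_importance(predict_fn, X_demo, y_demo,
                                             n_permutations=20)
for i, imp in enumerate(importances):
    print(f"SNP{i+1}: {imp:.4f}")
```

Permutation importance also needs a test set with more than a handful of rows. With the three-row toy data in the question, `train_test_split(..., test_size=0.2)` leaves a single row in `X_test`, and shuffling one row cannot change anything.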