R kernelshap package with tidymodels for classification

mgdq6dx1, posted on 2023-07-31 in Other

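I'm having trouble generating SHAP values for a classification problem with tidymodels.
When I try to compute SHAP values after training my model in tidymodels, following the steps at https://github.com/ModelOriented/kernelshap, I can't get it to work for a classification problem, where my target variable has to be a factor. Running kernelshap() always returns:

Error in check_pred(pred_fun(object, X, ...), n = n): Predictions must be numeric

I found a workaround that passes matrices instead of data frames and extracts the model with extract_fit_parsnip(), but the problem remains when I pass the workflow itself. Is there a way to reproduce the example from the kernelshap page, but for classification? Example code below.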

library(tidyverse)
library(tidymodels)

Default <- ISLR::Default

Default <- Default %>%
  mutate(
    default = factor(
      case_when(
        default == "Yes" ~ 1,
        default == "No" ~ 0
      ),
      levels = c(1, 0)
    )
  ) # convert to a factor, otherwise the model will not fit

# train/test split
split <- initial_split(Default)
tr <- training(split)
te <- testing(split)

rec <- recipe(default ~ ., data = Default) %>%
  step_dummy(all_nominal_predictors())

spec <- boost_tree() %>% 
  set_mode("classification") %>% 
  set_engine("xgboost")

wf <- workflow() %>% 
  add_model(spec) %>% 
  add_recipe(rec)

# model fit
mod <- fit(wf,tr)


library(kernelshap)

x <- rec %>%
  prep() %>%
  bake(te %>% slice_sample(n = 50)) %>%
  select(-default) %>%
  as.data.frame()
bg <- rec %>%
  prep() %>%
  bake(te %>% slice_sample(n = 10)) %>%
  mutate(default = as.numeric(as.character(default))) %>%
  as.data.frame()

# test for prediction
predict(mod, te)

# extract the fitted parsnip model from the workflow
md <- extract_fit_parsnip(mod)

# this version works
kernelshap(
  md$fit,
  X = x %>% as.matrix(), # it works when the data are passed as matrices
  bg_X = bg %>% as.matrix()
)

# this version does not work
kernelshap(
  mod,
  X = te %>%
    select(-default) %>% # remove target variable
    slice_sample(n = 50) %>%
    as.data.frame(),
  bg_X = te %>%
    slice_sample(n = 50) %>%
    as.data.frame()
)

################################### error message:

# Error in check_pred(pred_fun(object, X, ...), n = n) :
#   Predictions must be numeric
######################################

# toy example from the kernelshap GitHub page using tidymodels

library(tidymodels)
library(kernelshap)

iris_recipe <- iris %>%
  recipe(Sepal.Length ~ .)

reg <- linear_reg() %>%
  set_engine("lm")

iris_wf <- workflow() %>%
  add_recipe(iris_recipe) %>%
  add_model(reg)

fit <- iris_wf %>%
  fit(iris)

ks <- kernelshap(fit, iris[, -1], bg_X = iris)
ks


Answer 1, by iyfamqjs

{kernelshap} is designed to work with tidymodels. In this case, you can simply write:

library(kernelshap)
library(shapviz)

x <- c("student", "balance", "income")
ks <- kernelshap(
  mod, 
  X = head(Default, 1000),    # Assuming random row order
  bg_X = head(Default, 200),  # Assuming random row order
  type = "prob",              # Predictions must be numeric
  feature_names = x           # Or use X = head(Default[x], 1000)
)

sv <- shapviz(ks)             # Contains one shapviz object per class
sv_dependence(sv$.pred_1, v = x)
sv_importance(sv$.pred_1, kind = "bee", show_numbers = TRUE)
sv_importance(sv$.pred_1)

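
To see why `type = "prob"` matters: for a classification workflow, `predict()` returns a factor column by default, and kernelshap only accepts numeric predictions. The sketch below is a minimal illustration under stated assumptions. The simulated data `d`, the logistic regression workflow `wf_fit` and the helper `pred_pos` are hypothetical stand-ins, not the Default data or the XGBoost model from the question. It also shows one possible alternative: passing a custom `pred_fun` that returns the probability of a single class.

```r
library(tidymodels)
library(kernelshap)

# Simulated two-class data (hypothetical stand-in for the Default data)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
d  <- data.frame(
  x1 = x1,
  x2 = rnorm(n),
  y  = factor(ifelse(x1 + rnorm(n) > 0, "1", "0"), levels = c("1", "0"))
)

# A simple classification workflow (logistic regression instead of XGBoost)
wf_fit <- workflow() %>%
  add_recipe(recipe(y ~ ., data = d)) %>%
  add_model(logistic_reg() %>% set_engine("glm")) %>%
  fit(d)

# Default prediction type: a single factor column, which kernelshap rejects
p_class <- predict(wf_fit, head(d))
sapply(p_class, class)

# type = "prob": one numeric column per class level
p_prob <- predict(wf_fit, head(d), type = "prob")
sapply(p_prob, class)

# Alternative: a custom prediction function that returns only the
# probability of class "1" as a numeric vector
pred_pos <- function(object, X, ...) {
  predict(object, X, type = "prob")$.pred_1
}

ks <- kernelshap(
  wf_fit,
  X = d[1:20, c("x1", "x2")],  # rows to explain, features only
  bg_X = d[1:50, ],            # background data
  pred_fun = pred_pos,
  verbose = FALSE
)
dim(ks$S)  # one row per explained observation, one column per feature
```

With a single numeric output, `ks$S` should be one matrix of SHAP values rather than one matrix per class, which can be more convenient when only one class is of interest.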

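Remarks

  • Since your model is fitted with XGBoost, TreeSHAP would be the more natural choice, but it is actually trickier to use through tidymodels.
  • I would recommend ordinal encoding instead of dummy encoding.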
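
The two remarks can be combined in a rough sketch: encode factors as integers with `step_integer()`, then pull the raw booster out of the workflow and let shapviz compute TreeSHAP on the baked data. This is only one possible way to do it, not necessarily what the answerer had in mind. The simulated data `d` and the object names (`wf_fit`, `xgb_fit`, `X_baked`) are hypothetical. Note that SHAP values from the raw booster are on the log-odds scale, and which class they refer to depends on how parsnip encoded the outcome, so that is worth checking before interpreting signs.

```r
library(tidymodels)
library(shapviz)

# Simulated data with one categorical predictor (hypothetical stand-in)
set.seed(1)
n   <- 300
x1  <- rnorm(n)
grp <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
d   <- data.frame(
  grp = grp,
  x1  = x1,
  y   = factor(ifelse(x1 + (grp == "a") + rnorm(n) > 0.5, "1", "0"),
               levels = c("1", "0"))
)

# Ordinal (integer) encoding instead of dummy columns
rec <- recipe(y ~ ., data = d) %>%
  step_integer(all_nominal_predictors())

wf_fit <- workflow() %>%
  add_recipe(rec) %>%
  add_model(
    boost_tree(trees = 50) %>%
      set_mode("classification") %>%
      set_engine("xgboost")
  ) %>%
  fit(d)

# The raw xgb.Booster and the predictors as the booster saw them
xgb_fit <- extract_fit_engine(wf_fit)
X_baked <- bake(extract_recipe(wf_fit), new_data = d, all_predictors())

# TreeSHAP via the xgb.Booster method of shapviz()
sv <- shapviz(
  xgb_fit,
  X_pred = as.matrix(X_baked),   # numeric matrix passed to the booster
  X = as.data.frame(X_baked)     # feature values used for plotting
)
sv_importance(sv)
```

The trickier part mentioned above is visible here: the booster has to be fed the baked data, so the preprocessing is applied by hand instead of by the workflow.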
