R: Using 'case_when' to assign multiple new variables at once for a given condition

Posted by 2exbekwf on 2023-01-15 in Other

I have a data frame df with two variables, df$soil and df$use. I want to add two new variables, df$ef1 and df$ef2, to my data set based on a condition. I use 'case_when' to do this:

library(dplyr)

ef1_grassl_mineral <- 0.2
ef1_grassl_peat    <- 0.3
ef1_arable_mineral <- 0.4
ef1_arable_peat    <- 0.5

ef2_grassl_mineral <- 2.3
ef2_grassl_peat    <- 3.4
ef2_arable_mineral <- 4.5
ef2_arable_peat    <- 5.6

df <- data.frame(soil = c('mineral', 'peat', 'mineral', 'peat'),
                 use  = c('grassl', 'arable', 'arable', 'grassl'))

df <- df %>% mutate (
  ef1 = case_when((soil=='mineral' & use=='grassl') ~ ef1_grassl_mineral,
                  (soil=='peat'    & use=='grassl') ~ ef1_grassl_peat,
                  (soil=='mineral' & use=='arable') ~ ef1_arable_mineral,
                  (soil=='peat'    & use=='arable') ~ ef1_arable_peat),
  ef2 = case_when((soil=='mineral' & use=='grassl') ~ ef2_grassl_mineral,
                  (soil=='peat'    & use=='grassl') ~ ef2_grassl_peat,
                  (soil=='mineral' & use=='arable') ~ ef2_arable_mineral,
                  (soil=='peat'    & use=='arable') ~ ef2_arable_peat))

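The approach above works fine, but I have to repeat the conditions for each variable, which makes the code long. So I am wondering whether there is a way to specify each condition only once (for example, soil=='mineral' & use=='arable') and then define both df$ef1 AND df$ef2. (In pseudocode: if (soil == "mineral" and use == "arable"), then ef1 = ef1_arable_mineral and ef2 = ef2_arable_mineral.)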

Answer #1 (edqdpe6u)

Use a lookup table and a join instead:

lookup = tribble(
  ~soil, ~use, ~ef1, ~ef2,
  "mineral", "grassl", 0.2, 2.3,
  "peat", "grassl", 0.3, 3.4,
  "mineral", "arable", 0.4, 4.5,
  "peat", "arable", 0.5, 5.6
)

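Then, if you have a larger data frame that already has soil and use columns and needs ef1 and ef2 columns added, run bigger_data %>% left_join(lookup, by = c("soil", "use")).
What I like most about lookup tables like this is that they are very easy to audit and debug. If someone else needs to check the values, you can store the lookup table as a flat file (CSV or similar), which is perfectly clear even to non-technical people.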
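
As a minimal sketch of the two points above (the join and the flat-file storage), the following puts them together. The bigger_data contents, the temporary CSV path, and the anti_join() check are assumptions added for illustration and are not part of the original answer.

```r
library(dplyr)

# Lookup table: one row per soil/use combination (values from the question)
lookup <- tribble(
  ~soil,     ~use,     ~ef1, ~ef2,
  "mineral", "grassl", 0.2,  2.3,
  "peat",    "grassl", 0.3,  3.4,
  "mineral", "arable", 0.4,  4.5,
  "peat",    "arable", 0.5,  5.6
)

# Hypothetical stand-in for the larger data set
bigger_data <- data.frame(
  soil = c("mineral", "peat", "mineral", "peat"),
  use  = c("grassl", "arable", "arable", "grassl")
)

# Round trip through a CSV file (a temporary file here, purely for illustration)
lookup_file <- tempfile(fileext = ".csv")
write.csv(lookup, lookup_file, row.names = FALSE)
lookup_from_csv <- read.csv(lookup_file, stringsAsFactors = FALSE)

# Adds ef1 and ef2 to every row whose soil/use pair appears in the lookup
result <- bigger_data %>%
  left_join(lookup_from_csv, by = c("soil", "use"))

# Rows without a match would get NA in ef1/ef2; list them to catch gaps
unmatched <- anti_join(bigger_data, lookup_from_csv, by = c("soil", "use"))
```

If unmatched has any rows, the lookup table is missing a soil/use combination, which is usually easier to spot this way than in a long case_when() chain.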

Answer #2 (khbbv19g)

You can use list() to store the values for several columns at once, then pass the result to tidyr::unnest_wider():

library(tidyverse)

df %>%
  mutate(ef = case_when(
    (soil == 'mineral' & use == 'grassl') ~ list(c(0.2, 2.3)),
    (soil == 'peat'    & use == 'grassl') ~ list(c(0.3, 3.4)),
    (soil == 'mineral' & use == 'arable') ~ list(c(0.4, 4.5)),
    (soil == 'peat'    & use == 'arable') ~ list(c(0.5, 5.6)))
  ) %>%
  unnest_wider(ef, names_sep = '')

# # A tibble: 4 × 4
#   soil    use      ef1   ef2
#   <chr>   <chr>  <dbl> <dbl>
# 1 mineral grassl   0.2   2.3
# 2 peat    arable   0.5   5.6
# 3 mineral arable   0.4   4.5
# 4 peat    grassl   0.3   3.4
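
A possible variant, sketched here as an assumption rather than as part of the original answer: if the vectors inside list() are named, unnest_wider() should take the new column names from those names, so names_sep is not needed and the target columns are visible next to each value.

```r
library(dplyr)
library(tidyr)

df <- data.frame(soil = c("mineral", "peat", "mineral", "peat"),
                 use  = c("grassl", "arable", "arable", "grassl"))

result <- df %>%
  mutate(ef = case_when(
    # Each condition is written once and yields both values as a named vector
    soil == "mineral" & use == "grassl" ~ list(c(ef1 = 0.2, ef2 = 2.3)),
    soil == "peat"    & use == "grassl" ~ list(c(ef1 = 0.3, ef2 = 3.4)),
    soil == "mineral" & use == "arable" ~ list(c(ef1 = 0.4, ef2 = 4.5)),
    soil == "peat"    & use == "arable" ~ list(c(ef1 = 0.5, ef2 = 5.6))
  )) %>%
  # The vector names (ef1, ef2) become the names of the new columns
  unnest_wider(ef)
```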

Answer #3 (68bkxrlz)

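This may not be the best solution, but another interesting approach is to put all the ef1 and ef2 lookup values in lists and then look them up by pasting the two columns together: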

library(tidyverse)

ef1 <- ef2 <- list()

ef1$grassl_mineral <- 0.2
ef1$grassl_peat    <- 0.3
ef1$arable_mineral <- 0.4
ef1$arable_peat    <- 0.5

ef2$grassl_mineral <- 2.3
ef2$grassl_peat    <- 3.4
ef2$arable_mineral <- 4.5
ef2$arable_peat    <- 5.6

df <- data.frame(soil = c('mineral', 'peat', 'mineral', 'peat'),
                 use  = c('grassl', 'arable', 'arable', 'grassl'))

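df |> 
  # unlist() turns the list subsets into plain numeric columns (not list-columns)
  mutate(ef1 = unlist(ef1[paste(use, soil, sep = "_")], use.names = FALSE),
         ef2 = unlist(ef2[paste(use, soil, sep = "_")], use.names = FALSE))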
#      soil    use ef1 ef2
# 1 mineral grassl 0.2 2.3
# 2    peat arable 0.5 5.6
# 3 mineral arable 0.4 4.5
# 4    peat grassl 0.3 3.4
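
The same idea can also be sketched with named numeric vectors instead of lists, building the key only once. The names ef1_values, ef2_values, and key are hypothetical and chosen for this illustration; they do not come from the original answer.

```r
library(dplyr)

# Named numeric vectors: names follow the use_soil pattern from the answer
ef1_values <- c(grassl_mineral = 0.2, grassl_peat = 0.3,
                arable_mineral = 0.4, arable_peat = 0.5)
ef2_values <- c(grassl_mineral = 2.3, grassl_peat = 3.4,
                arable_mineral = 4.5, arable_peat = 5.6)

df <- data.frame(soil = c("mineral", "peat", "mineral", "peat"),
                 use  = c("grassl", "arable", "arable", "grassl"))

result <- df |>
  mutate(key = paste(use, soil, sep = "_"),   # build the lookup key once
         ef1 = unname(ef1_values[key]),       # subsetting by name gives numeric values
         ef2 = unname(ef2_values[key])) |>
  select(-key)                                # drop the helper column
```

A key that is missing from the vectors yields NA rather than an error, so it may be worth checking the result for NA values.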
