Applying a custom function to many columns in a data frame using across() in R

Posted by kxxlusnw 12 months ago in Other


I have a data frame, call it ex_ds, with variables v1 through v15:

> head(ex_ds)
  v1   v2 v3 v4 v5 v6 v7   v8 v9 v10 v11  v12  v13 v14 v15
1  5 2014  1  2  4  1  1    8  4   2   2    2    2   2   2
2  5 2014  2  6  1  3  8 <NA>  1   1   2    2    2   1   2
3  5 2014  2  5  2  1  1    8  1   1   1 <NA> <NA>   1   2
4  5 2014  2  2  1  4  1    2  5   2   2    2    2   2   2
5  5 2014  1  5  2  1  8    7  4   1   1    2    2   2   2
6  5 2014  2  4  3  5  3    1  3   2   2    2    2   2   2

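I want to group by each combination of v1 and v2 and count the number of non-NA responses for each variable from v3 through v15.
Here is my custom function: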

n_fn_ex = function(var){
  n_var = ex_ds %>% 
    drop_na(var) %>%
    group_by(v1,v2) %>%
    count() 
}

It works for a single variable specified on its own:

> n_fn_ex("v3")
# A tibble: 1 x 3
# Groups:   v1, v2 [1]
     v1    v2     n
  <int> <int> <int>
1     5  2014     6

> n_fn_ex("v8")
# A tibble: 1 x 3
# Groups:   v1, v2 [1]
     v1    v2     n
  <int> <int> <int>
1     5  2014     5

But it does not work inside my across() call:

ex_ds_1 = ex_ds %>%
  reframe(across(v3:v15, n_fn_ex))

> ex_ds_1 = ex_ds %>%
+   reframe(across(v3:v15, n_fn_ex))
Error in `reframe()`:
i In argument: `across(v3:v15, n_fn_ex)`.
Caused by error in `across()`:
! Can't compute column `v3`.
Caused by error in `drop_na()`:
! Can't subset columns that don't exist.
x Columns `1`, `2`, `2`, `2`, `1`, etc. don't exist.
Run `rlang::last_trace()` to see where the error occurred.

I also tried the following, which gives the same error:

ex_ds_1 = ex_ds %>%
  reframe(across("v3":"v15", n_fn_ex))

The same happens with:

n_fn_ex_2 = function(var){
  n_var = ex_ds %>% 
    drop_na(var) %>%
    group_by(v1,v2) %>%
    count() 
  return(n_var$n)
}

ex_ds_1 = ex_ds %>%
  mutate(across("v3":"v15", ~n_fn_ex_2(.)))

I was following the dplyr across() reference page: https://dplyr.tidyverse.org/reference/across.html and another post: Apply Multiple Columns to Custom function Using dplyr::mutate(across())
Sample dataset:

ex_ds = structure(list(v1 = c(5L, 5L, 5L, 5L, 5L, 5L), v2 = c(2014L, 
2014L, 2014L, 2014L, 2014L, 2014L), v3 = structure(c(1L, 2L, 
2L, 2L, 1L, 2L), .Label = c("1", "2"), class = "factor"), v4 = structure(c(2L, 
6L, 5L, 2L, 5L, 4L), .Label = c("1", "2", "3", "4", "5", "6"), class = "factor"), 
    v5 = structure(c(4L, 1L, 2L, 1L, 2L, 3L), .Label = c("1", 
    "2", "3", "4"), class = "factor"), v6 = structure(c(1L, 3L, 
    1L, 4L, 1L, 5L), .Label = c("1", "2", "3", "4", "5", "6"), class = "factor"), 
    v7 = structure(c(1L, 8L, 1L, 1L, 8L, 3L), .Label = c("1", 
    "2", "3", "4", "5", "6", "7", "8"), class = "factor"), v8 = structure(c(8L, 
    NA, 8L, 2L, 7L, 1L), .Label = c("1", "2", "3", "4", "5", 
    "6", "7", "8"), class = "factor"), v9 = structure(c(4L, 1L, 
    1L, 5L, 4L, 3L), .Label = c("1", "2", "3", "4", "5"), class = "factor"), 
    v10 = structure(c(2L, 1L, 1L, 2L, 1L, 2L), .Label = c("1", 
    "2"), class = "factor"), v11 = structure(c(2L, 2L, 1L, 2L, 
    1L, 2L), .Label = c("1", "2"), class = "factor"), v12 = structure(c(2L, 
    2L, NA, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"), 
    v13 = structure(c(2L, 2L, NA, 2L, 2L, 2L), .Label = c("1", 
    "2"), class = "factor"), v14 = structure(c(2L, 1L, 1L, 2L, 
    2L, 2L), .Label = c("1", "2"), class = "factor"), v15 = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor")), row.names = c(NA, 
6L), class = "data.frame")


Source: https://stackoverflow.com/questions/77681850/applying-custom-function-to-many-columns-in-dataframe-using-across-r

1 Answer


szqfcxe21#

If you don't necessarily need a function for this, you can get the desired result with:

ex_ds %>%
  group_by(v1, v2) %>%
  summarise_at(vars(v3:v15), ~ sum(!is.na(.x)))
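As for why the across() call in the question fails: across() hands the function each column's values, not its name, so drop_na(var) receives a factor vector and appears to treat those values as column names (hence "Columns `1`, `2`, ... don't exist"). The sketch below illustrates this on a small toy data frame; `toy` and its values are made up for illustration and are not the asker's data.

```r
library(dplyr)

# Toy stand-in for ex_ds (hypothetical values, two grouping columns, two factors)
toy <- data.frame(
  v1 = c(5L, 5L, 5L),
  v2 = c(2014L, 2014L, 2014L),
  v3 = factor(c("1", "2", NA)),
  v4 = factor(c("2", NA, NA))
)

# Inside across(), .x is the column itself (a factor here), not the string "v3"
toy %>%
  summarise(across(v3:v4, ~ class(.x)))

# So the function only needs to count non-missing values in the vector it is given
counts <- toy %>%
  group_by(v1, v2) %>%
  summarise(across(v3:v4, ~ sum(!is.na(.x))), .groups = "drop")

counts
```

For the toy data this gives one row for the (5, 2014) group with v3 = 2 and v4 = 1.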


Update:

As @Onyambu pointed out in a comment, summarise_at(vars(...)) has been superseded in favor of summarise(across(...)):

ex_ds %>%
  group_by(v1, v2) %>%
  summarise(across(v3:v15, ~ sum(!is.na(.x))))
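If you do want a named function, here is a minimal sketch of two ways it could look, again on a made-up toy data frame. The helper names `n_non_na` and `n_by_name` are hypothetical and not part of dplyr or tidyr. The first takes a vector, so it can be passed straight to across(). The second keeps the name-based style of n_fn_ex, using all_of() so that the string is treated as a column name, and is looped over the names instead of being used inside across().

```r
library(dplyr)
library(tidyr)

# Toy stand-in for ex_ds (hypothetical values)
toy <- data.frame(
  v1 = c(5L, 5L, 5L),
  v2 = c(2014L, 2014L, 2014L),
  v3 = factor(c("1", "2", NA)),
  v4 = factor(c("2", NA, NA))
)

# Option 1: helper that takes a vector, which is what across() supplies
n_non_na <- function(x) sum(!is.na(x))

by_across <- toy %>%
  group_by(v1, v2) %>%
  summarise(across(v3:v4, n_non_na), .groups = "drop")

# Option 2: helper that takes a column name as a string
n_by_name <- function(data, var) {
  data %>%
    drop_na(all_of(var)) %>%   # all_of() marks var as a column name
    count(v1, v2) %>%
    mutate(variable = var)     # record which column the count refers to
}

# Call it once per column name and stack the results (long format)
long <- bind_rows(lapply(c("v3", "v4"), function(v) n_by_name(toy, v)))

by_across
long
```

One difference to be aware of: with option 2, a group whose values are all NA for a given variable is dropped by drop_na() and so does not appear in the result, whereas option 1 reports it with a count of 0.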

