我有一个嵌入式框架,让我们称之为ex_ds
,变量为v1
到v15
> head(ex_ds)
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15
1 5 2014 1 2 4 1 1 8 4 2 2 2 2 2 2
2 5 2014 2 6 1 3 8 <NA> 1 1 2 2 2 1 2
3 5 2014 2 5 2 1 1 8 1 1 1 <NA> <NA> 1 2
4 5 2014 2 2 1 4 1 2 5 2 2 2 2 2 2
5 5 2014 1 5 2 1 8 7 4 1 1 2 2 2 2
6 5 2014 2 4 3 5 3 1 3 2 2 2 2 2 2
字符串
我想按每个v1, v2
组合进行分组,并计算从v3
到v15
的每个变量的非NA响应的数量。
下面是我的自定义函数:
n_fn_ex = function(var){
n_var = ex_ds %>%
drop_na(var) %>%
group_by(v1,v2) %>%
count()
}
型
它适用于单独指定的一个变量:
> n_fn_ex("v3")
# A tibble: 1 x 3
# Groups: v1, v2 [1]
v1 v2 n
<int> <int> <int>
1 5 2014 6
> n_fn_ex("v8")
# A tibble: 1 x 3
# Groups: v1, v2 [1]
v1 v2 n
<int> <int> <int>
1 5 2014 5
型
但它在我的交叉语句中不起作用:
ex_ds_1 = ex_ds %>%
reframe(across(v3:v15, n_fn_ex))
> ex_ds_1 = ex_ds %>%
+ reframe(across(v3:v15, n_fn_ex))
Error in `reframe()`:
i In argument: `across(v3:v15, n_fn_ex)`.
Caused by error in `across()`:
! Can't compute column `v3`.
Caused by error in `drop_na()`:
! Can't subset columns that don't exist.
x Columns `1`, `2`, `2`, `2`, `1`, etc. don't exist.
Run `rlang::last_trace()` to see where the error occurred.
型
我也尝试了以下方法,结果也是同样的错误:
ex_ds_1 = ex_ds %>%
reframe(across("v3":"v15", n_fn_ex))
型
一样
n_fn_ex_2 = function(var){
n_var = ex_ds %>%
drop_na(var) %>%
group_by(v1,v2) %>%
count()
return(n_var$n)
}
ex_ds_1 = ex_ds %>%
mutate(across("v3":"v15", ~n_fn_ex_2(.)))
我跟着davidr穿过页面:https://dplyr.tidyverse.org/reference/across.html和另一个帖子:Apply Multiple Columns to Custom function Using dplyr::mutate(across())
示例数据集:
ex_ds = structure(list(v1 = c(5L, 5L, 5L, 5L, 5L, 5L), v2 = c(2014L,
2014L, 2014L, 2014L, 2014L, 2014L), v3 = structure(c(1L, 2L,
2L, 2L, 1L, 2L), .Label = c("1", "2"), class = "factor"), v4 = structure(c(2L,
6L, 5L, 2L, 5L, 4L), .Label = c("1", "2", "3", "4", "5", "6"), class = "factor"),
v5 = structure(c(4L, 1L, 2L, 1L, 2L, 3L), .Label = c("1",
"2", "3", "4"), class = "factor"), v6 = structure(c(1L, 3L,
1L, 4L, 1L, 5L), .Label = c("1", "2", "3", "4", "5", "6"), class = "factor"),
v7 = structure(c(1L, 8L, 1L, 1L, 8L, 3L), .Label = c("1",
"2", "3", "4", "5", "6", "7", "8"), class = "factor"), v8 = structure(c(8L,
NA, 8L, 2L, 7L, 1L), .Label = c("1", "2", "3", "4", "5",
"6", "7", "8"), class = "factor"), v9 = structure(c(4L, 1L,
1L, 5L, 4L, 3L), .Label = c("1", "2", "3", "4", "5"), class = "factor"),
v10 = structure(c(2L, 1L, 1L, 2L, 1L, 2L), .Label = c("1",
"2"), class = "factor"), v11 = structure(c(2L, 2L, 1L, 2L,
1L, 2L), .Label = c("1", "2"), class = "factor"), v12 = structure(c(2L,
2L, NA, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"),
v13 = structure(c(2L, 2L, NA, 2L, 2L, 2L), .Label = c("1",
"2"), class = "factor"), v14 = structure(c(2L, 1L, 1L, 2L,
2L, 2L), .Label = c("1", "2"), class = "factor"), v15 = structure(c(2L,
2L, 2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor")), row.names = c(NA,
6L), class = "data.frame")
1条答案
按热度按时间szqfcxe21#
如果你不一定需要为此创建一个函数,你可以使用以下命令获得想要的结果:
字符串
更新:
正如@Onyambu评论的那样,
summarise_at(vars(...))
已经被弃用,而支持summarise(across(...))
。型