Dynamically referencing columns inside mutate() with dplyr

l3zydbqr  posted on 2023-07-31 in Other

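I am trying to use summarise() to create columns based on dynamic column names. I found that I can easily create dynamic names with the glue syntax "{}" :=, but I cannot figure out how to refer to those columns in a subsequent mutate() call.
From the tips I have read online, the common solutions are to use {{varname}}, or to use varname_enq <- enquo(varname) together with !!varname_enq. Unfortunately, neither approach works for me.
So far I have looked at other SO posts as well as the Programming with dplyr guide. I would greatly appreciate any advice you can give!
Below is a small example that highlights the problem.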

This is the goal:

# A tibble: 3 × 3
  Species    species_sum cumulative_sum
  <fct>            <dbl>          <dbl>
1 setosa            250.           250.
2 versicolor        297.           547.
3 virginica         329.           876.
mycol <- "species_sum"
mycol_enquo <- enquo(mycol)
myothercol <- "cumulative_sum"

# this works, but the cumulative sum isn't dynamic
iris %>% 
  group_by(Species) %>% 
  summarise("{mycol}" := sum(Sepal.Length)) %>% 
  ungroup() %>%
  mutate(cumulative_sum = cumsum(species_sum))

# this works, but the cumsum function still uses a fixed variable name
iris %>% 
  group_by(Species) %>% 
  summarise("{mycol}" := sum(Sepal.Length)) %>% 
  ungroup() %>%
  mutate("{myothercol}" := cumsum(species_sum))

# doesn't work, the new column is all NA
iris %>% 
  group_by(Species) %>% 
  summarise("{mycol}" := sum(Sepal.Length)) %>% 
  ungroup() %>%
  mutate("{myothercol}" := cumsum( "{mycol}" ))
  
# doesn't work, the new column is all NA
iris %>% 
  group_by(Species) %>% 
  summarise("{mycol}" := sum(Sepal.Length)) %>% 
  ungroup() %>%
  mutate("{myothercol}" := cumsum( {{mycol}} ))

# doesn't work, the new column is all NA
iris %>% 
  group_by(Species) %>% 
  summarise("{mycol}" := sum(Sepal.Length)) %>% 
  ungroup() %>%
  mutate("{myothercol}" := cumsum( !!mycol_enquo ))

okxuctiv1#

library(tidyverse)

myFunction <- function(df, col1, col2) {
  df %>% 
    group_by(Species) %>% 
    summarise({{ col1 }} := sum(Sepal.Length)) %>% 
    ungroup() %>%
    mutate({{ col2 }} := cumsum({{ col1 }}))
}

iris %>% myFunction(species_sum, cumulative_sum)
# A tibble: 3 × 3
  Species    species_sum cumulative_sum
  <fct>            <dbl>          <dbl>
1 setosa            250.           250.
2 versicolor        297.           547.
3 virginica         329.           876.



kokeuurv2#

Does this work for you?
To specify variables with a character vector, .data is your friend.

iris %>% 
  group_by(Species) %>% 
  summarise(!!mycol := sum(Sepal.Length)) %>% 
  ungroup() %>%
  mutate(!!myothercol := cumsum(.data[[mycol]]))
# A tibble: 3 x 3
  Species    species_sum cumulative_sum
  <fct>            <dbl>          <dbl>
1 setosa            250.           250.
2 versicolor        297.           547.
3 virginica         329.           876.
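
The glue-style names from the question can also be combined with .data[[ ]]. Below is a minimal sketch, assuming you want to wrap the pipeline in a function that takes every column name as a string. The helper name sum_and_cumsum() and its argument names are hypothetical and only for illustration.

```r
library(dplyr)

# Hypothetical helper (not part of the original answer):
# every column name is passed as a character string.
sum_and_cumsum <- function(df, value_col, sum_col, cumsum_col) {
  df %>%
    group_by(Species) %>%
    # "{sum_col}" := builds the new column name from the string
    summarise("{sum_col}" := sum(.data[[value_col]])) %>%
    ungroup() %>%
    # .data[[sum_col]] looks the column up by its string name
    mutate("{cumsum_col}" := cumsum(.data[[sum_col]]))
}

result <- sum_and_cumsum(iris, "Sepal.Length", "species_sum", "cumulative_sum")
result
```

The left-hand side only needs a name, so a glue string is enough there. The right-hand side needs the column's values, which is why the lookup goes through .data.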

Alternatively, you can use across():

iris %>% 
  group_by(Species) %>% 
  summarise(across(Sepal.Length, sum, .names = mycol)) %>%
  ungroup() %>%
  mutate(across(all_of(mycol), cumsum, .names = myothercol))
# A tibble: 3 x 3
  Species    species_sum cumulative_sum
  <fct>            <dbl>          <dbl>
1 setosa            250.           250.
2 versicolor        297.           547.
3 virginica         329.           876.


68bkxrlz3#

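Another option is to use sym() to convert the strings to symbols, which we then unquote with !!. Note that, in contrast to {{, we have to pass the column names as quoted strings.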

library(dplyr)
library(rlang)

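# Column names are passed as strings; .by needs dplyr >= 1.1.0
my_function <- function(df, mycol, myothercol) {
  df %>% 
    summarise(!!mycol := sum(Sepal.Length), .by = Species) %>% 
    # sym() turns the string into a symbol, !! injects it into the call
    mutate(!!myothercol := cumsum(!!sym(mycol)))
}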

iris %>% 
  my_function("species_sum", "cumulative_sum")
     Species species_sum cumulative_sum
1     setosa       250.3          250.3
2 versicolor       296.8          547.1
3  virginica       329.4          876.5
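
The same idea should also work without a wrapper function. Here is a minimal sketch, assuming mycol and myothercol are defined at the top level as in the question, and keeping the question's "{}" := syntax for the new column names:

```r
library(dplyr)
library(rlang)

mycol <- "species_sum"
myothercol <- "cumulative_sum"

result <- iris %>%
  group_by(Species) %>%
  summarise("{mycol}" := sum(Sepal.Length)) %>%
  ungroup() %>%
  # sym(mycol) is the symbol species_sum; !! injects it so that
  # cumsum() sees the column rather than the string "species_sum"
  mutate("{myothercol}" := cumsum(!!sym(mycol)))

result
```

This differs from the failing attempts in the question only in the argument to cumsum(): a bare string is summed as text, which gives NA, while the injected symbol is evaluated as a column.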
