按字符列名过滤数据框(dplyr格式)

bwleehnv 于 2023-01-15 发布在其他

关注(0)|答案(8)|浏览(149)

我有一个数据框，想用两种方式之一过滤它，列“this”或列“that”。我希望能够引用列名作为变量。如何（在dplyr中，如果有区别的话）通过变量引用列名？

library(dplyr)
df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
df
#   this that
# 1    1    1
# 2    2    1
# 3    2    2
df %>% filter(this == 1)
#   this that
# 1    1    1

但是假设我想使用变量column来保存“this”或“that”，并过滤column的值，as.symbol和get在其他上下文中都可以工作，但在下面的上下文中不行：

column <- "this"
df %>% filter(as.symbol(column) == 1)
# [1] this that
# <0 rows> (or 0-length row.names)
df %>% filter(get(column) == 1)
# Error in get("this") : object 'this' not found

如何将column的值转换为列名？

来源：https://stackoverflow.com/questions/27197617/filter-data-frame-by-character-column-name-in-dplyr

8条答案

按热度按时间

bybem2ql1#

使用rlang的injection范例

从current dplyr documentation（我着重强调）：
dplyr曾经提供两个版本的后缀为下划线的动词，这两个版本有标准求值（SE）语义：它们不像NSE动词那样通过代码获取参数，而是通过值获取参数。它们的目的是使dplyr编程成为可能。然而，dplyr现在使用了整洁的求值语义。NSE动词仍然捕获它们的参数，但是**你现在可以取消引用这些参数的一部分。这为NSE动词提供了完全的可编程性。**因此，下划线的版本现在是多余的。
因此，本质上我们需要执行两个步骤才能在dplyr::filter()中引用变量column的值"this"：
1.我们需要将字符类型的变量column转换为symbol类型。
使用基数R，这可以通过as.name()的别名as.symbol()函数来实现，前者是tidyverse开发人员的首选，因为它
采用更现代的术语（R型而不是S模式）。
或者，通过tidyverse的rlang::sym()也可以实现相同的效果。
1.我们需要将1）中的符号注入dplyr::filter()表达式。
这是通过所谓的injection operator**!!**来完成的，它基本上是syntactic sugar，允许在R计算代码之前修改它。
(In dplyr的早期版本（或相应的底层rlang）曾经存在!!与单个!冲突的情况（包括您的情况），但这不再是问题，因为!!获得了正确的操作符优先级。）
应用于您的示例：

library(dplyr)
df <- data.frame(this = c(1, 2, 2),
                 that = c(1, 1, 2))
column <- "this"

df %>% filter(!!as.symbol(column) == 1)
#   this that
# 1    1    1

使用替代解决方案

在dplyr::filter() * 中引用变量column的值"this"的其他方法不依赖于rlang的注入范例 *，包括：

通过 * 整理选择 * 范例，即dplyr::if_any() / dplyr::if_all()与tidyselect::all_of()

df %>% filter(if_any(.cols = all_of(column),
                     .fns = ~ .x == 1))

通过rlang's .data pronoun和base R's [[：

df %>% filter(.data[[column]] == 1)

通过magrittr的.参数占位符和base R's [[：

df %>% filter(.[[column]] == 1)

赞(0）回复(0）举报 2023-01-15

tzcvj98z2#

我会避免同时使用get()。在这种情况下似乎会相当危险，特别是在编程时。您可以使用未求值的调用或粘贴的字符串，但您需要使用filter_()而不是filter()。

df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
column <- "this"

选项1-使用未评估的调用：

您可以将y硬编码为1，但这里我将其显示为y，以说明如何轻松地更改表达式值。

expr <- lazyeval::interp(quote(x == y), x = as.name(column), y = 1)
## or 
## expr <- substitute(x == y, list(x = as.name(column), y = 1))
df %>% filter_(expr)
#   this that
# 1    1    1

选项2-使用paste()（显然更容易）：

df %>% filter_(paste(column, "==", 1))
#   this that
# 1    1    1

这两个选项的主要问题是我们需要使用filter_()而不是filter()，事实上，从我所读到的，如果你用dplyr编程，你应该总是使用*_()函数。
我把这篇文章作为一个有用的参考：character string as function argument r，我使用的是dplyr版本0.3.0.2。

赞(0）回复(0）举报 2023-01-15

rjzwgtxy3#

下面是dplyr最新版本的另一个解决方案：

df <- data.frame(this = c(1, 2, 2),
                 that = c(1, 1, 2))
column <- "this"

df %>% filter(.[[column]] == 1)

#  this that
#1    1    1

赞(0）回复(0）举报 2023-01-15

2mbi3lxu4#

关于Richard的解决方案，我只想补充一点，如果你的列是字符型的，你可以添加shQuote来过滤字符值。
例如，您可以使用

df %>% filter_(paste(column, "==", shQuote("a")))

如果有多个过滤器，则可以在paste中指定collapse = "&"。

df %>$ filter_(paste(c("column1","column2"), "==", shQuote(c("a","b")), collapse = "&"))

赞(0）回复(0）举报 2023-01-15

ttp71kqs5#

最新的方法是使用my.data.frame %>% filter(.data[[myName]] == 1)，其中myName是包含列名的环境变量。

赞(0）回复(0）举报 2023-01-15

u4vypkhs6#

或者使用filter_at

library(dplyr)
df %>% 
   filter_at(vars(column), any_vars(. == 1))

赞(0）回复(0）举报 2023-01-15

cclgggtu7#

像萨利姆B上面解释，但有一个小的变化：

df %>% filter(1 == !!as.name(column))

即，只是颠倒条件，因为!!否则表现为

!!(as.name(column)==1)

赞(0）回复(0）举报 2023-01-15

ej83mcc08#

可以使用across（all_of（））语法，它将字符串作为参数

column = "this"
df %>% filter(across(all_of(column)) == 1)

赞(0）回复(0）举报 2023-01-15

我来回答

按字符列名过滤数据框(dplyr格式)

8条答案

使用rlang的injection范例

使用替代解决方案

相关问题

热门标签

最新问答