Conditional filtering with dplyr

pieyvz9o  posted on 2023-02-26  in  Other

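Suppose I have datasets AB (25 x 2) and ABC (30 x 3). How can I filter the data in ABC based on variables a and b of dataset AB at the same time? The result should be ABC_filtered. The variable n in ABC_filtered gives the total number of examples in ABC that satisfy the filtering criteria for a (must be in AB$a) and b (must be among the AB$b values that belong to that AB$a value). For example, an example is counted only if it satisfies both conditions in one of the following lines: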

a=A AND b=c(1,2,3,4,5) # NOT PART OF CODE
a=B AND b=c(6,7,8,9,10) # NOT PART OF CODE
a=C AND b=c(11,12,13,14,15) # NOT PART OF CODE
a=D AND b=c(16,17,18,19,20) # NOT PART OF CODE
a=E AND b=c(21,22,23,24,25) # NOT PART OF CODE

AB <- data.frame(a=rep(c("A","B","C","D","E"), each=5), b=seq(1:25))
ABC <- structure(list(s = c("s1", "s1", "s1", "s1", "s1", "s1", 
                            "s1", "s1", "s1", "s1", "s2", "s2", 
                            "s2", "s2", "s2", "s2", "s2", "s2", 
                            "s2", "s2", "s3", "s3", "s3", "s3", 
                            "s3", "s3", "s3", "s3", "s3", "s3"), 
                      a = c("D", "H", "H", "F", "F", "H", "C", 
                            "C", "F", "E", "G", "G", "C", "G", 
                            "A", "C", "F", "H", "G", "B", "C", 
                            "G", "C", "F", "A", "G", "E", "G", 
                            "B", "D"), 
                      b = c(3L, 24L, 8L, 23L, 9L, 17L, 14L, 2L, 
                            1L, 2L, 1L, 23L, 19L, 25L, 15L, 19L, 
                            5L, 21L, 13L, 6L, 18L, 23L, 7L, 23L, 
                            17L, 23L, 14L, 15L, 6L, 18L)), 
                 class = "data.frame", row.names = c(NA, -30L))
ABC_filtered <- data.frame(s=c("s1","s2","s3","s3"), 
                           a=c("C","B","B","D"), n=c(1,1,1,1))

My attempt at creating the filter did not work as expected.

library(magrittr)
ABC %>% 
  group_by(s, a, b) %>% 
  dplyr::filter(all(
    a %in% AB$a, b %in% AB$b[AB$a])) %>% 
  count() # DID NOT WORK

Could someone help me code this correctly? I would appreciate it.

0h4hbjxa #1

You can use semi_join() followed by count(), both from dplyr. semi_join() acts as a filtering join, using AB to "filter" ABC; count() then performs the aggregation.

AB <- data.frame(a=rep(c("A","B","C","D","E"), each=5), b=seq(1:25))
ABC <- structure(list(s = c("s1", "s1", "s1", "s1", "s1", "s1", 
                            "s1", "s1", "s1", "s1", "s2", "s2", 
                            "s2", "s2", "s2", "s2", "s2", "s2", 
                            "s2", "s2", "s3", "s3", "s3", "s3", 
                            "s3", "s3", "s3", "s3", "s3", "s3"), 
                      a = c("D", "H", "H", "F", "F", "H", "C", 
                            "C", "F", "E", "G", "G", "C", "G", 
                            "A", "C", "F", "H", "G", "B", "C", 
                            "G", "C", "F", "A", "G", "E", "G", 
                            "B", "D"), 
                      b = c(3L, 24L, 8L, 23L, 9L, 17L, 14L, 2L, 
                            1L, 2L, 1L, 23L, 19L, 25L, 15L, 19L, 
                            5L, 21L, 13L, 6L, 18L, 23L, 7L, 23L, 
                            17L, 23L, 14L, 15L, 6L, 18L)), 
                 class = "data.frame", row.names = c(NA, -30L))
ABC_filtered <- data.frame(s=c("s1","s2","s3","s3"), 
                           a=c("C","B","B","D"), n=c(1,1,1,1))

library(dplyr)
semi_join(x = ABC, y = AB, by = c("a", "b")) %>%
  count(s, a)
#>    s a n
#> 1 s1 C 1
#> 2 s2 B 1
#> 3 s3 B 1
#> 4 s3 D 1

Created on 2023-02-22 with reprex v2.0.2
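As for why the attempt in the question matched nothing: a likely cause is the expression AB$b[AB$a]. The sketch below assumes a is stored as character (the default since R 4.0, and forced here with stringsAsFactors = FALSE). Indexing an unnamed vector by character values returns only NA, so b %in% AB$b[AB$a] is always FALSE. The small sample_rows data frame is hypothetical and only serves to show a row-wise check on (a, b) pairs.

```r
AB <- data.frame(a = rep(c("A", "B", "C", "D", "E"), each = 5),
                 b = 1:25, stringsAsFactors = FALSE)

# AB$b has no names, so looking elements up by the strings in AB$a
# finds nothing and yields NA for every position.
lookup <- AB$b[AB$a]
all_missing <- all(is.na(lookup))

# Hypothetical sample rows, used only for illustration.
sample_rows <- data.frame(a = c("C", "C", "B"), b = c(14L, 2L, 6L),
                          stringsAsFactors = FALSE)

# With an all-NA lookup, no b value can ever match.
matched_via_lookup <- sample_rows$b %in% lookup

# Comparing (a, b) pairs as combined keys does what was intended.
keep <- paste(sample_rows$a, sample_rows$b) %in% paste(AB$a, AB$b)
```

Here keep is TRUE for C/14 and B/6 but FALSE for C/2, because 2 belongs to A in AB. semi_join() performs the same pair-wise check without having to build the keys by hand.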

jv4diomz #2

I tried this:

ABC_answer <- merge(AB, ABC, all.x = TRUE) %>% tidyr::drop_na()

and got this result (image): https://i.stack.imgur.com/E0NzG.png
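A left merge followed by dropping the NA rows should keep the same rows as a plain inner merge, so the step above can be sketched more directly as below, with a count added to reach the ABC_filtered shape from the question. This is a minimal sketch, not the answerer's code: the names matched and counts are made up for illustration, and ABC is rebuilt in compact form with the same values as in the question.

```r
library(dplyr)

AB <- data.frame(a = rep(c("A", "B", "C", "D", "E"), each = 5),
                 b = 1:25, stringsAsFactors = FALSE)
ABC <- data.frame(
  s = rep(c("s1", "s2", "s3"), each = 10),
  a = c("D", "H", "H", "F", "F", "H", "C", "C", "F", "E",
        "G", "G", "C", "G", "A", "C", "F", "H", "G", "B",
        "C", "G", "C", "F", "A", "G", "E", "G", "B", "D"),
  b = c(3L, 24L, 8L, 23L, 9L, 17L, 14L, 2L, 1L, 2L,
        1L, 23L, 19L, 25L, 15L, 19L, 5L, 21L, 13L, 6L,
        18L, 23L, 7L, 23L, 17L, 23L, 14L, 15L, 6L, 18L),
  stringsAsFactors = FALSE
)

# Inner merge: keeps only rows whose (a, b) pair occurs in both data frames.
matched <- merge(AB, ABC, by = c("a", "b"))

# Count the matching examples per s and a.
counts <- count(matched, s, a)
```

Because every (a, b) pair occurs only once in AB, the inner merge and semi_join() keep the same ABC rows here. If AB contained duplicate pairs, the merge would repeat the matching ABC rows and inflate n, whereas semi_join() would not.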
