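Suppose I have the data sets AB (25 x 2) and ABC (30 x 3). How can I filter the rows of ABC on variables a and b of AB simultaneously? The result should be ABC_filtered. The variable n in ABC_filtered gives the total number of examples in ABC that meet the filter conditions on a (which should be in AB$a) and b (which should be in AB$b within the subset of rows for that AB$a value). For example, an example is counted only when both parts of one of the following conditions hold at the same time: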
a=A AND b=c(1,2,3,4,5) # NOT PART OF CODE
a=B AND b=c(6,7,8,9,10) # NOT PART OF CODE
a=C AND b=c(11,12,13,14,15) # NOT PART OF CODE
a=D AND b=c(16,17,18,19,20) # NOT PART OF CODE
a=E AND b=c(21,22,23,24,25) # NOT PART OF CODE
AB <- data.frame(a=rep(c("A","B","C","D","E"), each=5), b=seq(1:25))
ABC <- structure(list(s = c("s1", "s1", "s1", "s1", "s1", "s1",
"s1", "s1", "s1", "s1", "s2", "s2",
"s2", "s2", "s2", "s2", "s2", "s2",
"s2", "s2", "s3", "s3", "s3", "s3",
"s3", "s3", "s3", "s3", "s3", "s3"),
a = c("D", "H", "H", "F", "F", "H", "C",
"C", "F", "E", "G", "G", "C", "G",
"A", "C", "F", "H", "G", "B", "C",
"G", "C", "F", "A", "G", "E", "G",
"B", "D"),
b = c(3L, 24L, 8L, 23L, 9L, 17L, 14L, 2L,
1L, 2L, 1L, 23L, 19L, 25L, 15L, 19L,
5L, 21L, 13L, 6L, 18L, 23L, 7L, 23L,
17L, 23L, 14L, 15L, 6L, 18L)),
class = "data.frame", row.names = c(NA, -30L))
ABC_filtered <- data.frame(s=c("s1","s2","s3","s3"),
a=c("C","B","B","D"), n=c(1,1,1,1))
My attempt at creating the filter did not work as expected.
library(magrittr)
ABC %>%
group_by(s, a, b) %>%
dplyr::filter(all(
a %in% AB$a, b %in% AB$b[AB$a])) %>%
count() # DID NOT WORK
Can someone help me code this correctly? I would appreciate it.
2 Answers
0h4hbjxa #1
You can use semi_join() followed by count(), both from dplyr. semi_join() acts as a filtering join, using AB to "filter" ABC, and count() performs the aggregation.
Created on 2023-02-22 with reprex v2.0.2
jv4diomz #2
I tried this
and got the following result:
![1]:https://i.stack.imgur.com/E0NzG.png