dplyr: summarise by two conditions with filtering

Posted by rmbxnbpk on 2023-04-18 in Other


My data are not unlike foo below. The observations are ordered in time and measured for different groups.

library(tidyverse)
n <- 500
r <- 0.001
a <- 5
foo <- data.frame(Time = 1:n) %>%
  mutate(A = a*(1+r)^Time + rnorm(n), B = rnorm(n,mean=a,sd=2)) %>%
  pivot_longer(cols = -Time, names_to = "Group", values_to = "Measurement")

foo %>% ggplot(mapping = aes(x=Time,y=Measurement,color=Group)) +
  geom_line()

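What I want to know is how many measurements within a given time period (say Time > 250 in this example) are above or below given quantile values for each group, where the quantiles are computed over the entire dataset. For example, the 10th, 50th, and 90th quantiles computed from the whole dataset: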

fooQuants <- foo %>% group_by(Group) %>%
  reframe(Measurement=quantile(Measurement, probs = c(0.1,0.5,0.9))) %>%
  add_column(Quant = rep(paste0("Q",c(0.1,0.5,0.9)),2))

# eg for group A with Time > 250 this is the number of times that Measurement is above 90th quant
A90th <- fooQuants %>% filter(Group == "A", Quant == "Q0.9") %>% pull(Measurement)
foo %>% filter(Time > 250, Group == "A") %>% summarise(n=sum(Measurement >= A90th))
# eg for group B with Time > 250 this is the number of times that Measurement is above 90th quant
B90th <- fooQuants %>% filter(Group == "B", Quant == "Q0.9") %>% pull(Measurement)
foo %>% filter(Time > 250, Group == "B") %>% summarise(n=sum(Measurement >= B90th))

I'd like to do this for all groups in foo and for all the quantile levels in fooQuants.

#foo %>% filter(Time > 250) %>% select(Group,Measurement) %>%
#  group_by(Group) %>%
#  summarise(`the number of instances where the measurement is gt or lt each quantile in fooQuants`)

Any tips appreciated!


Source: https://stackoverflow.com/questions/76017806/dplyr-summarise-by-two-conditions-with-filtering

1 Answer


gojuced71

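library(dplyr) # v1.1.0+ for "join_by"
# Rolling join: match each row to the largest quantile of its group that is <= Measurement
foo %>%
  left_join(fooQuants, join_by(Group, closest(Measurement >= Measurement))) %>%
  mutate(Quant = coalesce(Quant, "Q0")) %>%  # no match: below the lowest (10th) quantile
  filter(Time > 250) %>%
  count(Group, Quant) %>%
  mutate(share = n / sum(n), .by = Group)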

Result

# A tibble: 8 × 4
  Group Quant     n share
  <chr> <chr> <int> <dbl>
1 A     Q0        4 0.016
2 A     Q0.1     65 0.26 
3 A     Q0.5    131 0.524
4 A     Q0.9     50 0.2  
5 B     Q0       28 0.112
6 B     Q0.1     92 0.368
7 B     Q0.5    105 0.42 
8 B     Q0.9     25 0.1
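
Note that the counts above are per bin: each measurement is assigned to the highest quantile it reaches, so Q0.1 means "at or above the 10th but below the 50th". If what is wanted is instead the number of measurements at or above (or below) each quantile, as in the question's A90th example, one possible sketch is to pair every observation in the window with each of its group's quantiles and count directly. The column names Threshold, n_above and n_below, and the set.seed() call, are assumptions introduced here for illustration and are not part of the original posts.

```r
library(dplyr)   # v1.1.0+ for reframe() and the relationship argument
library(tidyr)

set.seed(1)  # assumption: added only so this sketch is reproducible
n <- 500
r <- 0.001
a <- 5
foo <- data.frame(Time = 1:n) %>%
  mutate(A = a * (1 + r)^Time + rnorm(n), B = rnorm(n, mean = a, sd = 2)) %>%
  pivot_longer(cols = -Time, names_to = "Group", values_to = "Measurement")

probs <- c(0.1, 0.5, 0.9)

# Quantiles per group over the whole dataset, labelled in the same step
fooQuants <- foo %>%
  group_by(Group) %>%
  reframe(Quant = paste0("Q", probs),
          Threshold = unname(quantile(Measurement, probs = probs)))

# Pair each observation in the window with every quantile of its group,
# then count how many fall at/above and below each threshold
counts <- foo %>%
  filter(Time > 250) %>%
  inner_join(fooQuants, by = "Group", relationship = "many-to-many") %>%
  group_by(Group, Quant) %>%
  summarise(n_above = sum(Measurement >= Threshold),
            n_below = sum(Measurement < Threshold),
            .groups = "drop")

counts
```

With this layout, n_above for Q0.9 is the same quantity as the question's sum(Measurement >= A90th), and n_above for the lower quantiles equals the cumulative sum of the per-bin counts from that quantile upwards.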

Answered 2023-04-18
