Partial sums after grouping in R with dplyr

3vpjnl9f  posted on 2023-04-18 in Other

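I am trying to compute, within a single group_by, both a partial total (based on two variables) and the overall total (based on one variable). Is it possible with dplyr to retrieve all of these values in the same view?

Input data: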

view(df %>% 
     group_by(order, type) %>% 
     summarize(total_by_order_type = n(),
               total_by_order = n())
    )
|order|type|total_by_order_type|total_by_order|
|1    |A   | 5                 | 5            |
|1    |B   | 7                 | 7            |
|2    |A   | 2                 | 2            |
|3    |A   | 10                | 10           |
|3    |B   | 6                 | 6            |

Desired output:

What I need is for the column "total_by_order" to hold the total per order, i.e. the sum of "total_by_order_type" within each order.

|order|type|total_by_order_type|total_by_order|
|1    |A   | 5                 | 12           |
|1    |B   | 7                 | 12           |
|2    |A   | 2                 | 2            |
|3    |A   | 10                | 16           |
|3    |B   | 6                 | 16           |

djmepvbi1#

Thanks, everyone. The code below reproduces what I was trying to achieve:

library(dplyr)

data <- data.frame(
  "order" = c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3),
  "type" = c("A","A","A","A","A","B","B","B","B","B","B","B","A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B")
)

total_by_order_type <- data %>%
  group_by(order, type) %>%
  summarise(total_by_order_type  = n()) 

total_by_order <- data %>%
  group_by(order) %>%
  summarise(total_by_order  = n()) 

view(
  total_by_order_type %>%
    left_join(total_by_order, by = "order") %>% 
    mutate(percentage_per_order = paste(round(total_by_order_type/total_by_order*100,digits=2),"%",sep=""))
)

Output:

|order|type|total_by_order_type|total_by_order|percentage_per_order|
|1    |A   | 5                 | 12           |41.67%              |
|1    |B   | 7                 | 12           |58.33%              |
|2    |A   | 2                 | 2            |100%                |
|3    |A   | 10                | 16           |62.5%               | 
|3    |B   | 6                 | 16           |37.5%               |
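The same table can probably be produced without a separate join. The following is a minimal sketch, assuming dplyr is loaded and using the same 30-row data as above (written with rep() for brevity): count() yields one row per order/type pair, and a grouped mutate() then sums those counts within each order.

```r
library(dplyr)

# Same data as in the answer above, built with rep() for brevity
data <- data.frame(
  order = c(rep(1, 12), 2, 2, rep(3, 16)),
  type  = c(rep("A", 5), rep("B", 7), "A", "A", rep("A", 10), rep("B", 6))
)

result <- data %>%
  # one row per (order, type) with the number of observations
  count(order, type, name = "total_by_order_type") %>%
  # regroup by order only, so sum() runs over the types within each order
  group_by(order) %>%
  mutate(
    total_by_order = sum(total_by_order_type),
    percentage_per_order = paste0(
      round(total_by_order_type / total_by_order * 100, digits = 2), "%"
    )
  ) %>%
  ungroup()

print(result)
```

Using mutate() rather than summarise() in the second step keeps one row per order/type pair while repeating the per-order total on each of them.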

yvfmudvl2#

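A simple approach is to compute the observation counts separately and then join them. I am assuming you want the number of observations per order and type rather than a sum of values, since you use n() in summarise.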

library(dplyr)

data <- data.frame(
  "order" = c(1,1,2,3,3),
  "type" = c("A", "B", "A", "A", "B"),
  "value" = c(7,2,8,3,5)
)

order_type <- data %>%
  group_by(order, type) %>%
  summarise(total_by_order_type  = n()) 

order <- data %>%
  group_by(order) %>%
  summarise(total_by_order  = n()) 

order_type %>%
  left_join(order, by = "order")

If you need the sum of a column rather than row counts, replace n() with sum():

order_type <- data %>%
  group_by(order, type) %>%
  summarise(total_by_order_type  = sum(value) )

order <- data %>%
  group_by(order) %>%
  summarise(total_by_order  =  sum(value)) 

order_type %>%
  left_join(order, by = "order")
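As a possible alternative to the two-table join, the sum variant can be sketched in one pipeline. This assumes dplyr 1.0.0 or later, where summarise() accepts a `.groups` argument: with `.groups = "drop_last"` the result stays grouped by order, so a following mutate() can add up the partial sums per order.

```r
library(dplyr)

data <- data.frame(
  order = c(1, 1, 2, 3, 3),
  type  = c("A", "B", "A", "A", "B"),
  value = c(7, 2, 8, 3, 5)
)

result <- data %>%
  group_by(order, type) %>%
  # drop only the last grouping variable (type); the result stays grouped by order
  summarise(total_by_order_type = sum(value), .groups = "drop_last") %>%
  # sum the partial totals within each order
  mutate(total_by_order = sum(total_by_order_type)) %>%
  ungroup()

print(result)
```

Stating `.groups` explicitly also avoids the informational message that summarise() otherwise prints about the remaining grouping.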

roejwanj3#

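You need to use group_by carefully. First group by both the "order" and "type" columns and count, then do another group_by on the "order" column alone and count again. Because mutate() keeps every input row, each order/type pair appears once per original observation.

df %>% group_by(order, type) %>% 
mutate(total_by_order_type = n()) %>% group_by(order) %>% 
mutate(total_by_order = n()) %>% ungroup()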
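A variant of this mutate-based idea can be sketched with add_count(), which appends a group count as a new column without collapsing rows. The small `df` below is a hypothetical example, not the asker's data, and the final distinct() is an assumption about wanting one row per order/type pair.

```r
library(dplyr)

# Hypothetical toy data: one row per observation
df <- data.frame(
  order = c(1, 1, 1, 2, 3, 3),
  type  = c("A", "A", "B", "A", "A", "B")
)

result <- df %>%
  # number of rows for each (order, type) pair, added as a column
  add_count(order, type, name = "total_by_order_type") %>%
  # number of rows for each order, added as a column
  add_count(order, name = "total_by_order") %>%
  # keep a single row per (order, type) pair
  distinct(order, type, .keep_all = TRUE)

print(result)
```

Leaving out the distinct() step keeps every original observation, with both counts repeated on each row.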
