How do I group by two variables in R and arrange them in the right order using ggplot?

Posted by mtb9vblg on 2023-02-14 in Other


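I'm having trouble grouping by two variables and getting the correct descending order.
I'm using a modified iris data frame; here is my script: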

library(dplyr)
library(ggplot2)

# Add a random petal count (1 to 10) to each of the 150 iris rows
iris_new <- iris %>% 
  mutate(number_petals = sample(1:10, size = 150, replace = TRUE))

iris_new %>% 
  group_by(number_petals, Species) %>%
  summarise(n=sum(Petal.Length, na.rm=TRUE)) %>%
  arrange(desc(n), by_group = TRUE) %>%
  head(25) %>%
  ggplot(aes(x=reorder(number_petals,n),y=n,fill=factor(Species))) +
  xlab("Number of Petals")+
  ylab("Total sum of petal lengths") +
  geom_col() #+ coord_flip()

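There are two problems:
1. Since I added the second group_by argument (Species), the bars are no longer sorted in descending order.
2. head(25) does not take the 25 largest sums of petal length for each number_petals and each species; instead it takes the 25 largest petal lengths, regardless of number_petals and species.

I read that summarise() drops the second group_by variable, but I'm not sure what to do with that information.
Any help is much appreciated!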

r

Source: https://stackoverflow.com/questions/72950606/how-do-i-group-by-two-variables-in-r-and-arrange-them-in-the-right-order-using-g

1 Answer


7vhp5slm1#

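Here is one way, using a factor-based approach.
We create two new columns, n and n1, where n1 is the sum of the n values within each number_petals. arrange() sorts the data by n1, and the factor levels are then created in order of appearance, so the x axis follows the sorted order.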

library(dplyr)
library(ggplot2)

iris_new %>% 
  group_by(Species, number_petals) %>%
  summarise(n=sum(Petal.Length, na.rm=TRUE), .groups = "drop") %>%
  group_by(number_petals) %>%
  mutate(n1 = sum(n)) %>%
  arrange(desc(n1)) %>%
  ungroup() %>%
  mutate(number_petals = factor(number_petals, unique(number_petals))) %>%
  ggplot(aes(x=number_petals,y=n,fill=Species)) +
  xlab("Number of Petals")+
  ylab("Total sum of petal lengths") +
  geom_col()
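
To see why fixing the factor levels controls the bar order, here is a minimal sketch on made-up data. The data frame toy and its columns petals and n are hypothetical and not part of the question. It also shows, as a possible alternative, base R's reorder() with FUN = sum; reorder() uses FUN = mean by default, which can give a different order from the stacked bar heights.

```r
library(dplyr)

# Hypothetical toy data (not from the question): row values for three groups
toy <- tibble(
  petals = c(1, 1, 2, 2, 3),
  n      = c(5, 5, 1, 2, 8)
)

# Same idea as the answer: group totals, sort, then levels by order of appearance
ordered <- toy %>%
  group_by(petals) %>%
  mutate(total = sum(n)) %>%
  ungroup() %>%
  arrange(desc(total)) %>%
  mutate(petals = factor(petals, levels = unique(petals)))

levels_by_appearance <- levels(ordered$petals)

# Alternative sketch: let reorder() compute the per-group score itself
levels_by_sum  <- levels(reorder(toy$petals, -toy$n, FUN = sum))
levels_by_mean <- levels(reorder(toy$petals, -toy$n))  # default FUN = mean

print(levels_by_appearance)
print(levels_by_sum)
print(levels_by_mean)
```

In this toy case the group totals are 10, 3 and 8, so both the factor approach and reorder(..., FUN = sum) put group 1 first, while the default mean-based reorder() puts group 3 first.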

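head(25) selects the first 25 rows regardless of grouping. To select the top rows per group, take a look at ?slice_max or ?slice.
To keep only the top n groups (here the 5 number_petals values with the largest totals), here is a different approach using a join.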

iris_new %>% 
  group_by(number_petals) %>%
  summarise(n=sum(Petal.Length, na.rm=TRUE), .groups = "drop") %>%
  slice_max(n, n = 5) %>%
  inner_join(
    iris_new %>% 
      group_by(Species, number_petals) %>%
      summarise(n1=sum(Petal.Length, na.rm=TRUE), .groups = "drop"), 
    by = 'number_petals'
  ) %>%
  arrange(desc(n)) %>%
  mutate(number_petals = factor(number_petals, unique(number_petals))) %>%
  ggplot(aes(x=number_petals,y=n1,fill=Species)) +
  xlab("Number of Petals")+
  ylab("Total sum of petal lengths") +
  geom_col()
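
For the per-group case mentioned with ?slice_max, a minimal sketch might look like the following. The data frame toy and its columns grp and val are made up for illustration. On a grouped data frame, slice_max() picks the top rows within each group, unlike head(), which ignores grouping.

```r
library(dplyr)

# Hypothetical toy data (not from the question)
toy <- tibble(
  grp = c("a", "a", "a", "b", "b", "b"),
  val = c(3, 9, 5, 7, 1, 8)
)

# Keep the 2 largest val rows within each grp
top2 <- toy %>%
  group_by(grp) %>%
  slice_max(val, n = 2) %>%
  ungroup()

# head() for comparison: simply the first 2 rows overall
first2 <- toy %>%
  group_by(grp) %>%
  head(2)

print(top2)
print(first2)
```

Note that slice_max() keeps ties by default (with_ties = TRUE), so it can return more than n rows per group when values are equal.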
