R语言 为什么相同的ggplot2代码会创建具有不同标签的Plot?

vktxenjb  于 2023-04-18  发布在  其他
关注(0)|答案(1)|浏览(172)

我是R编程的新手,刚刚在Rstudio上完成了我的第一个数据分析项目。这是我关于堆栈溢出的第一个问题,我不确定这些细节是否足够。如果有更有经验的人指导,我很乐意编辑。我是否应该更改 Dataframe 和/或变量的名称以使其更容易?
问题:我用下面的代码创建了一个图表,其中的%标签反映了两周前会员和休闲会员的情况。
条形图,显示了成员与非成员的%值:

令人惊讶的是,现在使用相同的代码创建的图中,%值是月份而不是成员/临时。
按月显示%值的条形图:

Divvy22_clean %>% 
  group_by(start_mth, member_casual) %>% 
  dplyr::summarise(count = n()) %>% 
  mutate(percent = count/sum(count)*100) %>% 
  ggplot(aes(start_mth, count, fill = member_casual)) + 
  geom_col(position = "stack") + 
  geom_text(aes(label = paste0(round(percent), "%")), 
            position = position_stack(vjust = 0.5), size = 3, color = "white") +
  scale_y_continuous(limits = c(0, 900000), expand = c(0, 0)) + 
  labs(x = "Months, 2022", y = "Total Rentals", 
       title = "Ride Distribution by Month and Membership Type")+
  facet_wrap(~member_casual)

如果你需要模拟这个问题,我希望下面的代码可以帮助你:

set.seed(123) 

start_mth <- rep(c("Jan", "Feb", "Mar"), each = 2)
member_casual <- rep(c("member", "casual"), times = 3)
count <- round(runif(6, 0, 900000))
Divvy_example <- data.frame(start_mth, member_casual, count)

# Calculate percentages
Divvy_example <- Divvy_example %>% 
  group_by(start_mth, member_casual) %>% 
  mutate(percent = count/sum(count) * 100)

# Plot 
ggplot(Divvy_example, aes(start_mth, count, fill = member_casual)) + 
  geom_col(position = "stack") + 
  geom_text(aes(label = paste0(round(percent), "%")), 
            position = position_stack(vjust = 0.5), size = 3, color = "white") + 
  scale_y_continuous(limits = c(0, 900000), expand = c(0, 0)) + 
  labs(x = "Months, 2022", y = "Total Rentals", 
       title = "Ride Distribution by Month and Membership Type") +
  facet_wrap(~member_casual)

另外,我在两天前将R更新到了最新版本。这是我所做的唯一更改。另外,另一个可能相关的细节是,当这段代码在RStudio中创建第一个带有成员/临时值的图时,它正在Kaggle上创建按月%值的图,就像它现在在更新的R Studio上所做的一样。
我正在努力学习更多关于R的知识。如果能知道是什么导致了这种不一致,以及如何获得会员/临时会员%值的图表,这将非常有帮助。感谢您的帮助!
我试图理解这种不一致背后的原因,因为这段代码在过去的一个月里已经工作了好几次。语法是否会受到R的更新版本的影响,如果发生这种情况,有什么方法可以避免它?或者我遇到这种不一致的原因是什么?

kmbjn2e3

kmbjn2e31#

这里有一些模拟数据显示了这个问题(问题是在summarise部分,它创建了count,所以在计数之前从一些新数据开始)。
在复制第二个错误计算的图表的顶部图表中,.groups = "drop"参数然后具有未分组的输出。第二个带有.groups = "drop_last"的图表删除member_casual分组,然后按start_mth计算百分比,我认为这是您打算做的?这些默认值可能在运行两个图表之间发生了变化,因此,如果您明确声明.groups = "drop_last",则这将保持一致。

library(tidyverse)

set.seed(123)

df <-
  tibble(
    start_mth = sample(
      month.abb,
      size = 1000,
      replace = TRUE,
      prob = c(1:6, 6:1)
    ),
    member_casual = sample(
      c("member", "casual"),
      1000,
      replace = TRUE
    )
  ) %>% 
  mutate(start_mth = factor(start_mth, levels = month.abb))

df %>% 
group_by(start_mth, member_casual) %>% 
  summarise(count = n(), .groups = "drop") %>% 
  mutate(percent = count/sum(count)*100) %>% 
  ggplot(aes(start_mth, count, fill = member_casual)) + 
  geom_col(position = "stack") + 
  geom_text(aes(label = paste0(round(percent), "%")), 
            position = position_stack(vjust = 0.5), size = 3, color = "white") +
  scale_y_continuous(expand = expansion(add = c(0, 10))) + 
  labs(x = "Months, 2022", y = "Total Rentals", 
       title = "Ride Distribution by Month and Membership Type")+
  facet_wrap(~member_casual)

df %>% 
  group_by(start_mth, member_casual) %>% 
  summarise(count = n(), .groups = "drop_last") %>% 
  mutate(percent = count/sum(count)*100) %>% 
  ggplot(aes(start_mth, count, fill = member_casual)) + 
  geom_col(position = "stack") + 
  geom_text(aes(label = paste0(round(percent), "%")), 
            position = position_stack(vjust = 0.5), size = 3, color = "white") +
  scale_y_continuous(expand = expansion(add = c(0, 10))) + 
  labs(x = "Months, 2022", y = "Total Rentals", 
       title = "Ride Distribution by Month and Membership Type") +
  facet_wrap(~member_casual)

额外提示-让ggplot帮你计算

借助一些额外的ggplot2函数,您可以删除group_bysummarisemutate行,并将未更改的 Dataframe 直接传递给ggplot

  • 第一个stat_count将按类别和返回条计数观测值
  • 第二个stat_count通过计算每个x值的比例来构造标签:sapply(x, FUN = function(month) ...)x轴上计算每个月的值

一般来说,最好的做法是将未经修改的、整洁的data.frame放入ggplot中,让它为您工作,但是如果您已经理解了可以工作的代码,那么这一点就不那么重要了!

df %>%
  ggplot(aes(start_mth, fill = member_casual)) +
  stat_count() +
  stat_count(aes(label = paste0(round(
                   after_stat(count) / 
                     sapply(x, FUN = function(month)
                       sum(count[x == month])
                   ) * 100
                 ), "%")),
             geom = "text",
             position = position_stack(vjust = 0.5),
             size = 3, color = "white") +
  scale_y_continuous(expand = expansion(add = c(0, 10))) + 
  labs(x = "Months, 2022", y = "Total Rentals", 
       title = "Ride Distribution by Month and Membership Type") +
  facet_wrap( ~ member_casual)

示例代码中的编辑-分组

要注意的关键是,当按月计算%s时,分组应该start_mth进行。在生成计数和百分比之间的group_by中说明这一点将在所有情况下解决此问题:

library(tidyverse)
set.seed(123) 

start_mth <- rep(c("Jan", "Feb", "Mar"), each = 2)
member_casual <- rep(c("member", "casual"), times = 3)
count <- round(runif(6, 0, 900000))
Divvy_example <- data.frame(start_mth, member_casual, count)

# Calculate percentages
Divvy_example <- Divvy_example %>% 
  group_by(start_mth) %>%  # Group **only** by start_mth
  mutate(percent = count/sum(count) * 100)

# Plot 
ggplot(Divvy_example, aes(start_mth, count, fill = member_casual)) + 
  geom_col(position = "stack") + 
  geom_text(aes(label = paste0(round(percent), "%")), 
            position = position_stack(vjust = 0.5), size = 3, color = "white") + 
  scale_y_continuous(limits = c(0, 900000), expand = c(0, 0)) + 
  labs(x = "Months, 2022", y = "Total Rentals", 
       title = "Ride Distribution by Month and Membership Type") +
  facet_wrap(~member_casual)

相关问题