如何为R中的结果变量绘制散点图的2D基于分位数的密度?

holgip5t  于 2023-11-14  发布在  其他
关注(0)|答案(1)|浏览(125)

我意识到这个问题在这里已经以类似的方式被问过多次了。我不是要求一个散点图,其中包括数据的密度热图,因为这将 * 两个 * 变量的密度捕获为平滑函数。我正在寻找的是这样的东西,它将结果变量的分布的“切片”覆盖在散点图上:


的数据
我能想到的最好的办法是:

  1. #### Load Library ####
  2. library(tidyverse)
  3. #### Get IQR ####
  4. q <- quantile(iris$Sepal.Length,
  5. probs = c(.25,.5,.75))
  6. q
  7. #### Label Quantile Regions ####
  8. qiris <- iris %>%
  9. mutate(qs = ifelse(Sepal.Length >= q[3],
  10. "Q75",
  11. ifelse(Sepal.Length >= q[2],
  12. "Q50","Q25")))
  13. #### Plot Density and Scatter ####
  14. ggplot()+
  15. geom_point(aes(x=Sepal.Width,
  16. y=Sepal.Length),
  17. data=iris)+
  18. geom_density(aes(y=Sepal.Length,
  19. fill=qs),
  20. data=qiris)

字符串
但可以预见的是,这是失败的,因为它没有将分布的“切片”与预测值相关联。



然后我想出了一个稍微好一点的解决方案,可以正确定位值的分布:

  1. library(ggridges)
  2. ggplot(qiris,
  3. aes(x = Sepal.Length,
  4. y = qs)) +
  5. stat_density_ridges(quantiles = c(0.25,0.5,0.75),
  6. geom="density_ridges_gradient",
  7. jittered_points = TRUE,
  8. position = "raincloud",
  9. alpha = 0.6,
  10. scale = 0.6)+
  11. coord_flip()


这给了我这个:



然而,这里仍然有三个问题。首先,我不能通过它拟合回归线。第二,我希望数据点像普通散点图一样彼此相邻,而不是通过分位数在空间上分开,这样它们就太远了。第三,这根本不包括其他变量,这很重要。

编辑

艾伦的答案一开始看起来不错,但我认为他的代码中有一些我没有看到的东西。为了弄清楚这一点,我尝试使用另一个数据集,并将输入保存为R中的对象,以便更容易交换所有内容。当我这样做时,我在图上得到了平坦的线条。

  1. #### Load Library ####
  2. library(tidyverse)
  3. #### Save Objects ####
  4. dfy <- mtcars$mpg # y var
  5. dfx <- mtcars$hp # x var
  6. data <- mtcars # dataset
  7. #### QDATA ####
  8. qdata <- data %>%
  9. mutate(cut_group = cut(dfy,
  10. quantile(dfy, c(0.125, 0.375, 0.625, 0.875)),
  11. labels = c('Q25', 'Q50', 'Q75')),
  12. baseline = quantile(dfy,
  13. c(0.25, 0.5, 0.75))[as.numeric(cut_group)]) %>%
  14. filter(complete.cases(.)) %>%
  15. group_by(cut_group) %>%
  16. reframe(dfxx = density(dfx)$x,
  17. dfy = first(baseline) - density(dfx, bw = 0.5)$y/3) %>%
  18. rename(dfx = dfxx)
  19. ggplot(data,
  20. aes(dfy,
  21. dfx)) +
  22. geom_smooth(method = 'lm',
  23. color = 'gray',
  24. se = FALSE) +
  25. geom_point(color = 'navy',
  26. shape = 21,
  27. fill = NA) +
  28. geom_path(data = qdata,
  29. aes(group = cut_group),
  30. color = 'darkgreen',
  31. linewidth = 1.5) +
  32. theme_classic() +
  33. theme(panel.border = element_rect(fill = NA,
  34. linewidth = 1))


就像这样:

cuxqih21

cuxqih211#

我可能会通过预先计算分位数的密度并将它们绘制为geom_path来做到这一点:

  1. quartiles <- quantile(iris$Sepal.Width)
  2. midpoints <- quartiles[-5] + 0.5 * diff(quartiles)
  3. qiris <- iris %>%
  4. mutate(Q = cut(Sepal.Width, quartiles, labels = paste0('Q', 1:4)),
  5. baseline = midpoints[as.numeric(Q)]) %>%
  6. filter(complete.cases(.)) %>%
  7. group_by(Q) %>%
  8. reframe(SepalLength = density(Sepal.Length)$x,
  9. Sepal.Width = first(baseline) - density(Sepal.Length, bw = 0.5)$y/3) %>%
  10. rename(Sepal.Length = SepalLength)
  11. ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
  12. annotate('rect', xmin = quartiles[-5], xmax = quartiles[-1], ymin = -Inf,
  13. ymax = Inf, fill = c('gray', NA, 'gray', NA), alpha = 0.2) +
  14. annotate('text', x = midpoints, y = 9, label = paste0('Q', 1:4)) +
  15. geom_smooth(method = 'lm', color = 'gray', se = FALSE) +
  16. geom_point(color = 'navy', shape = 21, fill = NA) +
  17. geom_path(data = qiris, aes(group = Q), color = 'darkgreen',
  18. linewidth = 1.5, alpha = 0.5) +
  19. theme_classic() +
  20. theme(panel.border = element_rect(fill = NA, linewidth = 1))

字符串


的数据
对于mtcars示例,您需要为密度选择不同的带宽和乘数,以使其与现有变量大致相同:

  1. quartiles <- quantile(mtcars$mpg)
  2. midpoints <- quartiles[-5] + 0.5 * diff(quartiles)
  3. qmtcars <- mtcars %>%
  4. mutate(Q = cut(mpg, quartiles, labels = paste0('Q', 1:4)),
  5. baseline = midpoints[as.numeric(Q)]) %>%
  6. filter(complete.cases(.)) %>%
  7. group_by(Q) %>%
  8. reframe(HP = density(hp)$x,
  9. mpg = first(baseline) - density(hp, bw = 100)$y * 500) %>%
  10. rename(hp = HP)
  11. ggplot(mtcars, aes(mpg, hp)) +
  12. annotate('rect', xmin = quartiles[-5], xmax = quartiles[-1], ymin = -Inf,
  13. ymax = Inf, fill = c('gray', NA, 'gray', NA), alpha = 0.2) +
  14. annotate('text', x = midpoints, y = 450, label = paste0('Q', 1:4)) +
  15. geom_smooth(method = 'lm', color = 'gray', se = FALSE) +
  16. geom_point(color = 'navy', shape = 21, fill = NA) +
  17. geom_path(data = qmtcars, aes(group = Q), color = 'darkgreen',
  18. linewidth = 1.5, alpha = 0.5) +
  19. theme_classic() +
  20. theme(panel.border = element_rect(fill = NA, linewidth = 1))


展开查看全部

相关问题