How to draw 2D quantile-based densities of a scatterplot for the outcome variable in R?

Posted by holgip5t on 2023-11-14 in Other


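I realise that this question has been asked in similar ways here several times. I am not asking for a scatterplot that includes a density heatmap of the data, since that captures the density of *both* variables as a smooth function. What I am looking for is something like this, which overlays "slices" of the outcome variable's distribution on the scatterplot: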

The best I could come up with is:

#### Load Library ####
library(tidyverse)
#### Get IQR ####
q <- quantile(iris$Sepal.Length, 
              probs = c(.25,.5,.75))
q
#### Label Quantile Regions ####
qiris <- iris %>% 
  mutate(qs = ifelse(Sepal.Length >= q[3],
                     "Q75",
                     ifelse(Sepal.Length >= q[2],
                            "Q50","Q25")))
#### Plot Density and Scatter ####
ggplot()+
  geom_point(aes(x=Sepal.Width,
                 y=Sepal.Length),
             data=iris)+
  geom_density(aes(y=Sepal.Length,
                   fill=qs),
               data=qiris)

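But predictably, this fails because it does not tie the "slices" of the distribution to the predictor values.

I then came up with a slightly better solution that positions the distributions of the values correctly: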

library(ggridges)
ggplot(qiris, 
       aes(x = Sepal.Length,
           y = qs)) + 
  stat_density_ridges(quantiles = c(0.25,0.5,0.75),
                      geom="density_ridges_gradient",
                      jittered_points = TRUE,
                      position = "raincloud", 
                      alpha = 0.6, 
                      scale = 0.6)+
  coord_flip()

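This gives me:

However, there are still three problems here. First, I cannot fit a regression line through it. Second, I want the data points to sit next to each other as in an ordinary scatterplot, rather than being spatially separated by quantile so that they end up too far apart. Third, this does not include the other variable at all, which is important.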

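Edit

Allan's answer looked good at first, but I think there is something in his code that I am not seeing. To figure it out, I tried another dataset and saved the inputs as objects in R so that everything would be easier to swap out. When I do this, I get flat lines on the plot.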

#### Load Library ####
library(tidyverse)
#### Save Objects ####
dfy <- mtcars$mpg # y var
dfx <- mtcars$hp # x var
data <- mtcars # dataset
#### QDATA ####
qdata <- data %>% 
  mutate(cut_group = cut(dfy, 
                         quantile(dfy, c(0.125, 0.375, 0.625, 0.875)),
                         labels = c('Q25', 'Q50', 'Q75')),
         baseline = quantile(dfy, 
                             c(0.25, 0.5, 0.75))[as.numeric(cut_group)]) %>%
  filter(complete.cases(.)) %>%
  group_by(cut_group) %>%
  reframe(dfxx = density(dfx)$x,
          dfy = first(baseline) - density(dfx, bw = 0.5)$y/3) %>%
  rename(dfx = dfxx) 
ggplot(data,
       aes(dfy,
           dfx)) +
  geom_smooth(method = 'lm', 
              color = 'gray',
              se = FALSE) +
  geom_point(color = 'navy',
             shape = 21,
             fill = NA) +
  geom_path(data = qdata,
            aes(group = cut_group), 
            color = 'darkgreen',
            linewidth = 1.5) +
  theme_classic() +
  theme(panel.border = element_rect(fill = NA, 
                                    linewidth = 1))

Like so:

Source: https://stackoverflow.com/questions/77394284/how-to-draw-2d-quantile-based-densities-of-a-scatterplot-for-the-outcome-variabl

1 Answer

cuxqih211#

I would probably do this by pre-calculating the densities for the quantiles and drawing them as a geom_path:

quartiles <- quantile(iris$Sepal.Width)
midpoints <- quartiles[-5] + 0.5 * diff(quartiles)
qiris <- iris %>% 
  mutate(Q = cut(Sepal.Width, quartiles, labels = paste0('Q', 1:4)),
         baseline = midpoints[as.numeric(Q)]) %>%
  filter(complete.cases(.)) %>%
  group_by(Q) %>%
  reframe(SepalLength = density(Sepal.Length)$x,
          Sepal.Width = first(baseline) - density(Sepal.Length, bw = 0.5)$y/3) %>%
  rename(Sepal.Length = SepalLength) 
ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
  annotate('rect', xmin = quartiles[-5], xmax = quartiles[-1], ymin = -Inf,
           ymax = Inf, fill = c('gray', NA, 'gray', NA), alpha = 0.2) +
  annotate('text', x = midpoints, y = 9, label = paste0('Q', 1:4)) +
  geom_smooth(method = 'lm', color = 'gray', se = FALSE) +
  geom_point(color = 'navy', shape = 21, fill = NA) +
  geom_path(data = qiris, aes(group = Q), color = 'darkgreen',
            linewidth = 1.5, alpha = 0.5) +
  theme_classic() +
  theme(panel.border = element_rect(fill = NA, linewidth = 1))
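
One detail in the code above is worth spelling out. By default `cut()` builds intervals that are open on the left, so any observation equal to the minimum falls outside the first interval and becomes `NA`; `filter(complete.cases(.))` then discards it. The sketch below illustrates that behaviour with base R only. The `include.lowest = TRUE` variant is shown as one possible way to keep that point, not as what the answer does, and the names `q_default` and `q_closed` are made up for the illustration.

```r
quartiles <- quantile(iris$Sepal.Width)

# Default: intervals are (a, b], so the minimum value is assigned to no bin
q_default <- cut(iris$Sepal.Width, quartiles, labels = paste0("Q", 1:4))
n_dropped <- sum(is.na(q_default))  # rows that complete.cases() would remove
n_dropped

# Alternative (an assumption, not the answer's choice):
# close the first interval on the left so the minimum lands in Q1
q_closed <- cut(iris$Sepal.Width, quartiles, labels = paste0("Q", 1:4),
                include.lowest = TRUE)
n_missing_closed <- sum(is.na(q_closed))
table(q_closed)
```

Whether to keep or drop the minimum is a judgment call; with only a handful of affected rows the density curves should look much the same either way.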

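For the mtcars example, you need to choose a different bandwidth and multiplier for the density so that it ends up on roughly the same scale as the existing variables: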

quartiles <- quantile(mtcars$mpg)
midpoints <- quartiles[-5] + 0.5 * diff(quartiles)
qmtcars <- mtcars %>% 
  mutate(Q = cut(mpg, quartiles, labels = paste0('Q', 1:4)),
         baseline = midpoints[as.numeric(Q)]) %>%
  filter(complete.cases(.)) %>%
  group_by(Q) %>%
  reframe(HP = density(hp)$x,
          mpg = first(baseline) - density(hp, bw = 100)$y * 500) %>%
  rename(hp = HP) 
ggplot(mtcars, aes(mpg, hp)) +
  annotate('rect', xmin = quartiles[-5], xmax = quartiles[-1], ymin = -Inf,
           ymax = Inf, fill = c('gray', NA, 'gray', NA), alpha = 0.2) +
  annotate('text', x = midpoints, y = 450, label = paste0('Q', 1:4)) +
  geom_smooth(method = 'lm', color = 'gray', se = FALSE) +
  geom_point(color = 'navy', shape = 21, fill = NA) +
  geom_path(data = qmtcars, aes(group = Q), color = 'darkgreen',
            linewidth = 1.5, alpha = 0.5) +
  theme_classic() +
  theme(panel.border = element_rect(fill = NA, linewidth = 1))
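
The bandwidth of 100 and the multiplier of 500 above are tuned by hand to the mtcars scales. As a rough sketch of one way to derive the multiplier instead, the hypothetical helper `scaled_density()` below (not part of the original answer) normalises each density by its own peak and stretches it to a chosen fraction of the narrowest quartile bin. It uses base R only, lets `density()` pick its default bandwidth, and passes `include.lowest = TRUE` to `cut()`; all three are assumptions made for this illustration.

```r
# Hypothetical helper: rescale a kernel density so that its peak reaches
# exactly `peak_width` units to the left of `baseline` on the x axis.
scaled_density <- function(values, baseline, peak_width) {
  d <- density(values)  # default bandwidth, an assumption of this sketch
  data.frame(
    x = baseline - d$y / max(d$y) * peak_width,  # horizontal bulge of the curve
    y = d$x                                      # position along the outcome axis
  )
}

quartiles <- quantile(mtcars$mpg)
midpoints <- quartiles[-5] + 0.5 * diff(quartiles)

# include.lowest = TRUE keeps the car with the minimum mpg in Q1
Q <- cut(mtcars$mpg, quartiles, labels = paste0("Q", 1:4),
         include.lowest = TRUE)

# Let each curve extend over at most 80% of the narrowest quartile bin
peak_width <- 0.8 * min(diff(quartiles))

# One density path per quartile group, stacked into a single data frame
paths <- do.call(rbind, lapply(seq_along(levels(Q)), function(i) {
  out <- scaled_density(mtcars$hp[as.integer(Q) == i],
                        midpoints[[i]], peak_width)
  out$Q <- levels(Q)[i]
  out
}))

head(paths)
```

The resulting `paths` data frame could then replace `qmtcars` in the plot, for example with `geom_path(data = paths, aes(x = x, y = y, group = Q))`. Normalising by the peak means the curves no longer share a common density scale across quartiles, so this trades comparability of heights for not having to guess a multiplier.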
