How do I create a Sankey diagram in R that shows the same node changing over time?

v6ylcynt  posted on 2023-05-20  in Other

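I am trying to create a Sankey diagram of my data.
For each therapy, individuals are followed up over time. I would like a single node, "Therapy" (a categorical variable holding the different treatment names), to repeat over time, with the x-axis showing time. Any ideas? I would really appreciate any help.
This is what I have tried so far: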

### install and load packages
install.packages("ggplot2")
install.packages("readxl")
install.packages("ggforce")

# load packages
library(ggplot2)
library(readxl)
library(ggforce)

### read dataset
dataset_new <- read_excel("Made_up_dataset_new.xlsx")
df_new <- as.data.frame(dataset_new)

df_new$Unit <- 1

df_sankey <- df_new[c("Therapy", "Frequency", "Continuous_time","Unit")]

# transform dataframe into appropriate format
df_sankey <- gather_set_data(df_sankey, 1:3)

# define axis-width / sep parameters once here, to be used by each geom layer in the plot
aw <- 0.1
sp <- 0.1

ggplot(df_sankey, 
       aes(x = x, id = id, split = y, value = Unit)) +
  geom_parallel_sets(aes(fill = Therapy), alpha = 0.3, 
                     axis.width = aw, sep = sp) +
  geom_parallel_sets_axes(axis.width = aw, sep = sp) +
  geom_parallel_sets_labels(colour = "white", 
                            angle = 0, size = 3,
                            axis.width = aw, sep = sp) +
  theme_minimal()

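But the result is not what I want: time ends up squeezed onto the y-axis rather than laid out along the x-axis, if that makes sense?
I appreciate any help!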

qgelzfjb  #1

You have a few options. My first solution uses ggplot2 with geom_flow from the ggalluvial package:

library(ggplot2)
library(ggalluvial)  # provides geom_flow, geom_stratum and geom_alluvium

# faking the data for 20 patients, each observed at 5 time points
set.seed(42)
individual <- as.character(rep(1:20, each = 5))
timeperiod <- paste0(rep(c(0, 18, 36, 54, 72), 20), "_week")
therapy <- factor(sample(c("Etanercept", "Infliximab", "Rituximab", "Adalimumab", "Missing"), 100, replace = TRUE))
d <- data.frame(individual, timeperiod, therapy)
head(d)

# Plotting it
ggplot(d, aes(x = timeperiod, stratum = therapy, alluvium = individual, fill = therapy, label = therapy)) +
  scale_fill_brewer(type = "qual", palette = "Set2") +
  geom_flow(stat = "alluvium", lode.guidance = "rightleft", color = "darkgray") +
  geom_stratum() +
  theme(legend.position = "bottom") +
  ggtitle("Treatment across observation period")
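
The plot above relies on long-format data: one row per patient and time point. If the real data is wide instead (one column per time point), it would need reshaping first. A minimal sketch, assuming hypothetical column names week_0, week_18 and week_36 that are not taken from the question's dataset, and using tidyr::pivot_longer:

```r
library(tidyr)

# Hypothetical wide layout: one row per patient, one column per time point.
# The column names are illustrative assumptions, not the asker's real columns.
wide <- data.frame(
  individual = c("1", "2", "3"),
  week_0     = c("Etanercept", "Infliximab", "Rituximab"),
  week_18    = c("Etanercept", "Adalimumab", "Rituximab"),
  week_36    = c("Adalimumab", "Adalimumab", "Missing")
)

# Stack every column except the patient id into (timeperiod, therapy) pairs
long <- pivot_longer(
  wide,
  cols      = -individual,
  names_to  = "timeperiod",
  values_to = "therapy"
)

head(long)
```

The resulting data frame has the same three roles as d (individual, timeperiod, therapy), so it should be usable with the same aes() mapping.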

The argument stat = "alluvium" in geom_flow makes it possible to follow individual patients, but the flows can also be merged if you prefer:

ggplot(d, aes(x = timeperiod, stratum = therapy, alluvium = individual, fill = therapy, label = therapy)) +
  scale_fill_brewer(type = "qual", palette = "Set2") +
  geom_flow(color = "darkgray") +
  geom_stratum() +
  theme(legend.position = "bottom") +
  ggtitle("Treatment across observation period")
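
Both plots put timeperiod on the x-axis as a character variable, which ggplot2 orders alphabetically. That happens to work for 0, 18, 36, 54 and 72 weeks, but it would not for every schedule. A small sketch, assuming a hypothetical schedule of 0, 6, 12, 18 and 24 weeks, of fixing the order with an explicit factor:

```r
# Hypothetical visit schedule (an assumption for illustration only)
weeks <- c(0, 6, 12, 18, 24)
timeperiod <- paste0(rep(weeks, 3), "_week")

# Alphabetical order puts "6_week" after "24_week", which breaks the time axis
sort(unique(timeperiod))

# Declaring the levels explicitly keeps the chronological order
timeperiod_f <- factor(timeperiod, levels = paste0(weeks, "_week"))
levels(timeperiod_f)
```

Passing the factor instead of the character vector as x should make the strata appear in chronological order.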

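Edit 1: If you want the flow to stop for some patients (for example because their treatment has been completed), you can do that easily by setting those patients to NA: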

# setting 3 patients to NA at the last time point
dropouts <- d$individual %in% c("3", "6", "9") & d$timeperiod == "72_week"
d$therapy[dropouts] <- NA

# making the plot:
ggplot(d, aes(x = timeperiod, stratum = therapy, alluvium = individual, fill = therapy, label = therapy)) +
  scale_fill_brewer(type = "qual", palette = "Set2") +
  geom_flow(stat = "alluvium", lode.guidance = "rightleft", color = "darkgray") +
  geom_stratum(alpha = 0.75) +
  theme(legend.position = "bottom") +
  ggtitle("Treatment across observation period")


Now, to be honest, networkD3 works as well, but I just did not manage to make it look good enough.
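
For reference, a rough sketch of what a networkD3 version might look like. This is not the answerer's code: it assumes one node per therapy and time point, with links counting the patients who move between consecutive time points. The node labels and helper names (node, transitions, links, nodes) are illustrative.

```r
library(networkD3)

# Same simulated layout as above: 20 patients, 5 time points each
set.seed(42)
d <- data.frame(
  individual = rep(as.character(1:20), each = 5),
  week       = rep(c(0, 18, 36, 54, 72), times = 20),
  therapy    = sample(c("Etanercept", "Infliximab", "Rituximab", "Adalimumab", "Missing"),
                      100, replace = TRUE),
  stringsAsFactors = FALSE
)

# One node per therapy and time point, e.g. "Rituximab (week 18)"
d$node <- paste0(d$therapy, " (week ", d$week, ")")

# Pair each observation with the same patient's next observation
d <- d[order(d$individual, d$week), ]
n <- nrow(d)
same_patient <- d$individual[-n] == d$individual[-1]
transitions <- data.frame(
  from  = d$node[-n][same_patient],
  to    = d$node[-1][same_patient],
  value = 1
)

# Count the patients behind each transition
links <- aggregate(value ~ from + to, data = transitions, FUN = sum)

# sankeyNetwork expects zero-based indices into the node table
nodes <- data.frame(name = unique(c(links$from, links$to)))
links$source <- match(links$from, nodes$name) - 1
links$target <- match(links$to, nodes$name) - 1

sankeyNetwork(Links = links, Nodes = nodes,
              Source = "source", Target = "target",
              Value = "value", NodeID = "name", fontSize = 11)
```

Because every link runs from one time point to the next, the layout should place the time points left to right, although networkD3 offers less control over the axis than the ggplot2 approach.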

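Edit 2

  • You can also use geom_alluvium instead of geom_flow. The main (visual) difference between them is that in geom_flow the colour of a flow is inherited from an adjacent node (the source or the target node). In geom_alluvium it is inherited from the first node, i.e. the flow does *not* change colour as it passes through nodes.
  • If you want to combine the chart with another plot, the easiest way seems to be par(mfrow=c(1,2)). Note that par() only affects base graphics; ggplot objects need a grid-based tool such as gridExtra::grid.arrange or the patchwork package.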
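
The two points above can be sketched together: the same data drawn once with geom_flow and once with geom_alluvium, placed side by side. This is a sketch rather than the answerer's code; the use of gridExtra::grid.arrange is an assumption (patchwork would be an alternative), and the names base, p_flow and p_alluvium are illustrative.

```r
library(ggplot2)
library(ggalluvial)
library(gridExtra)

# Same simulated data as in the answer
set.seed(42)
d <- data.frame(
  individual = rep(as.character(1:20), each = 5),
  timeperiod = paste0(rep(c(0, 18, 36, 54, 72), 20), "_week"),
  therapy    = factor(sample(c("Etanercept", "Infliximab", "Rituximab", "Adalimumab", "Missing"),
                             100, replace = TRUE))
)

# Shared mapping and styling for both variants
base <- ggplot(d, aes(x = timeperiod, stratum = therapy,
                      alluvium = individual, fill = therapy)) +
  scale_fill_brewer(type = "qual", palette = "Set2") +
  theme(legend.position = "bottom")

# Flows take their colour from an adjacent stratum
p_flow <- base +
  geom_flow(stat = "alluvium", lode.guidance = "rightleft", color = "darkgray") +
  geom_stratum() +
  ggtitle("geom_flow")

# Each alluvium keeps the colour of its first stratum
# (ggalluvial may warn that fill varies within alluvia)
p_alluvium <- base +
  geom_alluvium(color = "darkgray") +
  geom_stratum() +
  ggtitle("geom_alluvium")

# Arrange the two ggplot objects in one row
grid.arrange(p_flow, p_alluvium, ncol = 2)
```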
