R: How can I extract overlapping hospital stay periods for each hospital?

Posted by db2dz4w8 on 2023-02-20 in Other

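df has five variables: studyid, hospitalname, Date1, Date2, and group. For each name in hospitalname, I want to extract every combination in which the period from Date1 to Date2 of a group 0 row overlaps with that of a group 1 row.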

library(zoo)

# create example data
df <- data.frame(
  studyid = 1:5,
  Date1 = as.yearmon(c("2020-01", "2020-03", "2020-10", "2020-07", "2020-06")),
  Date2 = as.yearmon(c("2020-02", "2020-10", "2021-02", "2020-08", "2020-10")),
  hospitalname = c("Hospital A", "Hospital A", "Hospital A", "Hospital B", "Hospital B"),
  group = c(0, 1, 0, 1, 0)
)

After the analysis, I would like to get a result like this:

result <- data.frame(
  studyid.0 = c("3","5"),
  Date1_0 = as.yearmon(c("2020-10", "2020-06")),
  Date2_0 = as.yearmon(c("2021-02", "2020-10")),
  studyid.1 = c("2","4"),
  Date1_1 = as.yearmon(c("2020-03", "2020-07")),
  Date2_1 = as.yearmon(c("2020-10", "2020-08")),
  hospitalname = c("Hospital A", "Hospital B")
)

I would really appreciate your help.


Answer #1 (2wnc66cl)

library(dplyr)
library(zoo)

result <- df %>%
      inner_join(df, by = "hospitalname") %>%
      filter(group.x == 0, group.y == 1,
             Date1.x <= Date2.y, Date1.y <= Date2.x) %>%
      select(studyid.0 = studyid.x, Date1_0 = Date1.x, Date2_0 = Date2.x,
             studyid.1 = studyid.y, Date1_1 = Date1.y, Date2_1 = Date2.y,
             hospitalname) %>%
      distinct()

result
  studyid.0  Date1_0  Date2_0 studyid.1  Date1_1  Date2_1 hospitalname
1         3 Oct 2020 Feb 2021         2 Mar 2020 Oct 2020   Hospital A
2         5 Jun 2020 Oct 2020         4 Jul 2020 Aug 2020   Hospital B

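**inner_join()** creates every combination of rows that share the same hospital name. Next, **filter()** keeps only the combinations in which the first row is in group 0, the second row is in group 1, and their date periods overlap. Then **select()** renames the columns to match your desired output. Finally, **distinct()** removes any duplicate rows created by the join.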
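The overlap test inside filter() can be checked in isolation. The following is a minimal base R sketch, not part of the original answer: intervals_overlap() and m() are hypothetical helper names, and months are encoded as Date objects (first day of the month) instead of yearmon to avoid any package dependency.

```r
# Hypothetical helper: TRUE when the closed intervals [start_a, end_a] and
# [start_b, end_b] share at least one point.
intervals_overlap <- function(start_a, end_a, start_b, end_b) {
  start_a <= end_b & start_b <= end_a
}

# Hypothetical helper: turn "YYYY-MM" into the first day of that month.
m <- function(x) as.Date(paste0(x, "-01"))

# Hospital A, studyid 3 (group 0) vs studyid 2 (group 1): both include Oct 2020.
overlap_3_2 <- intervals_overlap(m("2020-10"), m("2021-02"),
                                 m("2020-03"), m("2020-10"))

# Hospital A, studyid 1 (group 0) vs studyid 2 (group 1): Feb ends before Mar starts.
overlap_1_2 <- intervals_overlap(m("2020-01"), m("2020-02"),
                                 m("2020-03"), m("2020-10"))

print(c(overlap_3_2, overlap_1_2))
```

Because the comparison uses <=, two periods that only share a boundary month count as overlapping. That is what keeps the Hospital A pair (studyid 3 and studyid 2, which share only Oct 2020) in the result; with a strict < that pair would be dropped.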

Alternatively, as @onyambu pointed out, you can pass suffix = c("_0", "_1") to inner_join() instead of using select():

result <- df %>%
  inner_join(df, by = "hospitalname", suffix = c("_0", "_1")) %>%
  filter(group_0 == 0, group_1 == 1,
         Date1_0 <= Date2_1, Date1_1 <= Date2_0) %>%
  distinct()

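The suffix argument means that every column name shared by both data frames (other than the join key) gets the suffix "_0" for the first data frame and "_1" for the second.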
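To see which column names the suffix produces, here is a small sketch on a hypothetical toy data frame (toy is not from the question; it has one row per hospital so the self-join stays one-to-one). It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: one row per hospital.
toy <- data.frame(
  studyid = 1:2,
  hospitalname = c("Hospital A", "Hospital B"),
  group = c(0, 1)
)

# Self-join on the hospital name, suffixing the non-key columns.
joined <- inner_join(toy, toy, by = "hospitalname", suffix = c("_0", "_1"))

# The join key keeps its name; every other shared column gets a suffix.
joined_names <- names(joined)
print(joined_names)
```

Note that "_0" and "_1" only mark the left and right data frame of the join. They line up with group 0 and group 1 only because the following filter() keeps rows with group_0 == 0 and group_1 == 1. Unlike the select() version, the result also keeps the group_0 and group_1 columns.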


Answer #2 (xdyibdwo)

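library(dplyr)
library(tidyr)

# id increases at every group-1 row, so each group-1 row shares an id
# with the group-0 rows that follow it
df %>%
     mutate(id=cumsum(group)) %>%
     pivot_wider(id_cols = c(id, hospitalname), names_from = group, 
                  values_from = studyid:Date2, names_vary = 'slowest') %>%
     drop_na()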

# A tibble: 2 × 8
     id hospitalname studyid_0 Date1_0   Date2_0   studyid_1 Date1_1   Date2_1  
  <dbl> <chr>            <int> <yearmon> <yearmon>     <int> <yearmon> <yearmon>
1     1 Hospital A           3 Oct 2020  Feb 2021          2 Mar 2020  Oct 2020 
2     2 Hospital B           5 Jun 2020  Oct 2020          4 Jul 2020  Aug 2020
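This answer comes without an explanation, so here is a short base R sketch of what the id column contains for the example data. The variable names are only illustrative.

```r
# The group and studyid columns from the question, in row order.
group <- c(0, 1, 0, 1, 0)
studyid <- 1:5

# The running sum goes up by one at every group-1 row.
id <- cumsum(group)
print(id)

# Rows that share an id end up in the same output row after pivot_wider().
rows_by_id <- split(studyid, id)
print(rows_by_id)
```

Here studyid 1 sits alone under id 0 (and is removed by drop_na()), while studyid 2 and 3 share id 1 and studyid 4 and 5 share id 2. The pairing therefore depends on row order: Date1 and Date2 are never compared. It reproduces the desired result for this example, but on other data it would probably need an explicit overlap check such as the filter() in the first answer.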
