R: Counting open cases with and without a time cutoff

Posted by bbmckpt7 on 2023-04-18 in Other

I have a dataset with the variables Person, RelevantCase, StartDate, and EndDate:

df <- data.frame(Person = c('111','334','334','334','334','334','888','888','888','888','888','888','888','888'), 
                 RelevantCase = c(0,1,1,0,1,0,1,0,1,0,0,1,0,1), 
                 StartDate = c('2017-03-04','2015-11-14','2018-04-26','2020-01-24','2020-01-25','2020-02-29','2015-08-09',
                               '2015-08-09','2018-04-10','2019-09-20','2020-06-30','2020-11-01','2021-08-13','2022-11-11'),
                 EndDate = c('2017-12-12','2022-01-25','2020-03-01','2021-02-24','2020-01-30','2022-02-02','2019-10-20',
                             '2019-10-30','2018-10-10','2021-10-10','2020-07-20','2022-11-20','2021-11-12','2023-01-01')
)

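I want to create two new variables:
1. The number of relevant open cases per person. That is, for each row I want to count how many of that person's relevant cases have
1.1. a StartDate *before* the current case's StartDate, and
1.2. an EndDate *on or after* the current StartDate.
By "relevant case" I mean that only observations with RelevantCase == 1 should be counted.
2. The number of relevant open cases per person that started within the two years before the current StartDate. This is the same as the first variable, except that it does not count relevant open cases whose StartDate is more than two years earlier than the current StartDate.
The resulting dataset should look like this: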

df2 <- data.frame(Person = c('111','334','334','334','334','334','888','888','888','888','888','888','888','888'), 
                 RelevantCase = c(0,1,1,0,1,0,1,0,1,0,0,1,0,1), 
                 StartDate = c('2017-03-04','2015-11-14','2018-04-26','2020-01-24','2020-01-25','2020-02-29','2015-08-09',
                               '2015-08-09','2018-04-10','2019-09-20','2020-06-30','2020-11-01','2021-08-13','2022-11-11'),
                 EndDate = c('2017-12-12','2022-01-25','2020-03-01','2021-02-24','2020-01-30','2022-02-02','2019-10-20',
                             '2019-10-30','2018-10-10','2021-10-10','2020-07-20','2022-11-20','2021-11-12','2023-01-01'),
                 NumberOpenCases = c(0,0,1,2,2,2,0,0,1,1,0,0,1,1),
                 NumberOpenCases_2y = c(0,0,0,1,1,1,0,0,0,0,0,0,1,0)
)
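
To make the two conditions concrete, here is a hand check for a single row: Person 334's case starting on 2020-01-24. This is only a sketch of the rules as stated above; treating "two years" as 730 days is an assumption, and the variable names are illustrative.

```r
# Person 334's rows from the dataset above
start    <- as.Date(c("2015-11-14", "2018-04-26", "2020-01-24", "2020-01-25", "2020-02-29"))
end      <- as.Date(c("2022-01-25", "2020-03-01", "2021-02-24", "2020-01-30", "2022-02-02"))
relevant <- c(1, 1, 0, 1, 0)

# The row being evaluated
current <- as.Date("2020-01-24")

# Relevant cases that started before, and are still open on, the current StartDate
open <- relevant == 1 & start < current & end >= current

# Same, restricted to cases that started less than 730 days earlier
# (assumption: two years is approximated as 730 days)
open_2y <- open & as.numeric(current - start) < 730

sum(open)     # NumberOpenCases for this row
sum(open_2y)  # NumberOpenCases_2y for this row
```

The two sums agree with the values 2 and 1 listed for that row in df2: the 2015 and 2018 cases are both still open, but only the 2018 case started within the window.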
Answer 1, by xdnvmnnf

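This gives the number of relevant open cases by looping over the StartDate column within each Person group and checking the required conditions for every row.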

library(dplyr)
library(purrr)

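df %>%
  # Parse the date strings so they can be compared and subtracted
  mutate(StartDate = as.Date(StartDate),
         EndDate = as.Date(EndDate)) %>%
  arrange(Person, StartDate, EndDate) %>%
  group_by(Person) %>%
  # For each StartDate (.x), count the person's relevant cases that
  # started earlier and had not yet ended on that date
  mutate(NumberOpenCases    = map_int(StartDate, ~sum(StartDate < .x  &
                                                      EndDate >= .x &
                                                      RelevantCase == 1)),
         # Same count, limited to cases started less than 730 days before .x
         NumberOpenCases_2y = map_int(StartDate, ~sum(StartDate < .x  &
                                                      EndDate >= .x &
                                                      RelevantCase == 1 &
                                                      .x - StartDate < 730)))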
#> # A tibble: 14 x 6
#> # Groups:   Person [3]
#>    Person RelevantCase StartDate  EndDate    NumberOpenCases NumberOpenCases_2y
#>    <chr>         <dbl> <date>     <date>               <int>              <int>
#>  1 111               0 2017-03-04 2017-12-12               0                  0
#>  2 334               1 2015-11-14 2022-01-25               0                  0
#>  3 334               1 2018-04-26 2020-03-01               1                  0
#>  4 334               0 2020-01-24 2021-02-24               2                  1
#>  5 334               1 2020-01-25 2020-01-30               2                  1
#>  6 334               0 2020-02-29 2022-02-02               2                  1
#>  7 888               1 2015-08-09 2019-10-20               0                  0
#>  8 888               0 2015-08-09 2019-10-30               0                  0
#>  9 888               1 2018-04-10 2018-10-10               1                  0
#> 10 888               0 2019-09-20 2021-10-10               1                  0
#> 11 888               0 2020-06-30 2020-07-20               0                  0
#> 12 888               1 2020-11-01 2022-11-20               0                  0
#> 13 888               0 2021-08-13 2021-11-12               1                  1
#> 14 888               1 2022-11-11 2023-01-01               1                  0
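
The `730` above approximates two years in days, so it ignores leap years. If a calendar-based cutoff is preferred, one possible variant (an assumption about the intent, not part of the answer above) computes the date two years earlier with `seq()` and counts cases starting on or after it. The sketch below uses base R only and Person 888's rows; `d` and `open_2y` are illustrative names, and treating the cutoff date itself as inside the window is also an assumption.

```r
# Person 888's rows from the question, already in StartDate order
d <- data.frame(
  RelevantCase = c(1, 0, 1, 0, 0, 1, 0, 1),
  StartDate = as.Date(c("2015-08-09", "2015-08-09", "2018-04-10", "2019-09-20",
                        "2020-06-30", "2020-11-01", "2021-08-13", "2022-11-11")),
  EndDate   = as.Date(c("2019-10-20", "2019-10-30", "2018-10-10", "2021-10-10",
                        "2020-07-20", "2022-11-20", "2021-11-12", "2023-01-01"))
)

open_2y <- vapply(seq_len(nrow(d)), function(i) {
  current <- d$StartDate[i]
  # Date two calendar years before the current StartDate
  cutoff <- seq(current, by = "-2 years", length.out = 2)[2]
  # Relevant cases that started inside the window and are still open
  sum(d$RelevantCase == 1 &
        d$StartDate < current &
        d$StartDate >= cutoff &
        d$EndDate >= current)
}, integer(1))

open_2y
```

For this data the calendar-based cutoff and the 730-day cutoff give the same counts for Person 888; they can differ only when a case starts within a day or so of the two-year boundary.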
