I have a dataset with the variables Person, RelevantCase, StartDate, and EndDate:
df <- data.frame(Person = c('111','334','334','334','334','334','888','888','888','888','888','888','888','888'),
RelevantCase = c(0,1,1,0,1,0,1,0,1,0,0,1,0,1),
StartDate = c('2017-03-04','2015-11-14','2018-04-26','2020-01-24','2020-01-25','2020-02-29','2015-08-09',
'2015-08-09','2018-04-10','2019-09-20','2020-06-30','2020-11-01','2021-08-13','2022-11-11'),
EndDate = c('2017-12-12','2022-01-25','2020-03-01','2021-02-24','2020-01-30','2022-02-02','2019-10-20',
'2019-10-30','2018-10-10','2021-10-10','2020-07-20','2022-11-20','2021-11-12','2023-01-01')
)
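I want to create two new variables:
1. The number of relevant open cases per person. That is, I want to count how many relevant cases have
1.1. StartDates *on or before* the StartDate of the current case, and
1.2. EndDates *on or after* the current StartDate.
By "relevant case" I mean that I only want to count observations with RelevantCase == 1.
2. The number of relevant open cases per person that started within the two years before the current StartDate. This is the same as the first new variable, except that it does not count relevant open cases whose StartDate is more than two years earlier than the current StartDate.
The resulting dataset should look like this: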
df2 <- data.frame(Person = c('111','334','334','334','334','334','888','888','888','888','888','888','888','888'),
RelevantCase = c(0,1,1,0,1,0,1,0,1,0,0,1,0,1),
StartDate = c('2017-03-04','2015-11-14','2018-04-26','2020-01-24','2020-01-25','2020-02-29','2015-08-09',
'2015-08-09','2018-04-10','2019-09-20','2020-06-30','2020-11-01','2021-08-13','2022-11-11'),
EndDate = c('2017-12-12','2022-01-25','2020-03-01','2021-02-24','2020-01-30','2022-02-02','2019-10-20',
'2019-10-30','2018-10-10','2021-10-10','2020-07-20','2022-11-20','2021-11-12','2023-01-01'),
NumberOpenCases = c(0,0,1,2,2,2,0,0,1,1,0,0,1,1),
NumberOpenCases_2y = c(0,0,0,1,1,1,0,0,0,0,0,0,1,0)
)
1 Answer
This gives the number of relevant open cases by looping over the StartDate column within each group and checking the required conditions.
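A minimal base R sketch of that idea follows; it is not necessarily the answerer's original code. It loops over every row and restricts the comparison to cases of the same Person, which amounts to looping within each group. The helper names count_open and two_years_before are hypothetical. Two assumptions are worth flagging: the StartDate comparison is strict (<), because that is what reproduces the expected df2 (the two Person 888 cases that both start on 2015-08-09 do not count each other, and a case never counts itself), and the two-year window is treated as inclusive of the boundary date.

```r
# Sample data copied from the question.
df <- data.frame(Person = c('111','334','334','334','334','334','888','888','888','888','888','888','888','888'),
RelevantCase = c(0,1,1,0,1,0,1,0,1,0,0,1,0,1),
StartDate = c('2017-03-04','2015-11-14','2018-04-26','2020-01-24','2020-01-25','2020-02-29','2015-08-09',
'2015-08-09','2018-04-10','2019-09-20','2020-06-30','2020-11-01','2021-08-13','2022-11-11'),
EndDate = c('2017-12-12','2022-01-25','2020-03-01','2021-02-24','2020-01-30','2022-02-02','2019-10-20',
'2019-10-30','2018-10-10','2021-10-10','2020-07-20','2022-11-20','2021-11-12','2023-01-01')
)

# Convert the character columns to Date so they can be compared.
df$StartDate <- as.Date(df$StartDate)
df$EndDate   <- as.Date(df$EndDate)

# Hypothetical helper: the date two calendar years before a single date d.
two_years_before <- function(d) seq(d, by = "-2 years", length.out = 2)[2]

# Hypothetical helper: for row i, count the same person's relevant cases that
# started strictly before row i's StartDate (assumption, see above) and whose
# EndDate is on or after it. With window = TRUE, also require that the case
# started no more than two years before row i's StartDate.
count_open <- function(i, window = FALSE) {
  cur <- df$StartDate[i]
  open <- df$Person == df$Person[i] &
    df$RelevantCase == 1 &
    df$StartDate < cur &
    df$EndDate >= cur
  if (window) {
    open <- open & df$StartDate >= two_years_before(cur)
  }
  sum(open)
}

rows <- seq_len(nrow(df))
df$NumberOpenCases    <- vapply(rows, count_open, numeric(1))
df$NumberOpenCases_2y <- vapply(rows, count_open, numeric(1), window = TRUE)
```

If same-day cases should count toward each other, changing < to <= and excluding row i itself (for example with rows != i) would be the adjustment, but the counts would then differ from df2 for the second Person 888 row.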