I have this data frame, df1:
library(dplyr)
library(tidyverse)
df1 = data.frame(ID = c(100,101,101,102,102,103,103,104,104,105,106),
x_line = c(1,1,2,1,2,1,2,1,2,1,1),
start_date = c('04/01/2018','05/01/2019','25/08/2021','08/03/2017','07/08/2018',
'09/04/2016','29/12/2018','04/08/2018','03/05/2022','04/01/2018','04/01/2018'),
end_date = c('04/05/2019','07/02/2020','27/09/2021','18/07/2018','17/10/2019',
'19/12/2018','22/12/2019','14/09/2021','26/12/2022','15/02/2020','24/08/2020')
)
and the following df2:
df2 = data.frame(ID = c(100,100,100,101,101,102,102,103,103,104,104,105,105,106,106,106),
product_name = c('AA','BB','CC','AA','CC','DD','EE','DD','FF',
'AA','FF','DD','AA','CC','AA','BB'),
start_taken_date = c('04/05/2018','25/08/2018','27/09/2018','18/07/2019','25/11/2019',
'29/01/2018','07/09/2018','14/09/2017','01/01/2019','15/02/2019','24/08/2020',
'04/03/2019','04/08/2018',
'05/05/2018','06/06/2019','08/09/2018'),
end_taken_date = c('01/05/2019','26/09/2018','25/03/2019','25/09/2019','02/01/2020',
'19/06/2018','22/09/2019','16/01/2018','04/03/2019','25/06/2022','23/07/2022',
'05/04/2019','05/09/2018',
'29/03/2019','07/07/2019','04/05/2020'))
df3 is the result of merging df1 and df2:
df3 = df2 %>% left_join(df1, by = "ID")
Now I want to create df4 so that it satisfies the following condition (the problem is that it does not give me the output I want):
df4 = df3%>%mutate(line_m = ifelse(start_taken_date >=start_date & end_taken_date <= end_date,
x_line,NA))
The desired final output is as follows:
ID product_name start_taken_date end_taken_date x_line
1 100 AA 04/05/2018 01/05/2019 1
2 100 BB 25/08/2018 26/09/2018 1
3 100 CC 27/09/2018 25/03/2019 1
4 101 AA 18/07/2019 25/09/2019 1
5 101 CC 25/11/2019 02/01/2020 1
6 102 DD 29/01/2018 19/06/2018 1
7 102 EE 07/09/2018 22/09/2019 2
8 103 DD 14/09/2017 16/01/2018 1
9 103 FF 01/01/2019 04/03/2019 2
10 104 AA 15/02/2019 25/06/2022 1
11 104 FF 24/08/2020 23/07/2022 1
12 105 DD 04/03/2019 05/04/2019 1
13 105 AA 04/08/2018 05/09/2018 1
14 106 CC 05/05/2018 29/03/2019 1
15 106 AA 06/06/2019 07/07/2019 1
16 106 BB 08/09/2018 04/05/2020 1
4 Answers
rt4zxlrg1#
A dplyr solution is to use join_by to perform an interval join. You need dplyr 1.1.0 or later installed to use this feature.
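The code from this answer did not survive, so here is a minimal sketch of what such an interval join could look like. It assumes the dates are first parsed as day/month/year `Date` values and uses the `within()` helper of `join_by()`; the data is a small subset of the question's df1 and df2 (IDs 101 and 102 only), not the full tables.

```r
library(dplyr)  # join_by() and its helpers need dplyr >= 1.1.0

# Subset of the question's data, with dates parsed as Date (day/month/year).
df1 <- data.frame(
  ID         = c(101, 101, 102, 102),
  x_line     = c(1, 2, 1, 2),
  start_date = as.Date(c("05/01/2019", "25/08/2021", "08/03/2017", "07/08/2018"),
                       format = "%d/%m/%Y"),
  end_date   = as.Date(c("07/02/2020", "27/09/2021", "18/07/2018", "17/10/2019"),
                       format = "%d/%m/%Y")
)
df2 <- data.frame(
  ID               = c(101, 101, 102, 102),
  product_name     = c("AA", "CC", "DD", "EE"),
  start_taken_date = as.Date(c("18/07/2019", "25/11/2019", "29/01/2018", "07/09/2018"),
                             format = "%d/%m/%Y"),
  end_taken_date   = as.Date(c("25/09/2019", "02/01/2020", "19/06/2018", "22/09/2019"),
                             format = "%d/%m/%Y")
)

# Match on ID, and require the "taken" interval to lie inside the df1 interval.
# Rows of df2 with no containing interval are kept with x_line = NA.
df4 <- df2 %>%
  left_join(
    df1,
    by = join_by(ID, within(start_taken_date, end_taken_date, start_date, end_date))
  ) %>%
  select(ID, product_name, start_taken_date, end_taken_date, x_line)

print(df4)
```

One thing to watch with the question's full data: both df2 rows for ID 104 end after line 1 of df1 ends, so a strict containment join leaves them as NA, while the desired output shows 1 for them. If only the start date should decide the line, `join_by(ID, between(start_taken_date, start_date, end_date))` may be the closer fit.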
cmssoen22#
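The main problem is that the left_join call does not take the date ranges or the x_line variable into account. When an ID has multiple rows in df1, this causes trouble because the join matches every possible combination regardless of date range. Also convert the date columns to the Date type, because they are currently character strings, which can lead to incorrect comparisons: as strings, "25/08/2018" sorts after "04/05/2019" even though it is the earlier date. You can use lubridate for the date conversion.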
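The original code for this answer is missing. As a hedged reconstruction: `match_fun` is an argument of the `fuzzy_*_join()` functions in the fuzzyjoin package, so the sketch below assumes that package was used, together with `lubridate::dmy()` for the conversion. The data is a small subset of the question's tables, and the `>=` / `<=` comparisons mirror the condition in the question.

```r
library(dplyr)
library(lubridate)
library(fuzzyjoin)  # assumption: the package that provides match_fun

# Subset of the question's data; dmy() parses day/month/year strings into Date.
df1 <- data.frame(
  ID         = c(101, 101, 102, 102),
  x_line     = c(1, 2, 1, 2),
  start_date = c("05/01/2019", "25/08/2021", "08/03/2017", "07/08/2018"),
  end_date   = c("07/02/2020", "27/09/2021", "18/07/2018", "17/10/2019")
) %>%
  mutate(across(c(start_date, end_date), dmy))

df2 <- data.frame(
  ID               = c(101, 101, 102, 102),
  product_name     = c("AA", "CC", "DD", "EE"),
  start_taken_date = c("18/07/2019", "25/11/2019", "29/01/2018", "07/09/2018"),
  end_taken_date   = c("25/09/2019", "02/01/2020", "19/06/2018", "22/09/2019")
) %>%
  mutate(across(c(start_taken_date, end_taken_date), dmy))

# One comparison function per entry in `by`, applied in the same order.
df4 <- fuzzy_left_join(
  df2, df1,
  by = c("ID" = "ID",
         "start_taken_date" = "start_date",
         "end_taken_date"   = "end_date"),
  match_fun = list(`==`, `>=`, `<=`)
) %>%
  # ID exists in both inputs, so the join returns it as ID.x and ID.y.
  transmute(ID = ID.x, product_name, start_taken_date, end_taken_date, x_line)

print(df4)
```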
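The match_fun argument is a list of functions, each of which should return TRUE for a match at the corresponding position in the by argument. In this case we check that ID is equal, that start_taken_date is after start_date, and that end_taken_date is before end_date.
lf5gs5x23#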
data.table
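This answer's code was lost, so what follows is only one plausible data.table version, not necessarily what was posted: a non-equi update join that looks up x_line for each df2 row. The table names `dt1` and `dt2` are made up for the sketch, and the data is a subset of the question's tables.

```r
library(data.table)

# Hypothetical names dt1/dt2; subset of the question's data with Date columns.
dt1 <- data.table(
  ID         = c(101, 101, 102, 102),
  x_line     = c(1, 2, 1, 2),
  start_date = as.Date(c("05/01/2019", "25/08/2021", "08/03/2017", "07/08/2018"),
                       format = "%d/%m/%Y"),
  end_date   = as.Date(c("07/02/2020", "27/09/2021", "18/07/2018", "17/10/2019"),
                       format = "%d/%m/%Y")
)
dt2 <- data.table(
  ID               = c(101, 101, 102, 102),
  product_name     = c("AA", "CC", "DD", "EE"),
  start_taken_date = as.Date(c("18/07/2019", "25/11/2019", "29/01/2018", "07/09/2018"),
                             format = "%d/%m/%Y"),
  end_taken_date   = as.Date(c("25/09/2019", "02/01/2020", "19/06/2018", "22/09/2019"),
                             format = "%d/%m/%Y")
)

# Non-equi update join: for every dt1 row, find the dt2 rows with the same ID
# whose "taken" interval lies inside [start_date, end_date], and write that
# row's x_line into dt2 by reference. Unmatched dt2 rows stay NA.
dt2[dt1,
    on = .(ID, start_taken_date >= start_date, end_taken_date <= end_date),
    x_line := i.x_line]

print(dt2)
```

Because the update happens by reference, dt2 keeps its original row order and date columns; only the new x_line column is added.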
hrirmatl4#
A **data.table** approach using foverlaps:
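The posted code is missing here as well. A minimal sketch of how `foverlaps()` could be applied, assuming containment (`type = "within"`) is the intended match; `dt1`, `dt2` and the `to_date()` helper are hypothetical names introduced for the example, and the data is a subset of the question's tables.

```r
library(data.table)

# Hypothetical helper: parse day/month/year strings into data.table's IDate.
to_date <- function(x) as.IDate(as.Date(x, format = "%d/%m/%Y"))

dt1 <- data.table(
  ID         = c(101, 101, 102, 102),
  x_line     = c(1, 2, 1, 2),
  start_date = to_date(c("05/01/2019", "25/08/2021", "08/03/2017", "07/08/2018")),
  end_date   = to_date(c("07/02/2020", "27/09/2021", "18/07/2018", "17/10/2019"))
)
dt2 <- data.table(
  ID               = c(101, 101, 102, 102),
  product_name     = c("AA", "CC", "DD", "EE"),
  start_taken_date = to_date(c("18/07/2019", "25/11/2019", "29/01/2018", "07/09/2018")),
  end_taken_date   = to_date(c("25/09/2019", "02/01/2020", "19/06/2018", "22/09/2019"))
)

# foverlaps() needs the second table keyed, with the interval as the last two key columns.
setkey(dt1, ID, start_date, end_date)

# type = "within": keep pairs where the dt2 interval lies inside the dt1 interval.
res <- foverlaps(
  dt2, dt1,
  by.x = c("ID", "start_taken_date", "end_taken_date"),
  type = "within"
)

df4 <- res[, .(ID, product_name, start_taken_date, end_taken_date, x_line)]
print(df4)
```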