Pandas: setting a value based on date ranges in another DataFrame, merging and inserting nulls for rows that do not match

Posted by zsohkypk on 2023-03-11 in Other

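This is a follow-up to this post: Pandas setting a value depending on date ranges on another dataframe
If a row in the transactions DataFrame has no matching agentname in the rates DataFrame, how can we still keep that row but set the Agentname_rates column to null/NA?
Rates table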

   Agentname  ProductType  OldRate  NewRate  StartDate   EndDate
0  VSFAAL     SPORTS           0.0     10.0  2020-11-05  2021-01-18
1  VSFAAL     APPAREL          0.0     35.0  2020-11-05  2022-05-03
2  VSFAAL     SPORTS          10.0     15.0  2021-01-18  2022-05-03
3  VSFAALJS   SPORTS           0.0     10.0  2020-11-07  2022-05-03
4  VSFAALJS   APPAREL          0.0     15.0  2020-11-07  2021-11-09
5  VSFAALJS   APPAREL         15.0      5.0  2021-11-09  2022-05-03

Transactions table

   Date                 Sales  Agentname  ProductType
0  2020-12-01 08:00:02  100.0  VSFAAL     SPORTS
1  2022-03-01 08:00:09   99.0  VSFAAL     APPAREL
2  2022-03-01 08:00:14   75.0  VSFAAL     SPORTS
3  2021-05-01 08:00:39   67.0  VSFAALJS   SPORTS
4  2020-05-01 08:00:56  160.0  VSFAALJS   APPAREL
5  2021-05-01 08:00:56   65.0  VSFAALJS   APPAREL
6  2021-06-03 09:07:33   55.0  VSRANDOM   SPORTS

Expected output

   Date                 Sales  Agentname  ProductType  Agentname_rates
0  2020-12-01 08:00:02  100.0  VSFAAL     SPORTS                  10.0
1  2022-03-01 08:00:09   99.0  VSFAAL     APPAREL                 35.0
2  2022-03-01 08:00:14   75.0  VSFAAL     SPORTS                  15.0
3  2021-05-01 08:00:39   67.0  VSFAALJS   SPORTS                  10.0
4  2020-05-01 08:00:56  160.0  VSFAALJS   APPAREL                 NULL
5  2021-05-01 08:00:56   65.0  VSFAALJS   APPAREL                 15.0
6  2021-06-03 09:07:33   55.0  VSRANDOM   SPORTS                  NULL

The code below merges the two tables, but it does not keep the two rows that I want to retain with null values.

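import pandas as pd

# df is the rates table, df2 is the transactions table.
# Left-join every rate row that has the same agent and product type.
df3 = df2.merge(df[['Agentname', 'ProductType', 'StartDate', 'EndDate', 'NewRate']],
                on=['Agentname', 'ProductType'],
                how='left')

# Keep only the rows whose transaction date falls inside the rate's validity window.
# Rows with no window in range (or no matching agent) evaluate to False and are dropped here.
dates = pd.to_datetime(df3['Date']).dt.normalize()
df3 = df3[dates.between(pd.to_datetime(df3['StartDate']),
                        pd.to_datetime(df3['EndDate']))]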

Thanks!

Answer #1, by u3r8eeie

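You can left-join the original transactions (df2) with the filtered result, after dropping its StartDate/EndDate columns. This assumes the filtered rows were assigned back to df3. Because the join uses all remaining shared columns as keys, transactions that have no rate in range are kept and get NaN in NewRate: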

df3 = df2.merge(df3.drop(['StartDate','EndDate'], axis=1), how='left')
print (df3)
                  Date  Sales Agentname ProductType  NewRate
0  2020-12-01 08:00:02  100.0    VSFAAL      SPORTS     10.0
1  2022-03-01 08:00:09   99.0    VSFAAL     APPAREL     35.0
2  2022-03-01 08:00:14   75.0    VSFAAL      SPORTS     15.0
3  2021-05-01 08:00:39   67.0  VSFAALJS      SPORTS     10.0
4  2020-05-01 08:00:56  160.0  VSFAALJS     APPAREL      NaN
5  2021-05-01 08:00:56   65.0  VSFAALJS     APPAREL     15.0
6  2021-06-03 09:07:33   55.0  VSRANDOM      SPORTS      NaN
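
For reference, here is a minimal end-to-end sketch of the approach above, rebuilt from the sample tables in the question. The names `keys`, `candidates`, `in_window`, `matched` and `result` are only illustrative, and the final rename to `Agentname_rates` is an assumption based on the expected output rather than part of the answer itself.

```python
import pandas as pd

# Rates table (df) and transactions table (df2), copied from the question.
df = pd.DataFrame({
    "Agentname": ["VSFAAL", "VSFAAL", "VSFAAL", "VSFAALJS", "VSFAALJS", "VSFAALJS"],
    "ProductType": ["SPORTS", "APPAREL", "SPORTS", "SPORTS", "APPAREL", "APPAREL"],
    "OldRate": [0.0, 0.0, 10.0, 0.0, 0.0, 15.0],
    "NewRate": [10.0, 35.0, 15.0, 10.0, 15.0, 5.0],
    "StartDate": pd.to_datetime(["2020-11-05", "2020-11-05", "2021-01-18",
                                 "2020-11-07", "2020-11-07", "2021-11-09"]),
    "EndDate": pd.to_datetime(["2021-01-18", "2022-05-03", "2022-05-03",
                               "2022-05-03", "2021-11-09", "2022-05-03"]),
})
df2 = pd.DataFrame({
    "Date": pd.to_datetime(["2020-12-01 08:00:02", "2022-03-01 08:00:09",
                            "2022-03-01 08:00:14", "2021-05-01 08:00:39",
                            "2020-05-01 08:00:56", "2021-05-01 08:00:56",
                            "2021-06-03 09:07:33"]),
    "Sales": [100.0, 99.0, 75.0, 67.0, 160.0, 65.0, 55.0],
    "Agentname": ["VSFAAL", "VSFAAL", "VSFAAL", "VSFAALJS",
                  "VSFAALJS", "VSFAALJS", "VSRANDOM"],
    "ProductType": ["SPORTS", "APPAREL", "SPORTS", "SPORTS",
                    "APPAREL", "APPAREL", "SPORTS"],
})

keys = ["Agentname", "ProductType"]

# Step 1: pair each transaction with every rate row for the same agent and product.
candidates = df2.merge(df[keys + ["StartDate", "EndDate", "NewRate"]],
                       on=keys, how="left")

# Step 2: keep only pairs whose date lies inside the rate window.
# Comparisons against NaT are False, so unmatched agents drop out at this point.
in_window = candidates["Date"].dt.normalize().between(candidates["StartDate"],
                                                      candidates["EndDate"])
matched = candidates.loc[in_window].drop(columns=["StartDate", "EndDate"])

# Step 3: left-join back onto the original transactions so every row survives;
# transactions without a rate in range get NaN.
result = (df2.merge(matched, how="left")
             .rename(columns={"NewRate": "Agentname_rates"}))
print(result)
```

One caveat with this sketch: step 3 joins on all shared columns (Date, Sales, Agentname, ProductType), so two fully identical transaction rows would be multiplied by the join. If that can happen in your data, carrying a unique row id through the merge is probably safer.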
