pandas 根据条件日期范围从另一个框架填充框架

9gm1akwq  于 11个月前  发布在  其他
关注(0)|答案(2)|浏览(104)

是否可以根据指定的条件从另一个数据框数据填充数据框?
df1 =(条件日期范围)

company   start date  end date
0    a      2023-01-02  2023-01-06
1    b      2023-01-05  2023-01-12
2    c      2023-01-04  2023-01-13
3    d      2023-01-03  2023-01-10

字符串
df2 =数据

DATE        a   b   c   d
0   2023-01-02  1   10  30  40
1   2023-01-03  2   11  31  41
2   2023-01-04  3   12  32  42
3   2023-01-05  4   13  33  43
4   2023-01-06  5   14  34  44
5   2023-01-09  6   15  35  45
6   2023-01-10  7   16  36  46
7   2023-01-11  8   17  37  47
8   2023-01-12  9   18  38  48
9   2023-01-13  10  19  39  49


df3 =期望输出

DATE         a  b     c     d
0   2023-01-02  1.0 NaN   NaN   NaN
1   2023-01-03  2.0 NaN   NaN   41.0
2   2023-01-04  3.0 NaN   32.0  42.0
3   2023-01-05  4.0 13.0  33.0  43.0
4   2023-01-06  5.0 14.0  34.0  44.0
5   2023-01-09  NaN 15.0  35.0  45.0
6   2023-01-10  NaN 16.0  36.0  46.0
7   2023-01-11  NaN 17.0  37.0  NaN
8   2023-01-12  NaN 18.0  38.0  NaN
9   2023-01-13  NaN NaN   39.0  NaN

vmpqdwk3

vmpqdwk31#

  • 酒店Stpe 1**

首先转换为日期时间

df1['start date'] = pd.to_datetime(df1['start date'])
df1['end date'] = pd.to_datetime(df1['end date'])
df2['DATE'] = pd.to_datetime(df2['DATE'])

字符串

步骤2

melt & use merge_asof & chk range & pivot

tmp = pd.merge_asof(df2.melt('DATE', var_name='company').sort_values('DATE'), 
                    df1.sort_values('start date'), 
                    by='company', left_on='DATE', right_on='start date')
df3 = tmp.assign(value=tmp['value'].where(tmp['DATE'].le(tmp['end date'])))\
         .pivot(index='DATE', columns='company', values='value')\
         .rename_axis('', axis=1).reset_index()


df3:

DATE         a  b     c     d
0   2023-01-02  1.0 NaN   NaN   NaN
1   2023-01-03  2.0 NaN   NaN   41.0
2   2023-01-04  3.0 NaN   32.0  42.0
3   2023-01-05  4.0 13.0  33.0  43.0
4   2023-01-06  5.0 14.0  34.0  44.0
5   2023-01-09  NaN 15.0  35.0  45.0
6   2023-01-10  NaN 16.0  36.0  46.0
7   2023-01-11  NaN 17.0  37.0  NaN
8   2023-01-12  NaN 18.0  38.0  NaN
9   2023-01-13  NaN NaN   39.0  NaN

示例代码

import pandas as pd
data1 = {'company': {0: 'a', 1: 'b', 2: 'c', 3: 'd'}, 
         'start date': {0: '2023-01-02', 1: '2023-01-05', 2: '2023-01-04', 3: '2023-01-03'}, 
         'end date': {0: '2023-01-06', 1: '2023-01-12', 2: '2023-01-13', 3: '2023-01-10'}}
data2 = {'DATE': {0: '2023-01-02', 1: '2023-01-03', 2: '2023-01-04', 3: '2023-01-05', 
                  4: '2023-01-06', 5: '2023-01-09', 6: '2023-01-10', 7: '2023-01-11', 
                  8: '2023-01-12', 9: '2023-01-13'}, 
         'a': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10}, 
         'b': {0: 10, 1: 11, 2: 12, 3: 13, 4: 14, 5: 15, 6: 16, 7: 17, 8: 18, 9: 19}, 
         'c': {0: 30, 1: 31, 2: 32, 3: 33, 4: 34, 5: 35, 6: 36, 7: 37, 8: 38, 9: 39}, 
         'd': {0: 40, 1: 41, 2: 42, 3: 43, 4: 44, 5: 45, 6: 46, 7: 47, 8: 48, 9: 49}}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

guicsvcw

guicsvcw2#

感谢@pandaskim提供了一个可行的数据。一个选项是获取df 1匹配df 2的位置;我们已经知道company列与df 2(a,b,c,d)中的列顺序相同,我们利用这一点来创建新的框架,并在pyjanitor的conditional_join和numpy的帮助下:

# pip install pyjanitor
import pandas as pd
import numpy as np
import janitor

df1['start date'] = pd.to_datetime(df1['start date'])
df1['end date'] = pd.to_datetime(df1['end date'])
df2['DATE'] = pd.to_datetime(df2['DATE'])
# get positions where df1 matches df2, based on the dates
# essentially what are the positions where start date <= DATE <= end date
outcome=(df2
        .assign(pos=df2.index)
        .conditional_join(
            df1.assign(posn=df1.index), 
            ('DATE','start date','>='),
            ('DATE','end date','<='), 
            df_columns='pos', 
            right_columns='posn')
       )
left = outcome.pos
right=outcome.posn
shape=df2.drop(columns='DATE').shape
repl=df2.drop(columns='DATE').to_numpy()
# fancy indexing with numpy 
arr=np.empty(shape,dtype=float)
arr[:]=np.nan
arr[left,right]=repl[left,right]
#build new dataframe
df2.assign(**pd.DataFrame(arr, columns=['a','b','c','d']))
        DATE    a     b     c     d
0 2023-01-02  1.0   NaN   NaN   NaN
1 2023-01-03  2.0   NaN   NaN  41.0
2 2023-01-04  3.0   NaN  32.0  42.0
3 2023-01-05  4.0  13.0  33.0  43.0
4 2023-01-06  5.0  14.0  34.0  44.0
5 2023-01-09  NaN  15.0  35.0  45.0
6 2023-01-10  NaN  16.0  36.0  46.0
7 2023-01-11  NaN  17.0  37.0   NaN
8 2023-01-12  NaN  18.0  38.0   NaN
9 2023-01-13  NaN   NaN  39.0   NaN

字符串

相关问题