pandas: create a DataFrame column and fill it with values from an existing column based on date and category

Posted by li9yvcax on 2021-08-20 in Java

I have a DataFrame that shows the sales of each item at each store, like this:

date         item   storeNbr   Sales
2021-06-29   soap   123        100
2021-05-29   hat    129        500
2020-06-29   soap   123        0
2020-05-29   hat    129        10

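I am trying to create a column for last year's sales. It should take the value already present in the DataFrame where the date is exactly one year earlier and the store number and item are the same. So it should look like this: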

date         item   storeNbr   Sales   LY
2021-06-29   soap   123        100     0
2021-05-29   hat    129        500     10
2020-06-29   soap   123        0       NaN
2020-05-29   hat    129        10      NaN

I tried this:

df['Previous'] = df.groupby(
    [df['date'].dt.month, df['date'].dt.day, df['storeNbr']]
)['Sales'].shift()

But I am having trouble getting the desired result. Thanks in advance for your help!

guz6ccqo #1

Sample data:

import pandas as pd
from pandas import Timestamp
df = pd.DataFrame({'date': {0: Timestamp('2021-06-29 00:00:00'), 1: Timestamp('2021-05-29 00:00:00'), 2: Timestamp('2020-06-29 00:00:00'), 3: Timestamp('2020-05-29 00:00:00')}, 'item': {0: 'soap', 1: 'hat', 2: 'soap', 3: 'hat'}, 'storeNbr': {0: 123, 1: 129, 2: 123, 3: 129}, 'Sales': {0: 100, 1: 500, 2: 0, 3: 10}})

Code:


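# Create a copy of the data with every date moved forward by one year, then merge it back.
df2 = df.copy()
df2['date'] = df2['date'] + pd.DateOffset(years=1)
# how='left' keeps every row of df in its original order; rows without a match get NaN.
# to_numpy() assigns by position, so the result does not depend on df's index labels.
df['LY'] = df.drop(columns='Sales').merge(df2, on=['date', 'item', 'storeNbr'], how='left')['Sales'].to_numpy()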

Output:

        date  item  storeNbr  Sales    LY
0 2021-06-29  soap       123    100   0.0
1 2021-05-29   hat       129    500  10.0
2 2020-06-29  soap       123      0   NaN
3 2020-05-29   hat       129     10   NaN

A one-liner, courtesy of @scottboston:

df.merge(df.assign(date = df['date'] + pd.DateOffset(years=1)), 
         on=['date','item','storeNbr'], 
         how='left', 
         suffixes=('','_y'))\
  .rename(columns={'Sales_y':'LY'})
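The one-liner above can also be wrapped in a small helper. The following is a minimal sketch, assuming the column names from the sample data and that each (date, item, storeNbr) combination occurs at most once. The function name add_last_year_sales and its keys parameter are hypothetical and not part of pandas. It passes validate="many_to_one" so that the merge raises an error if the shifted copy contains duplicate keys, instead of silently duplicating rows.

```python
import pandas as pd


def add_last_year_sales(df, keys=("item", "storeNbr")):
    """Return a copy of df with an 'LY' column holding the prior year's Sales.

    Hypothetical helper: assumes a datetime 'date' column, a 'Sales' column,
    and at most one row per (date, *keys) combination.
    """
    on = ["date", *keys]
    # Build a lookup table whose dates are moved one year forward, so each
    # row lines up with the row it should be "last year" for.
    previous = (
        df[on + ["Sales"]]
        .rename(columns={"Sales": "LY"})
        .assign(date=lambda d: d["date"] + pd.DateOffset(years=1))
    )
    # A left merge keeps every original row; unmatched rows get NaN in 'LY'.
    # validate raises if the lookup table has duplicate keys.
    return df.merge(previous, on=on, how="left", validate="many_to_one")


df = pd.DataFrame({
    "date": pd.to_datetime(["2021-06-29", "2021-05-29", "2020-06-29", "2020-05-29"]),
    "item": ["soap", "hat", "soap", "hat"],
    "storeNbr": [123, 129, 123, 129],
    "Sales": [100, 500, 0, 10],
})
result = add_last_year_sales(df)
print(result)
```

Because the merge matches on the exact shifted date, this variant does not depend on the order of the rows.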

vaj7vani #2

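If you sort the data first, you can use groupby and shift. Note that shift() takes the preceding row within each group rather than matching on the date, so this relies on every item having consecutive yearly rows.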

df = df.sort_values(by=['item','date'])
df['LY'] = df.groupby('item')['Sales'].shift()

Output:

        date  item  storeNbr  Sales    LY
3 2020-05-29   hat       129     10   NaN
1 2021-05-29   hat       129    500  10.0
2 2020-06-29  soap       123      0   NaN
0 2021-06-29  soap       123    100   0.0
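Since shift() simply returns the preceding row in each group, the sort-and-shift approach would pick up a value from two or more years back if a year is missing. The following is a sketch of one way to guard against that, assuming the same column names as the sample data. The extra 2018 row and the names prev_date, prev_sales and is_last_year are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2021-06-29", "2021-05-29", "2020-06-29", "2020-05-29", "2018-05-29"]
    ),
    "item": ["soap", "hat", "soap", "hat", "hat"],
    "storeNbr": [123, 129, 123, 129, 129],
    "Sales": [100, 500, 0, 10, 7],
})

# Sort so that rows run from oldest to newest within each item/store group.
df = df.sort_values(["item", "storeNbr", "date"])
grouped = df.groupby(["item", "storeNbr"])

# Date and sales of the preceding row within each group.
prev_date = grouped["date"].shift()
prev_sales = grouped["Sales"].shift()

# Keep the preceding sales only when that row is exactly one year earlier.
is_last_year = (df["date"] - pd.DateOffset(years=1)) == prev_date
df["LY"] = prev_sales.where(is_last_year)
print(df)
```

Here the 2020 hat row gets NaN rather than the 2018 value, because its preceding row is two years older.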

raogr8fs #3

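Your code is close; it needs only three small changes:
1. Also group by item, in addition to the other fields.
2. Pass sort=False to groupby() so the original row order (most recent year first) is preserved.
3. Use shift(-1) to get the "next" value instead of shift(), which gets the previous value.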

df['LY'] = df.groupby([df['date'].dt.month ,df['date'].dt.day , df['storeNbr'], df['item']], sort=False)['Sales'].shift(-1)

Result:

print(df)

        date  item  storeNbr  Sales    LY
0 2021-06-29  soap       123    100   0.0
1 2021-05-29   hat       129    500  10.0
2 2020-06-29  soap       123      0   NaN
3 2020-05-29   hat       129     10   NaN
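The shift(-1) approach depends on the rows already being ordered with the most recent year first. The following is a minimal sketch, assuming the rows may arrive in any order: it sorts by date in descending order before applying the same grouping and relies on index alignment to write the result back. The variable names ordered, keys and ly are illustrative, and like the original it does not check that neighbouring rows are exactly one year apart.

```python
import pandas as pd

# Same sample data, deliberately not ordered newest-first.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-06-29", "2021-05-29", "2021-06-29", "2020-05-29"]),
    "item": ["soap", "hat", "soap", "hat"],
    "storeNbr": [123, 129, 123, 129],
    "Sales": [0, 500, 100, 10],
})

# Put the most recent rows first so that "next row in the group" means "older".
ordered = df.sort_values("date", ascending=False)

# Group by calendar month, calendar day, store and item.
keys = [
    ordered["date"].dt.month.rename("month"),
    ordered["date"].dt.day.rename("day"),
    ordered["storeNbr"],
    ordered["item"],
]
ly = ordered.groupby(keys, sort=False)["Sales"].shift(-1)

# Assignment aligns on the index, so df keeps its original row order.
df["LY"] = ly
print(df)
```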
