python-3.x 如何使用Pandas数据框计算月至今(MTD)和YTD?

mkh04yzy  于 2022-12-20  发布在  Python
关注(0)|答案(2)|浏览(142)

我想使用pandas Dataframe 计算MTDYTD。为此,我编写了一个代码,并得到了以下错误。
代码:

import pandas as pd

data = {'date' : ['2017/01/01', '2017/01/02', '2017/01/03', '2017/01/04', '2017/01/15', '2017/01/20', '2017/01/23', '2017/01/30','2017/01/01', '2017/01/02', '2017/01/03', '2017/01/04', '2017/01/15', '2017/01/20', '2017/01/23', '2017/01/30', '2017/04/01', '2017/04/02', '2017/04/03', '2017/04/04', '2017/04/15', '2017/04/20', '2017/04/23', '2017/04/30','2017/04/01', '2017/04/02', '2017/04/03', '2017/04/04', '2017/04/15', '2017/04/20', '2017/04/23', '2017/04/30', '2017/05/01', '2017/05/02', '2017/05/03', '2017/05/04', '2017/05/15', '2017/05/20', '2017/05/23', '2017/05/30','2017/05/01', '2017/05/02', '2017/05/03', '2017/05/04', '2017/05/15', '2017/05/20', '2017/05/23', '2017/05/30'],
        'product': ['Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange', 'Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange', 'Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange'],
        'price': [10, 20, 10, 50, 10, 5, 10, 10, 20, 10, 5, 5, 10, 10, 20, 50, 10, 5, 20, 10, 10, 20, 50, 20, 5, 5, 10, 10, 20, 50, 30, 10, 20, 5, 5, 10, 20, 10, 20, 10, 40, 20, 10, 10, 20, 20, 10, 5]}

df = pd.DataFrame(data)

print("Dataframe-----------------------------------")
print(df)
print("Dataframe Ends------------------------------")

df.date = pd.to_datetime(df.date)
df = df.groupby('date', 'product').price.sum()
df = df.groupby(df.index.to_period('m')).cumsum().reset_index()

print("MTD Dataframe")
print(df)

错误:

Traceback (most recent call last):
  File "/home/ab/PycharmProjects/parry-analytics/lib/python3.9/site-packages/pandas/core/generic.py", line 550, in _get_axis_number
return cls._AXIS_TO_AXIS_NUMBER[axis]
 KeyError: 'product'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ab/parry-data_processing/parry-analytics/poc.py", line 15, in <module>
df = df.groupby('date', 'product').price.sum()
  File "/home/ab/PycharmProjects/parry-analytics/lib/python3.9/site-packages/pandas/core/frame.py", line 7713, in groupby
axis = self._get_axis_number(axis)
  File "/home/ab/PycharmProjects/parry-analytics/lib/python3.9/site-packages/pandas/core/generic.py", line 552, in _get_axis_number
raise ValueError(f"No axis named {axis} for object type {cls.__name__}")
ValueError: No axis named product for object type DataFrame

有谁能提出解决这个问题的方案吗?
预期MTD输出:

date product  price
0   2017/01/01   Apple     10
1   2017/01/02   Apple     30
2   2017/01/03   Apple     40
3   2017/01/04   Apple     90
4   2017/01/15   Apple     100
5   2017/01/20   Apple     105
6   2017/01/23   Apple     115
7   2017/01/30   Apple     125
8   2017/01/01  Orange     20
9   2017/01/02  Orange     30
10  2017/01/03  Orange     35
11  2017/01/04  Orange     40
12  2017/01/15  Orange     50
13  2017/01/20  Orange     60
14  2017/01/23  Orange     80
15  2017/01/30  Orange     130
16  2017/04/01   Apple     10
17  2017/04/02   Apple     15
18  2017/04/03   Apple     35
19  2017/04/04   Apple     45
20  2017/04/15   Apple     55
21  2017/04/20   Apple     75
22  2017/04/23   Apple     125
23  2017/04/30   Apple     145
24  2017/04/01  Orange      5
25  2017/04/02  Orange     10
26  2017/04/03  Orange     20
27  2017/04/04  Orange     30
28  2017/04/15  Orange     50
29  2017/04/20  Orange     100
30  2017/04/23  Orange     130
31  2017/04/30  Orange     140
32  2017/05/01   Apple     20
33  2017/05/02   Apple     25
34  2017/05/03   Apple     30
35  2017/05/04   Apple     40
36  2017/05/15   Apple     60
37  2017/05/20   Apple     70
38  2017/05/23   Apple     90
39  2017/05/30   Apple     100
40  2017/05/01  Orange     40
41  2017/05/02  Orange     60
42  2017/05/03  Orange     70
43  2017/05/04  Orange     80
44  2017/05/15  Orange     100
45  2017/05/20  Orange     120
46  2017/05/23  Orange     130
47  2017/05/30  Orange     135

预计本年迄今产量:
同上。但应从财政年度开始(4月份)和产品开始计算。

xzlaal3s

xzlaal3s1#

用途:

df.date = pd.to_datetime(df.date)
df['MTD'] = df.groupby([df.date.dt.to_period('m'),'product']).price.cumsum()

#df['test'] = df.date.dt.to_period('A-MAR')
df['YTD'] = df.groupby([df.date.dt.to_period('A-MAR'),'product']).price.cumsum()
print(df.tail(20))
         date product  price  MTD  YTD
28 2017-04-15  Orange     20   50   50
29 2017-04-20  Orange     50  100  100
30 2017-04-23  Orange     30  130  130
31 2017-04-30  Orange     10  140  140
32 2017-05-01   Apple     20   20  165
33 2017-05-02   Apple      5   25  170
34 2017-05-03   Apple      5   30  175
35 2017-05-04   Apple     10   40  185
36 2017-05-15   Apple     20   60  205
37 2017-05-20   Apple     10   70  215
38 2017-05-23   Apple     20   90  235
39 2017-05-30   Apple     10  100  245
40 2017-05-01  Orange     40   40  180
41 2017-05-02  Orange     20   60  200
42 2017-05-03  Orange     10   70  210
43 2017-05-04  Orange     10   80  220
44 2017-05-15  Orange     20  100  240
45 2017-05-20  Orange     20  120  260
46 2017-05-23  Orange     10  130  270
47 2017-05-30  Orange      5  135  275
7bsow1i6

7bsow1i62#

您看到的错误可能是由于groupby函数的语法问题造成的。
在行df = df.groupby('date', 'product').price.sum()中,groupby函数有两个参数:'date'和'product'。但是,这些参数应作为字符串列表传递,如下所示:

df = df.groupby(['date', 'product']).price.sum().

同时更改行

df = df.groupby(df.index.to_period('m')).cumsum().reset_index() to df['price'] = df['price'].groupby(df.index.to_period('m')).cumsum().reset_index().

相关问题