滚动12个月内至少有一笔交易的客户总数- pandas

xxb16uws  于 2023-05-15  发布在  其他
关注(0)|答案(2)|浏览(100)

我有一个事务 Dataframe ,看起来像这样:

| customer_id | purchase_date | purchase_amt | ...
|-------------|---------------|--------------| ...
| 1           | 12-01-2023    | 150.00       | ...
| 2           | 11-24-2022    | 84.23        |..

我需要计算在12个月期间至少购买了一次的客户的12个月滚动总和。我被卡住了。我的想法是尝试这样的东西:

df2 = df.groupby('customer_id').apply(foo).to_frame().rename(columns:{0:'active customer'})
df = df.join(df2,on='customer_id')
df.groupby('active_customer').rolling(12).sum()

其中,函数foo将标记客户在过去12个月内是否有交易。但我无法实现这样的功能。
有人能帮忙吗?

bvn4nwqk

bvn4nwqk1#

使用365D作为滚动要素的值如何

customer_id = np.random.randint(1, 3, size=(10))

purchase_date = pd.date_range(start='2022-01-01', end='2022-12-10').to_series().sample(10)

df = pd.DataFrame({'customer_id': customer_id,
                   'purchase_date': purchase_date,
                   'purchase_amt': np.random.randint(1000, 5000, size=(10))})

df['purchase_date'] = pd.to_datetime(df['purchase_date'])

df = df.set_index('purchase_date')

df = df.sort_index()

rolling_sum = df.groupby('customer_id')['purchase_amt'].rolling('365D').sum()

print(rolling_sum)
6l7fqoea

6l7fqoea2#

您可以将您的购买日期转换为每月期间,然后与客户交叉。最后将宽(dfw)输出转换为长(dfl):

# Compute the dense matrix (you can also use pivot_table)
dfw = pd.crosstab(df['customer_id'], df['purchase_date'].dt.to_period('M')).clip(upper=1)

# Fill missing periods? (no purchase in one month so maybe you don't need this step)
start, end = dfw.columns.min(), dfw.columns.max()
dfw = dfw.reindex(pd.period_range(start, end, freq='M', name='purchase_date'), axis=1, fill_value=0)

# Transform wide to long and remove 0s
dfw = dfw.rolling(12, min_periods=1, axis=1).sum().astype(int)
dfl = dfw.stack().loc[lambda x: x > 0]

宽输出:

>>> dfw
purchase_date  2022-01  2022-02  2022-03  2022-04  2022-05  2022-06  2022-07  2022-08  ...  2023-05  2023-06  2023-07  2023-08  2023-09  2023-10  2023-11  2023-12
customer_id                                                                            ...                                                                        
1                    1        1        2        2        2        3        3        4  ...        5        4        4        3        3        3        2        1
2                    1        2        3        3        3        3        4        4  ...        3        3        2        2        2        2        2        2
3                    0        1        1        1        2        2        2        2  ...        3        3        3        3        2        3        2        2
4                    0        0        0        1        1        1        1        1  ...        1        2        2        3        3        3        4        4
5                    0        0        1        1        1        1        1        1  ...        2        2        2        2        2        2        3        3
...                ...      ...      ...      ...      ...      ...      ...      ...  ...      ...      ...      ...      ...      ...      ...      ...      ...
95                   0        1        1        2        3        4        4        5  ...        4        3        3        2        2        1        0        0
96                   0        1        1        1        1        1        1        1  ...        5        5        5        6        6        6        6        5
97                   0        0        0        0        0        0        0        0  ...        3        3        3        3        3        3        2        1
98                   0        0        0        0        1        1        1        1  ...        2        2        2        2        2        3        4        4
99                   0        0        0        0        0        0        0        0  ...        2        2        2        3        4        4        4        4

[98 rows x 24 columns]

长输出(删除0):

>>> dfl
customer_id  purchase_date
1            2022-01          1
             2022-02          1
             2022-03          2
             2022-04          2
             2022-05          2
                             ..
99           2023-08          3
             2023-09          4
             2023-10          4
             2023-11          4
             2023-12          4
Length: 2069, dtype: int64

相关问题