pandas: How do I reshape a DataFrame to get the difference between two groups for a given date and bucket?

Posted by bd1hkmkf on 2023-01-28 in Other

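I have a DataFrame broken down by date, bucket, and group (that is, for each day we have a number of buckets, and within each bucket we have two groups). It looks like this: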

date       | bucket |  Group  |purchase
2020-01-01 | 1      |  A      | 12
2020-01-01 | 1      |  B      | 11

2020-01-01 | 2      |  A      | 14
2020-01-01 | 2      |  B      | 14

2020-02-01 | 1      |  A      | 11
2020-02-01 | 1      |  B      | 10

I'd like to create a new DataFrame with a "difference" column, like this:

date       | bucket | difference
2020-01-01 | 1      | 12-11=1 (Group A - Group B for that day/bucket)
2020-01-01 | 2      | 0
2020-02-01 | 1      | 1

How can I reshape my df this way?
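For anyone who wants to run the answers below, here is a minimal sketch that rebuilds the sample table as a DataFrame. The question does not say how the dates are stored, so keeping them as plain strings is an assumption.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: dates are kept as strings; the question does not state the dtype.
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01",
             "2020-02-01", "2020-02-01"],
    "bucket": [1, 1, 2, 2, 1, 1],
    "Group": ["A", "B", "A", "B", "A", "B"],
    "purchase": [12, 11, 14, 14, 11, 10],
})
print(df)
```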

Answer #1 (laawzig2)

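These approaches do not depend on the values in the Group column, only on there being exactly two groups in a consistent order within each date/bucket pair.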

out = (df.set_index(['date', 'bucket', 'Group'])['purchase']
         .groupby(['date', 'bucket'])
         .diff(-1)
         .dropna()
         .droplevel(-1)
         .reset_index(name='difference'))
print(out)

# OR 

out = (df.groupby(['date', 'bucket'])
         .apply(lambda x: x.groupby('Group')['purchase']
                           .sum()
                           .diff(-1)
                           .dropna()))
out.columns = ['difference']
out = out.reset_index()
print(out)

Output:

         date  bucket  difference
0  2020-01-01       1         1.0
1  2020-01-01       2         0.0
2  2020-02-01       1         1.0
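Because diff(-1) subtracts the next row from the current one, the sign of the result follows the row order inside each date/bucket pair. The sketch below is not part of the original answer: it uses made-up data with the rows deliberately shuffled, and a hypothetical helper named group_diff, to show that sorting by Group first gives A - B.

```python
import pandas as pd

# Hypothetical data: same values as the question, but rows are not in A, B order.
df = pd.DataFrame({
    "date": ["2020-01-01"] * 4 + ["2020-02-01"] * 2,
    "bucket": [1, 1, 2, 2, 1, 1],
    "Group": ["B", "A", "A", "B", "B", "A"],
    "purchase": [11, 12, 14, 14, 10, 11],
})

def group_diff(frame):
    """Row minus next row inside each (date, bucket) pair (hypothetical helper)."""
    return (frame.set_index(["date", "bucket", "Group"])["purchase"]
                 .groupby(level=["date", "bucket"])
                 .diff(-1)        # current row minus the following row
                 .dropna()        # the last row of each pair has no successor
                 .droplevel(-1)   # drop the Group level
                 .reset_index(name="difference"))

unsorted_out = group_diff(df)     # sign depends on whichever row comes first
sorted_out = group_diff(df.sort_values(["date", "bucket", "Group"]))  # A - B
print(unsorted_out)
print(sorted_out)
```

Sorting is only needed when the input order is not guaranteed; with the data as shown in the question it changes nothing.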

A one-liner by mozway:

(df.set_index(['date', 'bucket', 'Group'])['purchase']
   .agg(lambda x: x.xs('A', level='Group').sub(x.xs('B', level='Group')))
   .reset_index(name='difference'))

Output:

         date  bucket  difference
0  2020-01-01       1           1
1  2020-01-01       2           0
2  2020-02-01       1           1

Answer #2 (pdkcd3nj)

Use DataFrame.pivot_table with DataFrame.sub:

# Pivot so that each Group value becomes its own column
wide = df.pivot_table(index=['date','bucket'], columns='Group', values='purchase')
# Subtract column B from column A, row by row
df1 = (wide[['A']].sub(wide['B'], axis=0)
                  .rename(columns={'A':'difference'})
                  .rename_axis(columns=None)
                  .reset_index())
print (df1)
         date  bucket  difference
0  2020-01-01       1         1.0
1  2020-01-01       2         0.0
2  2020-02-01       1         1.0
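One thing to keep in mind with pivot_table is that it aggregates rows sharing the same index/column pair, using the mean by default. The following sketch is an illustration only: the duplicated group A row is made up, and choosing aggfunc="sum" is an assumption about what one would want in that situation.

```python
import pandas as pd

# Hypothetical data: group A appears twice for the same date and bucket.
df = pd.DataFrame({
    "date": ["2020-01-01"] * 3,
    "bucket": [1, 1, 1],
    "Group": ["A", "A", "B"],
    "purchase": [12, 3, 11],
})

# Sum duplicates instead of averaging them (the default aggfunc is "mean").
wide = df.pivot_table(index=["date", "bucket"], columns="Group",
                      values="purchase", aggfunc="sum")

# A totals 15, B totals 11.
out = (wide["A"] - wide["B"]).rename("difference").reset_index()
print(out)
```

If each date/bucket/Group combination is known to be unique, as in the question, the choice of aggfunc only affects the dtype of the result.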

Answer #3 (2sbarzqh)

import pandas as pd

def function1(dd: pd.DataFrame):
    # diff(-1) is row minus next row (A - B); keep the first row of each group
    return dd.assign(difference=dd.purchase.diff(-1)).iloc[0]

df.groupby(["date", "bucket"], as_index=False).apply(function1).drop("purchase", axis=1)

Output:

         date  bucket Group  difference
0  2020-01-01       1     A         1.0
1  2020-01-01       2     A         0.0
2  2020-02-01       1     A         1.0

Answer #4 (qoefvg9y)

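When you need to select and align data, it is often easier to set the relevant columns as the index. Here xs is used to select each group: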

s = df.set_index(['date', 'bucket', 'Group'])['purchase'].rename('difference')

(s.xs('A', level='Group')-s.xs('B', level='Group')).reset_index()

A variant that uses a DataFrame as the intermediate and renames afterwards (useful for handling multiple columns):

df2 = df.set_index(['date', 'bucket', 'Group'])

(df2.xs('A', level='Group')-df2.xs('B', level='Group')
 ).reset_index().rename(columns={'purchase': 'difference'})

Output:

         date  bucket  difference
0  2020-01-01       1           1
1  2020-01-01       2           0
2  2020-02-01       1           1
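Since the subtraction aligns on the (date, bucket) index, a pair that lacks one of the groups ends up as NaN rather than raising an error. Here is a small sketch of that behaviour with made-up data where bucket 2 has no group B row. Treating the missing value as 0 via fill_value is just one possible choice, not something the question asks for.

```python
import pandas as pd

# Hypothetical data: bucket 2 has no row for group B.
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-01", "2020-01-01"],
    "bucket": [1, 1, 2],
    "Group": ["A", "B", "A"],
    "purchase": [12, 11, 14],
})

s = df.set_index(["date", "bucket", "Group"])["purchase"]
a = s.xs("A", level="Group")
b = s.xs("B", level="Group")

# Plain subtraction: the unmatched (date, bucket) pair becomes NaN.
strict = (a - b).rename("difference").reset_index()

# Series.sub with fill_value=0 treats the missing group as 0 instead.
filled = a.sub(b, fill_value=0).rename("difference").reset_index()

print(strict)
print(filled)
```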

Answer #5 (xienkqul)

You can reshape the DataFrame before computing the diff:

out = (df.set_index(['date', 'bucket', 'Group'])['purchase']
         .unstack('Group').diff(-1, axis=1)['A']
         .rename('difference').reset_index())
print(out)

# Output
         date  bucket  difference
0  2020-01-01       1           1
1  2020-01-01       2           0
2  2020-02-01       1           1

Or use pivot:

out = (df.pivot(index=['date', 'bucket'], columns='Group', values='purchase')
         .diff(-1, axis=1)['A'].rename('difference').reset_index())
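The diff(-1, axis=1) step relies on column A sitting immediately before column B after the reshape. As a rough alternative sketch, the two columns can be named explicitly so the result does not depend on column order. The helper group_difference and its parameters left and right are hypothetical names introduced here for illustration, and passing a list as index to pivot assumes pandas 1.1 or later.

```python
import pandas as pd

# Sample data from the question (dates kept as strings, which is an assumption).
df = pd.DataFrame({
    "date": ["2020-01-01"] * 4 + ["2020-02-01"] * 2,
    "bucket": [1, 1, 2, 2, 1, 1],
    "Group": ["A", "B", "A", "B", "A", "B"],
    "purchase": [12, 11, 14, 14, 11, 10],
})

def group_difference(frame, left, right):
    """Return purchase[left] - purchase[right] per (date, bucket).

    Hypothetical helper; `left` and `right` are values of the Group column.
    """
    wide = frame.pivot(index=["date", "bucket"], columns="Group",
                       values="purchase")
    return (wide[left] - wide[right]).rename("difference").reset_index()

out = group_difference(df, "A", "B")   # A - B, as asked in the question
rev = group_difference(df, "B", "A")   # B - A, for comparison
print(out)
print(rev)
```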
