pandas 跨行比较值

brgchamk  于 2023-09-29  发布在  其他
关注(0)|答案(2)|浏览(120)

我正在尝试跨行比较同一列中的值。例如,在下面的示例数据框中,我想为价格变化创建一个列,如果任何一年的价格比前一年下降/上升,则将其填充为“减少”/“增加”。

df = pd.DataFrame({
    'Fruit': ['Apple', 'Orange', 'Grapes', 'Bananas', 'Apple', 'Orange',
              'Grapes', 'Bananas'],
    'year': [2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016],
    'price': [5, 7, 7, 9, 12, 9, 9, 4],
    'stock': [11, 8, 10, 6, 6, 5, 9, 12]})
df['change'] = ''

df['change'] = df.apply(
    lambda x:
        'Reduced' if x['price'] < df.loc[
            (df['Fruit'] == x['Fruit']) & (df['year'] == (x['year']-1))
        ]['price'].any() else x['change'],
    axis=1)

我得到一个错误“列必须与键相同的长度”。此外,即使这是解决了我怀疑这是否会工作,因为行索引。

t2a7ltrp

t2a7ltrp1#

可以使用groupby.diffnumpy.where

import numpy as np

d = df.groupby('Fruit')['price'].diff()
# if the years are not sorted
# d = df.sort_values(by='year').groupby('Fruit')['price'].diff()

m = d.notna()

df.loc[m, 'change'] = np.where(d[m].gt(0), 'Increased', 'Reduced')

或者使用numpy.signmap

df['change'] = (np.sign(df.groupby('Fruit')['price'].diff())
                  .map({1: 'Increased', -1: 'Reduced', 0: 'Unchanged'})
                  .fillna('')
               )

输出量:

Fruit  year  price  stock     change
0    Apple  2015      5     11           
1   Orange  2015      7      8           
2   Grapes  2015      7     10           
3  Bananas  2015      9      6           
4    Apple  2016     12      6  Increased
5   Orange  2016      9      5  Increased
6   Grapes  2016      9      9  Increased
7  Bananas  2016      4     12    Reduced

如果您想与所有前几年的最大值进行比较(而不仅仅是前一年):

tmp = df.pivot(index='Fruit', columns='year', values='price')

out = df.merge(
 np.sign(tmp.sub(tmp.cummax().shift(axis=1)))
   .replace({1: 'Increased', -1: 'Reduced', 0: 'Unchanged'})
   .stack().reset_index(name='change'), how='left'
)
wixjitnu

wixjitnu2#

谢谢@mozway,@wjandrea。我可以让它与df工作。申请!以下是看起来有效的方法。请指出使用它时可能存在的运行时风险。我明白,如果我们将价格与n年的最小/最大值进行比较,这将变得更加复杂。如果有更优雅的解决方案,我会欢迎的。

df['change'] = df.apply(
    lambda x:
        x['change'] if 
    len(df.loc[(df['Fruit'] == x['Fruit']) & (df['year'] == (x['year']-1))
        ]) == 0  else 'Decreased' if 
      (x['price'] < df.loc[(df['Fruit'] == x['Fruit']) & 
      (df['year'] == (x['year']-1))]['price'].values[0]) else 
      'Increased' if  (x['price'] > df.loc[
            (df['Fruit'] == x['Fruit']) & (df['year'] == (x['year']-1))
        ]['price'].values[0]) else 'No Change', axis=1)

相关问题