pandas: cumulative sum conditional on two adjacent columns

Posted by e4yzc0pl on 2024-01-04 in Other

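Sorry, I'm new to coding/pandas/Python.
I'm trying to sum a column only on the rows that fall between a "1" in the "Buy" column and a "1" in the adjacent "Sell" column: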
| Date | Buy | Sell | Value | Cumulative sum |
| -- | -- | -- | -- | -- |
| 2023-01-01 | | | 1 | |
| 2023-02-01 | 1 | | 5 | 5 |
| 2019-03-01 | | | 1 | 6 |
| 2019-04-01 | | | 1 | 7 |
| 2019-05-23 | | 1 | 1 | 8 |
| 2019-06-23 | | | 5 | |
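I see there is a method called cumsum(axis=None, skipna=True, *args, **kwargs) that might be able to do this. Unfortunately, I'm not sure how to include a condition based on the buy/sell columns.
I don't know how to do this with built-in pandas methods. Would I need to use a loop?
Any suggestions or hints would be much appreciated!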

e4eetjau #1

Try to avoid loops. I think this is what you're looking for:

    import pandas as pd

    df1 = pd.DataFrame({
        'date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09', '2023-01-10', '2023-01-11', '2023-01-12'],
        'buy': [None, 1, None, None, None, None, None, 1, None, None, None, None],
        'sell': [None, None, None, None, 1, None, None, None, None, None, 1, None],
        'value': [1, 5, 1, 1, 1, 5, 1, 5, 1, 1, 1, 5]
    })
    # make sell negative so it can be discerned from buy
    df1['sell'] = df1['sell'] * -1
    print(df1)

Output:

              date  buy  sell  value
    0   2023-01-01  NaN   NaN      1
    1   2023-01-02  1.0   NaN      5
    2   2023-01-03  NaN   NaN      1
    3   2023-01-04  NaN   NaN      1
    4   2023-01-05  NaN  -1.0      1
    5   2023-01-06  NaN   NaN      5
    6   2023-01-07  NaN   NaN      1
    7   2023-01-08  1.0   NaN      5
    8   2023-01-09  NaN   NaN      1
    9   2023-01-10  NaN   NaN      1
    10  2023-01-11  NaN  -1.0      1
    11  2023-01-12  NaN   NaN      5

    # create 'buysell' column which is the combination of the 'buy' and 'sell' columns
    df1['buysell'] = df1['buy'].combine_first(df1['sell'])
    # forward-fill so that every row from a buy up to (but not including) the next sell is marked 1
    df1.loc[df1['buysell'].ffill() == 1, 'buysell'] = 1
    # create a mask for where 'buysell' is not NaN
    mask = ~df1['buysell'].isna()
    # use the mask to create a 'buysellvalue' column holding 'value' for rows where the mask is True
    df1.loc[mask, 'buysellvalue'] = df1.loc[mask, 'value']
    # cumsum() skips the NaN rows but keeps the running total going
    df1['cumbuysellvalue'] = df1['buysellvalue'].cumsum()
    print(df1)

Output:

              date  buy  sell  value  buysell  buysellvalue  cumbuysellvalue
    0   2023-01-01  NaN   NaN      1      NaN           NaN              NaN
    1   2023-01-02  1.0   NaN      5      1.0           5.0              5.0
    2   2023-01-03  NaN   NaN      1      1.0           1.0              6.0
    3   2023-01-04  NaN   NaN      1      1.0           1.0              7.0
    4   2023-01-05  NaN  -1.0      1     -1.0           1.0              8.0
    5   2023-01-06  NaN   NaN      5      NaN           NaN              NaN
    6   2023-01-07  NaN   NaN      1      NaN           NaN              NaN
    7   2023-01-08  1.0   NaN      5      1.0           5.0             13.0
    8   2023-01-09  NaN   NaN      1      1.0           1.0             14.0
    9   2023-01-10  NaN   NaN      1      1.0           1.0             15.0
    10  2023-01-11  NaN  -1.0      1     -1.0           1.0             16.0
    11  2023-01-12  NaN   NaN      5      NaN           NaN              NaN
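The same running total can probably be obtained without the helper columns. The following is only a sketch, not the answerer's code: it assumes `sell` is left as `1` (not negated) and that buys and sells alternate, and it builds a boolean mask named `active` (a name introduced here for illustration) by comparing how many buys and sells have occurred so far.

```python
import numpy as np
import pandas as pd

# Same buy/sell/value data as in the answer above; 'sell' is kept as +1 here.
df = pd.DataFrame({
    "buy":   [np.nan, 1, np.nan, np.nan, np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan, np.nan],
    "sell":  [np.nan, np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan, np.nan, np.nan, 1, np.nan],
    "value": [1, 5, 1, 1, 1, 5, 1, 5, 1, 1, 1, 5],
})

# Number of buys up to and including each row.
opened = df["buy"].fillna(0).cumsum()
# Number of sells strictly before each row, so the sell row itself still counts as active.
closed = df["sell"].fillna(0).cumsum().shift(fill_value=0)
active = opened > closed

# Keep 'value' only on active rows; cumsum() skips NaN but leaves it in place.
df["cumsum"] = df["value"].where(active).cumsum()
print(df)
```

As in the answer above, the total keeps growing across pairs (8 after the first sell, 13 on the second buy) rather than restarting at each buy.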

dfddblmv #2

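It looks like you want a cumulative sum between each buy/sell pair. We can assign a distinct group to each buy/sell pair and then compute a cumulative sum within each group. For the groupby, we create a temporary buy_sell_group column that lets us group rows by the pair they belong to.
Suppose we have the table below:
[image: sample data]
Then we can create a buy_sell_group spanning each pair of buy/sell cells whose value is 1.0: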

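    import numpy as np

    # one slot per row; rows outside any buy/sell pair stay NaN
    # (the slicing below assumes df has the default 0..n-1 integer index)
    buy_sell_group = [np.nan for _ in range(len(df))]
    i = 1
    # pair the n-th buy row with the n-th sell row
    for b, s in zip(df.loc[df.buy == 1.0].index, df.loc[df.sell == 1.0].index):
        buy_sell_group[b:s+1] = [i] * (s+1-b)
        i += 1
    df['buy_sell_group'] = buy_sell_group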

This produces the table below:
[image: DataFrame with the buy_sell_group column]



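Then we can group by the 'buy_sell_group' column and take the cumsum of the 'value' column:

    # rows whose group is NaN are dropped by groupby and therefore get NaN in 'cumsum'
    df['cumsum'] = df.groupby('buy_sell_group')['value'].cumsum()
    df = df.drop('buy_sell_group', axis=1)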


The final DataFrame:
[image: final result]
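To make the grouping idea concrete, here is a loop-free sketch of it. It is a variant, not the answerer's code: the sample data is hypothetical, and the group id (named `pair_id` here purely for illustration) is derived from running counts of buys and sells instead of a `for` loop.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with two buy/sell pairs (rows 1-3 and rows 6-7).
df = pd.DataFrame({
    "buy":   [np.nan, 1, np.nan, np.nan, np.nan, np.nan, 1, np.nan, np.nan],
    "sell":  [np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan, 1, np.nan],
    "value": [1, 5, 1, 1, 5, 1, 5, 1, 1],
})

# Buys up to and including each row, and sells strictly before each row.
opened = df["buy"].fillna(0).cumsum()
closed = df["sell"].fillna(0).cumsum().shift(fill_value=0)

# Group id = number of buys seen so far, kept only while a pair is open.
pair_id = opened.where(opened > closed).rename("pair_id")

# groupby drops NaN keys by default, so rows outside any pair get NaN.
df["cumsum"] = df.groupby(pair_id)["value"].cumsum()
print(df)
```

Unlike a single running total, the sum here restarts at each buy: rows 1 to 3 give 5, 6, 7 and rows 6 to 7 give 5, 6, with NaN everywhere else.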

Note that the implementation above assumes:

  • every buy = 1 has a corresponding sell = 1,
  • no other buy/sell = 1 value occurs between a buy/sell pair.
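These assumptions can be checked before grouping. The helper below is a hypothetical sketch (the name `pairs_are_well_formed` is not from the answer): it returns True only when the signals strictly alternate buy, sell, buy, sell, and it treats a row carrying both a buy and a sell as invalid, which is an extra assumption made here.

```python
import numpy as np
import pandas as pd

def pairs_are_well_formed(df: pd.DataFrame) -> bool:
    """Return True if buy/sell signals strictly alternate: buy, sell, buy, sell, ..."""
    buy = df["buy"].eq(1)    # NaN compares as False
    sell = df["sell"].eq(1)
    if (buy & sell).any():
        return False  # same-row buy and sell is treated as invalid (assumption)
    # +1 for a buy, -1 for a sell, in row order, ignoring rows with no signal.
    signals = buy.astype(int) - sell.astype(int)
    signals = signals[signals != 0].to_numpy()
    # The expected pattern is 1, -1, 1, -1, ... with an even number of signals.
    expected = np.tile([1, -1], len(signals) // 2 + 1)[: len(signals)]
    return len(signals) % 2 == 0 and bool((signals == expected).all())

good = pd.DataFrame({"buy": [np.nan, 1, np.nan, np.nan],
                     "sell": [np.nan, np.nan, 1, np.nan]})
bad = pd.DataFrame({"buy": [1, 1, np.nan],
                    "sell": [np.nan, np.nan, 1]})
print(pairs_are_well_formed(good), pairs_are_well_formed(bad))
```

If the check fails, pairing the n-th buy with the n-th sell would silently mix up the groups, so it may be safer to stop and inspect the data first.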