pandas: filter a DataFrame and sum repeated values

9gm1akwq  posted on 2023-10-14  in  Other

Hello everyone.
I have a DataFrame like the one below. I want to sum the values of rows that are repeated in the following row, and then delete the earlier row.
The values calculated from the date should go into a separate column.
The repeated rows marked X are summed into the row marked A for every column except price, and the X rows are then deleted.
The Date column increases in 30-minute steps; in my DataFrame no value is smaller than the one before it.
I think the subtraction would be easier if Date were converted to a timestamp. The result may be 30, 60, or 90 minutes, or whatever the difference is in minutes.
For example, the time can be calculated like this:

    from datetime import datetime

    # Convert the Unix timestamps to datetime objects
    start_ts_ = datetime.fromtimestamp(1652426243.907874)
    end_ts = datetime.fromtimestamp(1652436243.907874)

    # Difference between the two timestamps, converted from seconds to minutes
    delta = end_ts - start_ts_
    print(delta.total_seconds() / 60.0)

Output: approximately 166.67 minutes
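The same subtraction can be done on a whole column at once. The following is a minimal sketch, assuming the dates are held in a pandas Series of datetimes; the three sample values and the names `dates` and `minutes` are illustrative and not taken from the real data.

```python
import pandas as pd

# Illustrative sample: three timestamps with gaps of 30 and 60 minutes
dates = pd.Series(pd.to_datetime([
    "2023-09-28 10:30",
    "2023-09-28 11:00",
    "2023-09-28 12:00",
]))

# diff() subtracts the previous row from each row, giving Timedelta values;
# total_seconds() / 60 turns those into plain minutes.
minutes = dates.diff().dt.total_seconds() / 60
print(minutes.tolist())
```

The first element is NaN because the first row has no predecessor; it can be filled with 0 or another default if that suits the data.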

       Date              price  client  Count  test
    0  2023-09-28 10:30      5       5      3  0
    1  2023-09-28 11:00      7       7      9  L
------------------------------------------------------
X   2  2023-09-28 11:30      3       1     10  GG
------------------------------------------------------
A   3  2023-09-28 12:00      4       3     15  GG
------------------------------------------------------
    4  2023-09-28 12:30      2       8      1  L
------------------------------------------------------
X   5  2023-09-28 13:00      7       4      8  GG
X   6  2023-09-28 13:30      5       6      9  GG
X   7  2023-09-28 14:00      7       4      8  GG
X   8  2023-09-28 14:30      5       6      9  GG
X   9  2023-09-28 15:00      7       4      8  GG
------------------------------------------------------
A  10  2023-09-28 13:30      5       6      9  GG
------------------------------------------------------
X  11  2023-09-28 16:00      1       3     12  L
X  12  2023-09-28 14:30      6       1     11  L
------------------------------------------------------
A  13  2023-09-28 16:00      1       3     12  L
   14  2023-09-28 16:30      5       6      9  GG

The result should look like this:

    Date              price  client  Count  test  time
 0  2023-09-28 10:30      5       5      3  0     30 min
 1  2023-09-28 11:00      7       7      9  L     30 min
 3  2023-09-28 12:00      4       4     25  GG    1 h
 4  2023-09-28 13:30      2       8      1  L     30 min
 6  2023-09-28 14:30      5      10     17  GG    1 h
 8  2023-09-28 15:30      6       4     23  L     1 h

The columns whose values are summed are client and Count.
I know this is complicated. Please help, thank you.

omhiaaxx  #1

Assuming you want to group consecutive 'test' values after computing the diff:

    import pandas as pd

    # df is the DataFrame from the question
    # ensure datetime
    df['Date'] = pd.to_datetime(df['Date'])
    # group successive values: a new group starts whenever 'test' changes
    group = df['test'].ne(df['test'].shift()).cumsum()
    # define aggregations
    agg = {'Date': 'first', 'price': 'sum', 'client': 'sum', 'Count': 'sum', 'test': 'first', 'time': 'sum'}
    # compute diff, aggregate consecutive values
    out = df.assign(time=df['Date'].diff()).groupby(group).agg(agg)

Output:

      Date                 price  client  Count  test  time
test
1     2023-09-28 10:30:00      5       5      3  0     0 days 00:00:00
2     2023-09-28 11:00:00      7       7      9  L     0 days 00:30:00
3     2023-09-28 11:30:00      7       4     25  GG    0 days 01:00:00
4     2023-09-28 13:30:00      2       8      1  L     0 days 01:30:00
5     2023-09-28 12:00:00     12      10     17  GG    0 days 01:00:00
6     2023-09-28 15:00:00      7       4     23  L     0 days 01:00:00
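The approach above can be tried end to end on a small sample. The sketch below rebuilds only the first five rows of the question's DataFrame and makes two assumptions about what the question asks for, neither of which is part of the answer as written: Date and price are taken from the last row of each run (the question says price is not summed), and the elapsed time is stored as plain minutes instead of a Timedelta.

```python
import pandas as pd

# First five rows of the question's data (indices 0 to 4)
df = pd.DataFrame({
    "Date": ["2023-09-28 10:30", "2023-09-28 11:00", "2023-09-28 11:30",
             "2023-09-28 12:00", "2023-09-28 12:30"],
    "price": [5, 7, 3, 4, 2],
    "client": [5, 7, 1, 3, 8],
    "Count": [3, 9, 10, 15, 1],
    "test": ["0", "L", "GG", "GG", "L"],
})
df["Date"] = pd.to_datetime(df["Date"])

# A new run starts wherever "test" differs from the previous row;
# the cumulative sum of those change flags labels each run.
group = df["test"].ne(df["test"].shift()).cumsum()

# Minutes elapsed since the previous row (0 for the first row)
minutes = df["Date"].diff().dt.total_seconds().div(60).fillna(0)

# Assumption: keep the last Date and price of each run, sum the rest
out = (
    df.assign(time=minutes)
      .groupby(group)
      .agg(Date=("Date", "last"), price=("price", "last"),
           client=("client", "sum"), Count=("Count", "sum"),
           test=("test", "first"), time=("time", "sum"))
      .reset_index(drop=True)
)
print(out)
```

On this sample the two consecutive GG rows collapse into a single row with client 4, Count 25 and 60 minutes. Swapping "last" back to "first" and "sum" for price gives the behaviour of the answer's original aggregation dictionary.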