pandas: filter a DataFrame and sum repeated values

9gm1akwq  posted on 2023-10-14  in Other

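Hello everyone.
I have a DataFrame like the one below. I want to sum the values of rows that repeat into the next row, and delete the earlier rows.
The values computed from the date should go into a separate column.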
The repeated rows marked X are summed (all columns except price) into the duplicate row marked A, and the X rows are then deleted.
As for the Date, it increases in 30-minute steps; in my DataFrame no value is smaller than the previous one.
I think the subtraction would be easier if the Date were converted to a timestamp. The result may be 30, 60 or 90 minutes, or whatever the difference is in minutes.
For example, the time could be calculated like this:

from datetime import datetime

# Convert timestamps to datetime objects
start_ts = datetime.fromtimestamp(1652426243.907874)
end_ts = datetime.fromtimestamp(1652436243.907874)

# Difference between the two timestamps, converted from seconds to minutes
delta = end_ts - start_ts
print(f'{delta.total_seconds() / 60.0}')

Output: about 166.67 minutes
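
The same subtraction can be done on a whole column at once. The following is a minimal sketch, assuming the dates are stored as strings in the "YYYY-MM-DD HH:MM" format shown in the table; the sample values and the names `dates`, `parsed` and `minutes` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: three timestamps in the question's "YYYY-MM-DD HH:MM" format.
dates = pd.Series(["2023-09-28 10:30", "2023-09-28 11:00", "2023-09-28 12:30"])

# Parse the strings into datetime64 values.
parsed = pd.to_datetime(dates)

# diff() subtracts the previous row from each row; the first row has no
# predecessor, so its difference is missing (NaN after the conversion).
minutes = parsed.diff().dt.total_seconds() / 60

print(minutes.tolist())
```

Here the second and third entries come out as 30 and 90 minutes, so no manual conversion to a Unix timestamp is needed.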

       Date                price   client  Count    test
  0    2023-09-28 10:30    5       5       3        0
  1    2023-09-28 11:00    7       7       9        L
------------------------------------------------------
X 2    2023-09-28 11:30    3       1       10       GG
------------------------------------------------------
A 3    2023-09-28 12:00    4       3       15       GG
------------------------------------------------------
  4    2023-09-28 12:30    2       8       1        L
------------------------------------------------------
X 5    2023-09-28 13:00    7       4       8        GG
X 6    2023-09-28 13:30    5       6       9        GG
X 7    2023-09-28 14:00    7       4       8        GG
X 8    2023-09-28 14:30    5       6       9        GG
X 9    2023-09-28 15:00    7       4       8        GG
------------------------------------------------------
A 10   2023-09-28 13:30    5       6       9        GG
------------------------------------------------------
X 11   2023-09-28 16:00    1       3       12       L
X 12   2023-09-28 14:30    6       1       11       L
------------------------------------------------------
A 13   2023-09-28 16:00    1       3       12       L
  14   2023-09-28 16:30    5       6       9        GG

The result should look like this:

    Date                price   client  Count    test   time
0   2023-09-28 10:30    5       5       3        0      30minut
1   2023-09-28 11:00    7       7       9        L      30minut
3   2023-09-28 12:00    4       4       25       GG     1H
4   2023-09-28 13:30    2       8       1        L      30minut
6   2023-09-28 14:30    5       10      17       GG     1H
8   2023-09-28 15:30    6       4       23       L      1H

The columns whose values are summed are client and Count.
I know this is complicated. Please help, thank you.

omhiaaxx  #1

Assuming you want to group successive "test" values after computing the diff:

import pandas as pd

# df is the DataFrame from the question

# ensure datetime
df['Date'] = pd.to_datetime(df['Date'])

# label each run of successive identical 'test' values with its own number
group = df['test'].ne(df['test'].shift()).cumsum()

# define aggregations
agg = {'Date': 'first', 'price': 'sum', 'client': 'sum', 'Count': 'sum', 'test': 'first', 'time': 'sum'}

# compute diff between successive dates, then aggregate each run
out = df.assign(time=df['Date'].diff()).groupby(group).agg(agg)

Output:

                    Date  price  client  Count test            time
test
1    2023-09-28 10:30:00      5       5      3    0 0 days 00:00:00
2    2023-09-28 11:00:00      7       7      9    L 0 days 00:30:00
3    2023-09-28 11:30:00      7       4     25   GG 0 days 01:00:00
4    2023-09-28 13:30:00      2       8      1    L 0 days 01:30:00
5    2023-09-28 12:00:00     12      10     17   GG 0 days 01:00:00
6    2023-09-28 15:00:00      7       4     23    L 0 days 01:00:00
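
For readers who want to try the approach end to end, here is a self-contained sketch on a small made-up sample modelled on the first rows of the question's table. It keeps the answer's assumption that rows belong together when successive "test" values are equal. Two choices are only one reading of the question's expected result and are not part of the answer above: `Date` and `price` are taken from the last row of each run, and the time is expressed in minutes. The names `run`, `minutes` and `out` are illustrative.

```python
import pandas as pd

# Hypothetical sample modelled on the first rows of the question's table.
df = pd.DataFrame({
    "Date": ["2023-09-28 10:30", "2023-09-28 11:00", "2023-09-28 11:30",
             "2023-09-28 12:00", "2023-09-28 12:30"],
    "price": [5, 7, 3, 4, 2],
    "client": [5, 7, 1, 3, 8],
    "Count": [3, 9, 10, 15, 1],
    "test": ["0", "L", "GG", "GG", "L"],
})
df["Date"] = pd.to_datetime(df["Date"])

# A new run starts wherever "test" differs from the previous row;
# the cumulative sum turns those start flags into run labels.
run = df["test"].ne(df["test"].shift()).cumsum().rename("run")

out = (
    # Minutes elapsed since the previous row (missing for the first row).
    df.assign(minutes=df["Date"].diff().dt.total_seconds().div(60))
      .groupby(run)
      .agg(
          Date=("Date", "last"),        # assumption: keep the last row's date
          price=("price", "last"),      # assumption: price is kept, not summed
          client=("client", "sum"),
          Count=("Count", "sum"),
          test=("test", "first"),
          minutes=("minutes", "sum"),   # missing values are skipped by sum
      )
      .reset_index(drop=True)
)

print(out)
```

In this sample the two successive GG rows collapse into the 12:00 row with client 4, Count 25 and 60 minutes, which matches the corresponding row of the result the question asks for. If summing `price` or keeping the first `Date` is what is actually wanted, only the aggregation entries need to change.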
