parquet :Link to the sample dataset
CSV格式:Link to the sample dataset的
预期结果应具有30分钟样本窗口最后一个值和第一个值之间的差值结果(来自上述样本数据),即:
Item Value AgentStatus OriginalTimestamp AgentTimeStamp
0 Channel1.Device1.Tag1 847 good 2023-07-28T13:09:00.0098328+09:00 2023-07-28T13:09:00.0000000
1 Channel1.Device1.Tag2 0 good 2023-07-28T13:09:00.0408696+09:00 2023-07-28T13:09:00.0000000
2 Channel1.Device1.Tag1 848 good 2023-07-28T13:09:05.0138770+09:00 2023-07-28T13:09:05.0000000
3 Channel1.Device1.Tag2 0 good 2023-07-28T13:09:05.0454734+09:00 2023-07-28T13:09:05.0000000
4 Channel1.Device1.Tag1 849 good 2023-07-28T13:09:10.0073605+09:00 2023-07-28T13:09:10.0000000
5 Channel1.Device1.Tag2 0 good 2023-07-28T13:09:10.0379516+09:00 2023-07-28T13:09:10.0000000
6 Channel1.Device1.Tag1 850 good 2023-07-28T13:09:15.0074263+09:00 2023-07-28T13:09:15.0000000
7 Channel1.Device1.Tag2 0 good 2023-07-28T13:09:15.0387691+09:00 2023-07-28T13:09:15.0000000
8 Channel1.Device1.Tag1 851 good 2023-07-28T13:09:20.0176840+09:00 2023-07-28T13:09:20.0000000
9 Channel1.Device1.Tag2 0 good 2023-07-28T13:09:20.0329268+09:00 2023-07-28T13:09:20.0000000
10 Channel1.Device1.Tag1 852 good 2023-07-28T13:09:25.0070191+09:00 2023-07-28T13:09:25.0000000
11 Channel1.Device1.Tag2 0 good 2023-07-28T13:09:25.0384699+09:00 2023-07-28T13:09:25.0000000
12 Channel1.Device1.Tag1 853 good 2023-07-28T13:09:30.0109244+09:00 2023-07-28T13:09:30.0000000
13 Channel1.Device1.Tag2 0 good 2023-07-28T13:09:30.0417249+09:00 2023-07-28T13:09:30.0000000
14 Channel1.Device1.Tag1 854 good 2023-07-28T13:09:35.0118763+09:00 2023-07-28T13:09:35.0000000
15 Channel1.Device1.Tag2 0 good 2023-07-28T13:09:35.0429050+09:00 2023-07-28T13:09:35.0000000
16 Channel1.Device1.Tag1 855 good 2023-07-28T13:09:40.0027594+09:00 2023-07-28T13:09:40.0000000
17 Channel1.Device1.Tag2 0 good 2023-07-28T13:09:40.0340476+09:00 2023-07-28T13:09:40.0000000
18 Channel1.Device1.Tag1 856 good 2023-07-28T13:09:45.0029277+09:00 2023-07-28T13:09:45.0000000
19 Channel1.Device1.Tag2 0 good 2023-07-28T13:09:45.0336946+09:00 2023-07-28T13:09:45.0000000
20 Channel1.Device1.Tag1 857 good 2023-07-28T13:09:50.0153041+09:00 2023-07-28T13:09:50.0000000
21 Channel1.Device1.Tag2 0 good 2023-07-28T13:09:50.0459796+09:00 2023-07-28T13:09:50.0000000
22 Channel1.Device1.Tag1 858 good 2023-07-28T13:09:55.0103680+09:00 2023-07-28T13:09:55.0000000
23 Channel1.Device1.Tag2 0 good 2023-07-28T13:09:55.0412343+09:00 2023-07-28T13:09:55.0000000
24 Channel1.Device1.Tag1 859 good 2023-07-28T13:10:00.0095407+09:00 2023-07-28T13:10:00.0000000
25 Channel1.Device1.Tag2 0 good 2023-07-28T13:10:00.0395870+09:00 2023-07-28T13:10:00.0000000
26 Channel1.Device1.Tag1 860 good 2023-07-28T13:10:05.0069727+09:00 2023-07-28T13:10:05.0000000
27 Channel1.Device1.Tag2 0 good 2023-07-28T13:10:05.0374699+09:00 2023-07-28T13:10:05.0000000
28 Channel1.Device1.Tag1 861 good 2023-07-28T13:10:10.0113827+09:00 2023-07-28T13:10:10.0000000
29 Channel1.Device1.Tag2 0 good 2023-07-28T13:10:10.0431140+09:00 2023-07-28T13:10:10.0000000
30 Channel1.Device1.Tag1 862 good 2023-07-28T13:10:15.0024582+09:00 2023-07-28T13:10:15.0000000
31 Channel1.Device1.Tag2 0 good 2023-07-28T13:10:15.0338704+09:00 2023-07-28T13:10:15.0000000
32 Channel1.Device1.Tag2 0 good 2023-07-28T13:11:15.0338704+09:01 2023-07-28T13:11:15.0000000
33 Channel1.Device1.Tag2 0 good 2023-07-28T13:12:15.0338704+09:02 2023-07-28T13:12:15.0000000
34 Channel1.Device1.Tag2 0 good 2023-07-28T13:15:15.0338704+09:03 2023-07-28T13:15:15.0000000
35 Channel1.Device1.Tag2 0 good 2023-07-28T13:20:15.0338704+09:04 2023-07-28T13:20:15.0000000
36 Channel1.Device1.Tag2 0 good 2023-07-28T13:21:15.0338704+09:05 2023-07-28T13:21:15.0000000
37 Channel1.Device1.Tag2 0 good 2023-07-28T13:22:15.0338704+09:06 2023-07-28T13:22:15.0000000
38 Channel1.Device1.Tag2 0 good 2023-07-28T13:25:15.0338704+09:07 2023-07-28T13:25:15.0000000
39 Channel1.Device1.Tag2 878 bad 2023-07-28T13:30:15.0338704+09:08 2023-07-28T13:30:15.0000000
40 Channel1.Device1.Tag2 0 good 2023-07-28T13:31:15.0338704+09:09 2023-07-28T13:31:15.0000000
字符串
预期结果/计算:
在30分钟内采样后,结果将包含以下两个项目的数据:x1c 0d1x的数据
- 对于每个项目,示例:对于Channel1.Device1.Tag2:Last Value -30分钟窗口的第一个Value,即:
最后一行值(878):
39 Channel1.Device1.Tag2 878 bad 2023-07-28T13:30:15.0338704+09:08 2023-07-28T13:30:15.0000000
型
第一行值(0):
预期结果/数据框:
的
如所见,值878和3对应于Channel1.Device1.Tag2和Channel1.Device1.Tag1,取(最后值-第一值)之间的差:(878 - 0)和(850-847),而保留其他列值。
我目前所取得的成就:
我已经实现了,直到检索结果,但无法合并或保留groupby/apply函数后的其他列,也无法根据项进行过滤(这里= Channel1.Device1.Tag2等等)
代码:
df = df[['Item', 'Value', 'AgentStatus', 'AgentTimeStamp']].reset_index()
df
df['AgentTimeStamp'] = pd.to_datetime(df['AgentTimeStamp'])
df = df.groupby(pd.Grouper(key='AgentTimeStamp', freq='30min')).first().fillna(0).reset_index()
value_from_30_min_window_per_day = df.groupby(df['AgentTimeStamp'].dt.day)['Value'].apply(lambda x: x.values[-1]-x.values[0])
value_from_30_min_window_per_day
型
2条答案
按热度按时间lskq00tm1#
你想要什么还不清楚,也许你可以把
groupby
和resample
结合起来:字符串
输出量:
型
shstlldc2#
假设您希望每个项目的观察值有30分钟的窗口差,您可以使用自连接来实现它。这个函数适用于我的案例,所以我希望它能有所帮助:
字符串
假设您拥有以下格式的数据:
| 瓦尔|计时器| time |
| --|--| ------------ |
| 0.638129| 2019 -01- 21 00:00:00| 2023-01-01 00:00:00 |
| 047401| 2019 -01- 21 00:01:00| 2023-01-01 00:01:00 |
| 0.781685| 2023-01-01 00:02:00| 2023-01-01 00:02:00 |
| 0.839184| 2023-01-01 00:03:00| 2023-01-01 00:03:00 |
| 0.526894| 2023-01-01 00:04:00| 2023-01-01 00:04:00 |
| ……|..................||
| 0.370669| 2019 -01- 22 00:00:00| 2023-01-01 00:56:00 |
| 0.565692| 2019 -01- 22 00:00:00| 2023-01-01 00:57:00 |
| 0.121998| 2019 -01- 18 00:00:00| 2023-01-01 00:58:00 |
| 0.872352| 2019 -01- 22 00:00:00| 2023-01-01 00:59:00 |
| 0.624171| 2023-01-02 01:00:00| 2023-01-02 01:00:00 |
其中时间戳00:00和00:30中的ID 1的值如下:
| 瓦尔|计时器| time |
| --|--| ------------ |
| 0.638129| 2019 -01- 21 00:00:00| 2023-01-01 00:00:00 |
| 0.184684| 2023-01-01 00:30:00| 2023-01-01 00:30:00 |
当应用函数而不显示缺失值的行(即,没有开始时间的行,因此也没有差异)时,我们得到每个ID的观察值在时间$t$和时间$t-30$之间的差异:
型
| 时间启动|时间结束|时差|瓦尔_start|瓦尔end|瓦尔差| val_difference |
| --|--|--|--|--|--| ------------ |
| 2019 -01- 21 00:00:00| 2023-01-01 00:30:00|三十|0.638129| 0.184684| -0.453445 | -0.453445 |
| 2019 -01- 21 00:01:00| 2023-01-01 00:31:00|三十|047401| 0.613936| 0.566535| 0.566535 |
| 2023-01-01 00:02:00| 2023-01-01 00:32:00|三十|0.781685| 0.865400| 0.083714| 0.083714 |
| 2023-01-01 00:03:00| 2023-01-01 00:33:00|三十|0.839184| 089480| -0.749704 | -0.749704 |
| 2023-01-01 00:04:00| 2023-01-01 00:34:00|三十|0.526894| 0.541837| 0.014943| 0.014943 |
| ......这是什么?|......这是什么?|......这是什么?|......这是什么?|......这是什么?|......这是什么?| ... |
| 2023-01-01 00:26:00| 2019 -01- 22 00:00:00|三十|0.812442| 0.370669| -0.441773 | -0.441773 |
| 2023-01-01 00:27:00| 2019 -01- 22 00:00:00|三十|0.352933| 0.565692| 0.212759| 0.212759 |
| 2023-01-01 00:28:00| 2019 -01- 18 00:00:00|三十|186111| 0.121998| -0.064113 | -0.064113 |
| 2023-01-01 00:29:00| 2019 -01- 22 00:00:00|三十|0.671967| 0.872352| 0.200385| 0.200385 |
| 2023-01-01 00:30:00| 2023-01-02 01:00:00|三十|008082| 0.624171| 0.616090| 0.616090 |
如果我们想保留没有开始时间的行,只需将
display_na_rows
参数更改为True
:型
| 时间启动|时间结束|时差|瓦尔_start|瓦尔end|瓦尔差| val_difference |
| --|--|--|--|--|--| ------------ |
| NaT| 2019 -01- 21 00:00:00| NaN| NaN| 0.638129| NaN| NaN |
| NaT| 2019 -01- 21 00:01:00| NaN| NaN| 047401| NaN| NaN |
| NaT| 2023-01-01 00:02:00| NaN| NaN| 0.781685| NaN| NaN |
| NaT| 2023-01-01 00:03:00| NaN| NaN| 0.839184| NaN| NaN |
| NaT| 2023-01-01 00:04:00| NaN| NaN| 0.526894| NaN| NaN |
| ......这是什么?|......这是什么?|......这是什么?|......这是什么?|......这是什么?|......这是什么?| ... |
| 2023-01-01 00:26:00| 2019 -01- 22 00:00:00| 30.0| 0.812442| 0.370669| -0.441773 | -0.441773 |
| 2023-01-01 00:27:00| 2019 -01- 22 00:00:00| 30.0| 0.352933| 0.565692| 0.212759| 0.212759 |
| 2023-01-01 00:28:00| 2019 -01- 18 00:00:00| 30.0| 186111| 0.121998| -0.064113 | -0.064113 |
| 2023-01-01 00:29:00| 2019 -01- 22 00:00:00| 30.0| 0.671967| 0.872352| 0.200385| 0.200385 |
| 2023-01-01 00:30:00| 2023-01-02 01:00:00| 30.0| 008082| 0.624171| 0.616090| 0.616090 |