pandas 选择满足条件并连续出现10秒的行

sqougxex  于 2023-09-29  发布在  其他
关注(0)|答案(2)|浏览(168)

下面是一些时间序列数据:

Truck                       Timestamp           b2_v_fult2_tms_err  tm2_b_fault_level_a tm2_b_fault_code_a
0   251_EV01_PB_579_HD_756828   2023-05-01 13:11:48 NaN                 2.0                 30.0                    
1   251_EV01_PB_579_HD_756828   2023-05-01 13:11:51 1.0                 0.0                 0.0                 
2   251_EV01_PB_579_HD_756828   2023-05-01 13:11:52 1.0                 0.0                 0.0                 
3   251_EV01_PB_579_HD_756828   2023-05-01 13:11:53 1.0                 0.0                 0.0                 
4   251_EV01_PB_579_HD_756828   2023-05-01 13:11:54 1.0                 0.0                 0.0                 
5   251_EV01_PB_579_HD_756828   2023-05-01 13:11:55 1.0                 0.0                 0.0                 
6   251_EV01_PB_579_HD_756828   2023-05-01 13:11:56 1.0                 0.0                 0.0                 
7   251_EV01_PB_579_HD_756828   2023-05-01 13:11:57 1.0                 0.0                 0.0                 
8   251_EV01_PB_579_HD_756828   2023-05-01 13:11:58 1.0                 0.0                 0.0                 
9   251_EV01_PB_579_HD_756828   2023-05-01 13:11:59 1.0                 0.0                 0.0                 
10  251_EV01_PB_579_HD_756828   2023-05-01 13:12:00 1.0                 0.0                 0.0                 
11  251_EV01_PB_579_HD_756828   2023-05-01 14:58:05 1.0                 0.0                 0.0                 
12  251_EV01_PB_579_HD_756828   2023-05-01 14:58:10 0.0                 0.0                 0.0                 
13  251_EV01_PB_579_HD_756828   2023-05-01 14:58:11 0.0                 0.0                 0.0                 
14  251_EV01_PB_579_HD_756828   2023-05-01 14:58:15 0.0                 0.0                 0.0                 
15  251_EV01_PB_579_HD_756828   2023-05-01 14:58:16 0.0                 0.0                 0.0                 
16  251_EV01_PB_579_HD_756828   2023-05-01 14:58:20 0.0                 0.0                 0.0                 
17  251_EV01_PB_579_HD_756828   2023-05-01 14:58:21 0.0                 0.0                 0.0                 
18  251_EV01_PB_579_HD_756828   2023-05-01 14:58:26 0.0                 0.0                 0.0                 
19  251_EV01_PB_579_HD_756828   2023-05-01 14:58:27 0.0                 0.0                 0.0                 
20  251_EV01_PB_579_HD_756828   2023-05-01 14:58:31 0.0                 0.0                 0.0

我试图完成的是找到“b2_v_fult2_tms_err == 1.0”,“tm2_b_fault_level_a == 0.0”,“tm2_b_fault_code_a == 0.0”的行,创建一个名为“bin”的新列,并在“bin”列中插入一个“True”值,该值位于连续10秒出现三个值(“b2_v_fult2_tms_err == 1.0”,“tm2_b_fault_level_a == 0.0”,“tm2_b_fault_code_a == 0.0”)的行旁边。“bin”列中的所有其他行可以留空或标记为“False”。
这看起来像是:

Truck                       Timestamp           b2_v_fult2_tms_err  tm2_b_fault_level_a tm2_b_fault_code_a  bin
0   251_EV01_PB_579_HD_756828   2023-05-01 13:11:48 NaN                 2.0                 30.0                False
1   251_EV01_PB_579_HD_756828   2023-05-01 13:11:51 1.0                 0.0                 0.0                 True
2   251_EV01_PB_579_HD_756828   2023-05-01 13:11:52 1.0                 0.0                 0.0                 True
3   251_EV01_PB_579_HD_756828   2023-05-01 13:11:53 1.0                 0.0                 0.0                 True
4   251_EV01_PB_579_HD_756828   2023-05-01 13:11:54 1.0                 0.0                 0.0                 True
5   251_EV01_PB_579_HD_756828   2023-05-01 13:11:55 1.0                 0.0                 0.0                 True
6   251_EV01_PB_579_HD_756828   2023-05-01 13:11:56 1.0                 0.0                 0.0                 True
7   251_EV01_PB_579_HD_756828   2023-05-01 13:11:57 1.0                 0.0                 0.0                 True
8   251_EV01_PB_579_HD_756828   2023-05-01 13:11:58 1.0                 0.0                 0.0                 True
9   251_EV01_PB_579_HD_756828   2023-05-01 13:11:59 1.0                 0.0                 0.0                 True
10  251_EV01_PB_579_HD_756828   2023-05-01 13:12:00 1.0                 0.0                 0.0                 True
11  251_EV01_PB_579_HD_756828   2023-05-01 14:58:05 1.0                 0.0                 0.0                 False
12  251_EV01_PB_579_HD_756828   2023-05-01 14:58:10 0.0                 0.0                 0.0                 False
13  251_EV01_PB_579_HD_756828   2023-05-01 14:58:11 0.0                 0.0                 0.0                 False
14  251_EV01_PB_579_HD_756828   2023-05-01 14:58:15 0.0                 0.0                 0.0                 False
15  251_EV01_PB_579_HD_756828   2023-05-01 14:58:16 0.0                 0.0                 0.0                 False
16  251_EV01_PB_579_HD_756828   2023-05-01 14:58:20 0.0                 0.0                 0.0                 False
17  251_EV01_PB_579_HD_756828   2023-05-01 14:58:21 0.0                 0.0                 0.0                 False
18  251_EV01_PB_579_HD_756828   2023-05-01 14:58:26 0.0                 0.0                 0.0                 False
19  251_EV01_PB_579_HD_756828   2023-05-01 14:58:27 0.0                 0.0                 0.0                 False
20  251_EV01_PB_579_HD_756828   2023-05-01 14:58:31 0.0                 0.0                 0.0                 False

我使用bin1_df = results_df.loc[(results_df['b2_v_fult2_tms_err'] == 1.0) & (results_df['tm2_b_fault_level_a'] == 0.0) & (results_df['tm2_b_fault_code_a'] == 0.0) ].reset_index(drop=True)过滤数据,然后尝试使用bin1_df['time_diff'] = bin1_df['Timestamp'].diff()计算时间戳之间的差异,这是成功的。但是我很难弄清楚如何将10次出现的1转换为“True”值。之后,我使用m = bin1_df['time_diff'] == pd.Timedelta(seconds=1)过滤结果,以显示时间增量为1秒的行,然后使用bin1_df = bin1_df[m.shift(-1)|m]显示之前的行,因为它会忽略第一行。
我也尝试过其他一些事情,但没有结果,但上面的尝试是我能得到的最接近的。
有人知道如何做到这一点吗?它不一定是这个结果,但可以清楚地传达这些条件满足了10秒的东西是我最终寻找的。

juzqafwq

juzqafwq1#

IIUC,您可以:

m1 = df["b2_v_fult2_tms_err"] == 1.0
m2 = df["tm2_b_fault_level_a"] == 0.0
m3 = df["tm2_b_fault_code_a"] == 0.0

m = m1 & m2 & m3

df["bin"] = df.groupby((m != m.shift()).cumsum())["Timestamp"].transform(
    lambda g: ([True] * 10 + [False] * (len(g) - 10))
    if len(g) >= 10
    else [False] * len(g)
)

print(df)

图纸:

Truck            Timestamp  b2_v_fult2_tms_err  tm2_b_fault_level_a  tm2_b_fault_code_a    bin
0   251_EV01_PB_579_HD_756828  2023-05-01 13:11:48                 NaN                  2.0                30.0  False
1   251_EV01_PB_579_HD_756828  2023-05-01 13:11:51                 1.0                  0.0                 0.0   True
2   251_EV01_PB_579_HD_756828  2023-05-01 13:11:52                 1.0                  0.0                 0.0   True
3   251_EV01_PB_579_HD_756828  2023-05-01 13:11:53                 1.0                  0.0                 0.0   True
4   251_EV01_PB_579_HD_756828  2023-05-01 13:11:54                 1.0                  0.0                 0.0   True
5   251_EV01_PB_579_HD_756828  2023-05-01 13:11:55                 1.0                  0.0                 0.0   True
6   251_EV01_PB_579_HD_756828  2023-05-01 13:11:56                 1.0                  0.0                 0.0   True
7   251_EV01_PB_579_HD_756828  2023-05-01 13:11:57                 1.0                  0.0                 0.0   True
8   251_EV01_PB_579_HD_756828  2023-05-01 13:11:58                 1.0                  0.0                 0.0   True
9   251_EV01_PB_579_HD_756828  2023-05-01 13:11:59                 1.0                  0.0                 0.0   True
10  251_EV01_PB_579_HD_756828  2023-05-01 13:12:00                 1.0                  0.0                 0.0   True
11  251_EV01_PB_579_HD_756828  2023-05-01 14:58:05                 1.0                  0.0                 0.0  False
12  251_EV01_PB_579_HD_756828  2023-05-01 14:58:10                 0.0                  0.0                 0.0  False
13  251_EV01_PB_579_HD_756828  2023-05-01 14:58:11                 0.0                  0.0                 0.0  False
14  251_EV01_PB_579_HD_756828  2023-05-01 14:58:15                 0.0                  0.0                 0.0  False
15  251_EV01_PB_579_HD_756828  2023-05-01 14:58:16                 0.0                  0.0                 0.0  False
16  251_EV01_PB_579_HD_756828  2023-05-01 14:58:20                 0.0                  0.0                 0.0  False
17  251_EV01_PB_579_HD_756828  2023-05-01 14:58:21                 0.0                  0.0                 0.0  False
18  251_EV01_PB_579_HD_756828  2023-05-01 14:58:26                 0.0                  0.0                 0.0  False
19  251_EV01_PB_579_HD_756828  2023-05-01 14:58:27                 0.0                  0.0                 0.0  False
20  251_EV01_PB_579_HD_756828  2023-05-01 14:58:31                 0.0                  0.0                 0.0  False
jum4pzuy

jum4pzuy2#

这里的主要困难是找到n个连续出现的1秒差异。
首先在秒级上获得差异:

# create a mask based on your filter
df['mask'] = (df['b2_v_fult2_tms_err'] == 1) & (df['tm2_b_fault_code_a']  == 0) & (df['tm2_b_fault_level_a'] ==0)

# get sec difference in the Timestamp column
df['time_diff'] = df['Timestamp'].diff().dt.seconds

接下来,确定连续且在阈值之下的时间步长

threshold = 9 # set your threshold for consecutive secs here
# this will identify consecutive occurences of same value (1 second in this example)
df['bin'] = df.time_diff.groupby((df.time_diff != df.time_diff.shift()).cumsum()).transform('size') * df.time_diff
df['bin'] = (df.bin >= threshold).astype(int)
df['bin'] = df['bin'] * df['mask'] # consecutive time and satisfy filter

# i've added helper columns for readability but they are not a necessity
cols_to_keep = ['Truck', 'Timestamp', 'b2_v_fult2_tms_err', 'tm2_b_fault_level_a', 'tm2_b_fault_code_a', 'bin']

df[cols_to_keep]

输出值:
| 卡车|时间戳|b2_v_fult2_tms_err| tm2_b_fault_level_a| tm2_b_fault_code_a|斌|
| --|--|--|--|--|--|
| 251_EV01_PB_579_HD_756828| 2023-05-01 13:11:48|楠| 2 | 30 | 0 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 13:11:51| 1 | 0 | 0 | 0 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 13:11:52| 1 | 0 | 0 | 1 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 13:11:53| 1 | 0 | 0 | 1 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 13:11:54| 1 | 0 | 0 | 1 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 13:11:55| 1 | 0 | 0 | 1 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 13:11:56| 1 | 0 | 0 | 1 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 13:11:57| 1 | 0 | 0 | 1 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 13:11:58| 1 | 0 | 0 | 1 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 13:11:59| 1 | 0 | 0 | 1 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 13:12:00| 1 | 0 | 0 | 1 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 14:58:05| 1 | 0 | 0 | 1 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 14:58:10| 0 | 0 | 0 | 0 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 14:58:11| 0 | 0 | 0 | 0 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 14:58:15| 0 | 0 | 0 | 0 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 14:58:16| 0 | 0 | 0 | 0 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 14:58:20| 0 | 0 | 0 | 0 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 14:58:21| 0 | 0 | 0 | 0 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 14:58:26| 0 | 0 | 0 | 0 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 14:58:27| 0 | 0 | 0 | 0 |
| 251_EV01_PB_579_HD_756828| 2023-05-01 14:58:31| 0 | 0 | 0 | 0 |

相关问题