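I have a dataframe with:
* time: unix time, freq = 4 Hz
* signal: here a signal simulated with n square waves (the real signal is noisier and more complex)
* derr_1: the derivative of the signal, used to better classify the signal type.
Goal
An expert needs to look at this signal trace and record the start/end of each square-wave signal.
My goal is to update a dataframe/dictionary with the start and stop times of the signals by clicking on the plot. This needs to be fast enough to handle my total number of points (about 800,000).
Example code
Here is the code used to generate the example signal.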
import pandas as pd
import numpy as np
import plotly.express as px
# setup the signal.
def make_time_array (start_time: float, end_time: float, freq: int=4):
    time_array = np.arange(start_time, end_time, 1/freq)
    return time_array
def make_signal(time_array: np.array, n: int = 5, interval: int = 400, window_size=100):
    # calculate the number of intervals in the time array
    num_intervals = int(np.floor((time_array.max() - time_array.min()) / interval))
    print(num_intervals)
    # select `n` random intervals from the time array
    random_intervals = np.random.choice(num_intervals, n, replace=False)
    # calculate the timestamps for the selected intervals
    selected_points = time_array.min() + (random_intervals * interval)
    # round the timestamps to match the frequency of the time array
    selected_points = np.round(selected_points, decimals=3)
    # generate the signal
    n_points = len(time_array)
    signal = np.random.normal(0, 0.1, n_points)
    # set the signal to be equal to `signal + 2` for `+-window_size` seconds around the selected points
    for idx, point in enumerate(selected_points):
        start_idx = np.argmin(np.abs(time_array - (point - window_size)))
        end_idx = np.argmin(np.abs(time_array - (point + window_size)))
        signal[start_idx:end_idx+1] += 2
    return signal
# main
start_time = 1625619679.0 #unix start time
total_time = 5000.25 # actual time (but then runs slow) = 202120.25
end_time = start_time + total_time
freq = 4
number_of_square_waves = 5
distance_between_square_waves = 200
size_of_square_wave = 40
time_array = make_time_array(start_time=start_time, end_time=end_time, freq=freq)
print(f"length time array {len(time_array)}, actual length = 202120*4 = 800000")
signal = make_signal(time_array=time_array, n=number_of_square_waves, interval=distance_between_square_waves, window_size=size_of_square_wave )
# make the dataframe.
df = pd.DataFrame({"time": time_array, "signal": signal})
df['derr_1'] = df['signal'].diff() * freq
# plot the data
df_plot = df.melt(id_vars='time')
fig = px.line(df_plot, x='time', y='value', facet_row='variable')
fig.update_layout(title = "signal example: an expert has to visually inspect and select Start time/ End time for all the 'signals' present. ")
fig.update_yaxes(matches=None)
fig.for_each_yaxis(lambda yaxis: yaxis.update(showticklabels=True))
fig.show()
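Expected output
A dataframe with the columns:
- start: when the signal starts (5 entries in this example)
- stop: when the signal stops (5 entries in this example)
In this example the start and stop timestamps coincide exactly with the timestamps of the derivative peaks (in real data it is not that simple). The start/stop data should correspond to where the expert clicks on the plot.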
1 Answer
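Here is a simple but workable idea: use the extrema of the derivative to identify the start and stop edges, then assume they occur in alternating order and join them to form the expected table: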
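A minimal sketch of this idea, assuming the `time` and `derr_1` columns from the question. It is not necessarily the exact code the answer had in mind. The function name `find_start_stop` and the `threshold` parameter are hypothetical, introduced only for illustration. Rising edges are taken where the derivative exceeds `+threshold`, falling edges where it drops below `-threshold`, and the two lists are paired in order.

```python
import numpy as np
import pandas as pd


def find_start_stop(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Pair rising and falling edges of `derr_1` into a start/stop table.

    Hypothetical helper: assumes `df` has the `time` and `derr_1` columns
    from the question and that edges strictly alternate start, stop, ...
    """
    time = df["time"].to_numpy()
    derr = df["derr_1"].to_numpy()

    # Rising edge: derivative well above the noise level.
    starts = time[derr > threshold]
    # Falling edge: derivative well below the negative noise level.
    # The leading NaN produced by diff() compares False in both masks.
    stops = time[derr < -threshold]

    # Check the alternating-order assumption before pairing the edges.
    alternating = (
        len(starts) == len(stops)
        and np.all(starts < stops)
        and np.all(stops[:-1] < starts[1:])
    )
    if not alternating:
        raise ValueError("edges do not alternate start/stop; adjust the threshold")

    return pd.DataFrame({"start": starts, "stop": stops})
```

With the example data, a step of 2 sampled at 4 Hz gives an ideal edge derivative of 2 * 4 = 8, so `find_start_stop(df, threshold=4.0)` is a reasonable starting point. The value 4.0 is an assumption, not something stated in the question. Two caveats: the reported stop is the first sample after the wave, and a wave that begins at the very first sample has no rising edge, so the check raises instead of returning mismatched pairs. On real, noisier data several neighbouring samples may cross the threshold for a single edge, and those would need merging first.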