pandas Python: Importing data into a dataframe by clicking on a plot

pdtvr36n  posted on 2023-04-28  in  Python

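I have a dataframe with the following columns:

* time: Unix time, sampled at freq = 4 Hz
* signal: the signal, simulated here with n square waves (the real signal is noisier and more complex)
* derr_1: the derivative of the signal, used to better classify the signal type.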

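Goal

An expert needs to inspect this signal trace and record the start and end of each square-wave signal.
My goal is to have a dataframe/dictionary updated with the start and stop times of the signals by clicking on the plot. This needs to be fast enough to handle my total number of points (about 800,000).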
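One possible way to keep an interactive plot responsive at this size is to draw a decimated copy of the data and keep the full dataframe for the actual timestamps. The sketch below is only an illustration of that idea: `minmax_decimate` and its `n_buckets` parameter are hypothetical names, not part of pandas or plotly, and it assumes the dataframe has a default integer index and no NaN values in the chosen column.

```python
import numpy as np
import pandas as pd


def minmax_decimate(df: pd.DataFrame, column: str = "signal", n_buckets: int = 2000) -> pd.DataFrame:
    """Hypothetical helper: keep the min and max row of each bucket.

    Keeping both extremes per bucket preserves the step edges that a
    plain every-k-th-row subsample could miss.
    """
    if len(df) <= 2 * n_buckets:
        return df.copy()
    # assign each row to one of n_buckets contiguous, roughly equal buckets
    bucket = (np.arange(len(df)) * n_buckets) // len(df)
    grouped = df[column].groupby(bucket)
    # idxmin/idxmax return the index labels of the extreme rows per bucket
    keep = np.union1d(grouped.idxmin().to_numpy(), grouped.idxmax().to_numpy())
    return df.loc[keep]


# demo on synthetic data of roughly the size mentioned in the question
rng = np.random.default_rng(0)
t = 1625619679.0 + np.arange(800_000) / 4          # 4 Hz time axis
sig = rng.normal(0, 0.1, t.size)
sig[100_000:100_320] += 2                           # one 80-second square wave
demo = pd.DataFrame({"time": t, "signal": sig})
small = minmax_decimate(demo)                       # at most 2 * n_buckets rows
```

The reduced frame `small` can be passed to the plotting call, while clicked positions are still looked up against the full `demo` frame.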

Example code

Here is the code used to generate the example signal.

import pandas as pd 
import numpy as np 
import plotly.express as px

# setup the signal. 

def make_time_array(start_time: float, end_time: float, freq: int = 4):
    time_array = np.arange(start_time, end_time, 1/freq)
    return time_array

def make_signal(time_array: np.ndarray, n: int = 5, interval: int = 400, window_size=100):

    # calculate the number of intervals in the time array
    num_intervals = int(np.floor((time_array.max() - time_array.min()) / interval))
    print (num_intervals)

    # select `n` random intervals from the time array
    random_intervals = np.random.choice(num_intervals, n, replace=False)

    # calculate the timestamps for the selected intervals
    selected_points = time_array.min() + (random_intervals * interval)

    # round the timestamps to match the frequency of the time array
    selected_points = np.round(selected_points, decimals=3)

    # generate the signal
    n_points = len(time_array)
    signal = np.random.normal(0, 0.1, n_points)

    # set the signal to be equal to `signal + 2` for `+-window_size` seconds around the selected points
    for idx, point in enumerate(selected_points):
        start_idx = np.argmin(np.abs(time_array - (point - window_size)))
        end_idx = np.argmin(np.abs(time_array - (point + window_size)))
        signal[start_idx:end_idx+1] += 2
    return signal



# main
start_time = 1625619679.0  # Unix start time
total_time = 5000.25  # shortened for the example; the real duration is 202120.25 s, which runs slowly
end_time = start_time +  total_time
freq = 4

number_of_square_waves = 5
distance_between_square_waves = 200
size_of_square_wave = 40 

time_array = make_time_array(start_time=start_time, end_time=end_time, freq=freq)
print(f"length of time array: {len(time_array)}; the real data has 202120*4, about 800000, points")
signal = make_signal(time_array=time_array, n=number_of_square_waves, interval=distance_between_square_waves, window_size=size_of_square_wave )

# make the dataframe.
df = pd.DataFrame({"time": time_array, "signal": signal})
df['derr_1'] = df['signal'].diff() * freq

# plot the data
df_plot = df.melt(id_vars='time')
fig = px.line(df_plot, x='time', y='value', facet_row='variable')
fig.update_layout(title = "signal example: an expert has to visually inspect and select Start time/ End time for all the 'signals' present. ")
fig.update_yaxes(matches=None)
fig.for_each_yaxis(lambda yaxis: yaxis.update(showticklabels=True))
fig.show()

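Expected output

A dataframe with the columns:

  • start: when the signal starts (5 entries in this example)
  • stop: when the signal stops (5 entries in this example)

In this example the start and stop timestamps are exactly the timestamps of the derivative spikes (but in the real data it is not that simple). The start/stop data should correspond to where the expert clicked on the plot.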
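To make the expected output concrete, here is a minimal sketch of how clicked x positions could be turned into the start/stop table. `ClickRecorder` is a hypothetical class written for illustration. It assumes the click callback receives an object with an `xdata` attribute, as matplotlib's `button_press_event` does (matplotlib is swapped in here; the example code above uses plotly). It also assumes the clicks alternate start, stop, start, stop, and it drops an unpaired trailing click.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


class ClickRecorder:
    """Hypothetical helper: collects clicked x positions and pairs them up."""

    def __init__(self, time_array):
        self.time_array = np.asarray(time_array, dtype=float)  # must be sorted ascending
        self.clicks = []

    def on_click(self, event) -> None:
        # matplotlib sets event.xdata to None for clicks outside the axes
        if getattr(event, "xdata", None) is not None:
            self.clicks.append(float(event.xdata))

    def snap(self, x: float) -> float:
        # nearest sample via binary search, so each lookup is O(log n)
        i = int(np.clip(np.searchsorted(self.time_array, x), 1, len(self.time_array) - 1))
        left, right = self.time_array[i - 1], self.time_array[i]
        return float(left if x - left <= right - x else right)

    def to_frame(self) -> pd.DataFrame:
        snapped = sorted(self.snap(x) for x in self.clicks)
        if len(snapped) % 2:
            snapped = snapped[:-1]  # assumption: ignore an unpaired last click
        return pd.DataFrame({"start": snapped[0::2], "stop": snapped[1::2]})


# simulated clicks stand in for real GUI events in this demo
rec = ClickRecorder(np.arange(0, 100, 0.25))
for x in [10.1, 20.3, None, 50.13, 60.0, 70.0]:
    rec.on_click(SimpleNamespace(xdata=x))
out = rec.to_frame()

# With a matplotlib Figure `fig`, the hookup would look like:
# fig.canvas.mpl_connect("button_press_event", rec.on_click)
```

Snapping each click to the nearest sample keeps the recorded timestamps on the 4 Hz grid of the `time` column.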

Answer #1 (6vl6ewon)

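Here is a simple but workable idea. Use the extrema of the derivative to identify the start and stop of each signal, then assume that starts and stops occur in alternating order, and join them to form the expected table: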

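# thresholds at half the largest rising and falling derivative spikes
max_derr = df['derr_1'].max()
min_derr = df['derr_1'].min()
# upward spikes mark starts; downward spikes mark stops
start_signals = df[df['derr_1'] > max_derr / 2]
end_signals = df[df['derr_1'] < min_derr / 2]
start_times = start_signals['time'].values
end_times = end_signals['time'].values
# assumes starts and stops alternate and come in equal numbers;
# pd.DataFrame raises ValueError if the two arrays differ in length
df_signals = pd.DataFrame({'start_timestamp': start_times, 'stop_timestamp': end_times})
df_signals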
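The snippet above relies on finding the same number of starts and stops. As a hedged variation on the same thresholding idea, the sketch below pairs each start with the next stop after it, so a wave that is already high at the first sample (and so has no rising edge) does not break the table. `pair_edges` and `min_gap` are hypothetical names introduced for illustration only.

```python
import numpy as np
import pandas as pd


def pair_edges(time, derr, min_gap: float = 1.0) -> pd.DataFrame:
    """Hypothetical helper: pair derivative spikes into (start, stop) rows."""
    time = np.asarray(time, dtype=float)
    derr = np.asarray(derr, dtype=float)
    hi = np.nanmax(derr) / 2   # same half-of-extreme thresholds as above
    lo = np.nanmin(derr) / 2

    def collapse(t):
        # keep only the first detection of each burst closer than min_gap seconds
        if t.size == 0:
            return t
        return t[np.concatenate(([True], np.diff(t) > min_gap))]

    starts = collapse(time[derr > hi])
    stops = collapse(time[derr < lo])

    rows, last_stop = [], -np.inf
    for s in starts:
        if s <= last_stop:
            continue  # start falls inside an interval that was already paired
        later = stops[stops > s]
        if later.size:
            rows.append((s, later[0]))
            last_stop = later[0]
    return pd.DataFrame(rows, columns=["start_timestamp", "stop_timestamp"])


# noise-free demo: the first wave is already high at sample 0
t = np.arange(0, 100, 0.25)
sig = np.zeros(t.size)
sig[0:40] += 2
sig[100:140] += 2
sig[240:300] += 2
derr = pd.Series(sig).diff().to_numpy() * 4   # same derivative as df['derr_1']
pairs = pair_edges(t, derr)
```

In this demo there are two rising edges and three falling edges, so the two arrays would differ in length if passed to `pd.DataFrame` directly; pairing each start with the next stop yields two complete rows instead.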
