Python: find the duration of a wave from the first major trough to the next in temperature data

Posted by xfyts7mz on 2023-04-28 in Python

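I have an algorithm that detects troughs and peaks in a temperature time series, but it needs some improvement.

Here is a plot of the full data:

And here is a zoomed-in version:

As the plots show, the algorithm returns even slight troughs, so computing the duration from one trough to the next yields noisy data.
I therefore need to refine the algorithm so that it only considers the major troughs, for example:

This is the algorithm I am using: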

import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt

# Read the data from the CSV file
df = pd.read_csv('Nusrat.csv')

# Convert the first column to datetime format
df['column1'] = pd.to_datetime(df['column1'])

# Convert the second column to numeric type
df['column2'] = df['column2'].astype(int)

time_data = df['time']

# df['Column1'] = pd.to_datetime(df['Column1'])

period = 3

dn = [i for i in range(period, len(df) - period - 1) if
      (df.loc[i, 'column2'] < df.loc[i - period:i - 1, 'column2']).all()
      and (df.loc[i, 'column2'] <= df.loc[i + 1:i + period, 'column2']).all()]

up = [i for i in range(period, len(df) - period - 1) if
      (df.loc[i, 'column2'] > df.loc[i - period:i - 1, 'column2']).all()
      and (df.loc[i, 'column2'] >= df.loc[i + 1:i + period, 'column2']).all()]

fig, ax = plt.subplots()
ax.plot(df['column1'], df['column2'])
ax.plot(df.loc[dn, 'column1'], df.loc[dn, 'column2'], 'o', color='green', markersize=5)
ax.plot(df.loc[up, 'column1'], df.loc[up, 'column2'], 'o', color='red', markersize=5)
fig.autofmt_xdate()
plt.show()

difference = df.loc[dn, 'column1'].diff()
def time_difference(start_index, end_index):
    start_time = datetime.strptime(time_data[start_index], '%H:%M:%S.%f')
    end_time = datetime.strptime(time_data[end_index], '%H:%M:%S.%f')
    time_delta = end_time - start_time
    return time_delta
# print(dn)
# print(dn[0],"--",dn[1])

for i in range(len(dn)-1):
#     print(dn[i], "--", dn[i+1])
    print(df['column1'][dn[i]],(df['column1'][dn[i+1]]))
    print(time_difference(dn[i], dn[i+1]))

Here is the dataset as text, since I cannot attach the file here. It is only part of the data:

Column1,Column2,Time
2023-03-14 14:00:59.0,195.80,14:00:59.0
2023-03-14 14:02:06.0,174.20,14:02:06.0
2023-03-14 14:03:14.0,156.76,14:03:14.0
2023-03-14 14:04:21.0,142.36,14:04:21.0
2023-03-14 14:05:29.0,131.00,14:05:29.0
2023-03-14 14:06:37.0,122.00,14:06:37.0
2023-03-14 14:07:44.0,114.91,14:07:44.0
2023-03-14 14:08:52.0,109.18,14:08:52.0
2023-03-14 14:10:00.0,104.56,14:10:00.0
2023-03-14 14:11:07.0,100.74,14:11:07.0
2023-03-14 14:12:15.0,97.93,14:12:15.0
2023-03-14 14:13:22.0,95.45,14:13:22.0
2023-03-14 14:14:30.0,93.43,14:14:30.0
2023-03-14 14:15:37.0,91.85,14:15:37.0
2023-03-14 14:16:45.0,90.73,14:16:45.0
2023-03-14 14:17:53.0,89.49,14:17:53.0
2023-03-14 14:19:00.0,88.59,14:19:00.0
2023-03-14 14:20:08.0,87.91,14:20:08.0
2023-03-14 14:21:15.0,87.13,14:21:15.0
2023-03-14 14:22:23.0,86.68,14:22:23.0
2023-03-14 14:23:30.0,86.23,14:23:30.0
2023-03-14 14:24:38.0,86.23,14:24:38.0
2023-03-14 14:25:45.0,108.61,14:25:45.0
2023-03-14 14:26:53.0,142.70,14:26:53.0
2023-03-14 14:28:01.0,175.89,14:28:01.0
2023-03-14 14:29:08.0,203.79,14:29:08.0
2023-03-14 14:30:16.0,225.84,14:30:16.0
2023-03-14 14:31:23.0,241.25,14:31:23.0
2023-03-14 14:32:31.0,253.29,14:32:31.0
2023-03-14 14:33:39.0,262.18,14:33:39.0
2023-03-14 14:34:46.0,262.29,14:34:46.0
2023-03-14 14:35:54.0,262.29,14:35:54.0
2023-03-14 14:37:01.0,262.29,14:37:01.0
2023-03-14 14:38:09.0,260.83,14:38:09.0
2023-03-14 14:39:16.0,235.51,14:39:16.0
2023-03-14 14:40:24.0,208.85,14:40:24.0
2023-03-14 14:41:31.0,185.45,14:41:31.0
2023-03-14 14:42:39.0,166.33,14:42:39.0
2023-03-14 14:43:46.0,150.35,14:43:46.0
2023-03-14 14:44:54.0,137.41,14:44:54.0
2023-03-14 14:46:01.0,127.06,14:46:01.0
2023-03-14 14:47:09.0,118.96,14:47:09.0
2023-03-14 14:48:17.0,112.55,14:48:17.0
2023-03-14 14:49:24.0,107.15,14:49:24.0
2023-03-14 14:50:32.0,103.10,14:50:32.0
2023-03-14 14:51:39.0,99.61,14:51:39.0
2023-03-14 14:52:47.0,96.80,14:52:47.0
2023-03-14 14:53:54.0,94.55,14:53:54.0
2023-03-14 14:55:02.0,92.75,14:55:02.0
2023-03-14 14:56:09.0,91.18,14:56:09.0
2023-03-14 14:57:17.0,97.70,14:57:17.0
2023-03-14 14:58:24.0,127.06,14:58:24.0
2023-03-14 14:59:32.0,161.04,14:59:32.0
2023-03-14 15:00:39.0,190.85,15:00:39.0
2023-03-14 15:01:47.0,214.81,15:01:47.0
2023-03-14 15:02:55.0,233.38,15:02:55.0
2023-03-14 15:04:02.0,247.21,15:04:02.0
2023-03-14 15:05:10.0,256.66,15:05:10.0
2023-03-14 15:06:17.0,262.29,15:06:17.0
2023-03-14 15:07:25.0,262.29,15:07:25.0
2023-03-14 15:08:32.0,262.29,15:08:32.0
2023-03-14 15:09:40.0,262.29,15:09:40.0
2023-03-14 15:10:47.0,262.29,15:10:47.0
2023-03-14 15:11:55.0,246.31,15:11:55.0
2023-03-14 15:13:02.0,219.65,15:13:02.0
2023-03-14 15:14:10.0,194.56,15:14:10.0
2023-03-14 15:15:17.0,173.53,15:15:17.0
2023-03-14 15:16:25.0,156.43,15:16:25.0
2023-03-14 15:17:33.0,142.03,15:17:33.0
2023-03-14 15:18:40.0,130.78,15:18:40.0
2023-03-14 15:19:48.0,121.89,15:19:48.0
2023-03-14 15:20:55.0,114.80,15:20:55.0
2023-03-14 15:22:03.0,109.18,15:22:03.0
2023-03-14 15:23:10.0,104.68,15:23:10.0
2023-03-14 15:24:18.0,101.19,15:24:18.0
2023-03-14 15:25:25.0,98.26,15:25:25.0
2023-03-14 15:26:33.0,95.90,15:26:33.0
2023-03-14 15:27:41.0,93.88,15:27:41.0
2023-03-14 15:28:48.0,92.41,15:28:48.0
2023-03-14 15:29:56.0,91.06,15:29:56.0
2023-03-14 15:31:03.0,89.94,15:31:03.0
2023-03-14 15:32:11.0,89.04,15:32:11.0
2023-03-14 15:33:18.0,88.03,15:33:18.0
2023-03-14 15:34:26.0,87.35,15:34:26.0
2023-03-14 15:35:33.0,86.79,15:35:33.0
2023-03-14 15:36:41.0,86.34,15:36:41.0
2023-03-14 15:37:49.0,86.34,15:37:49.0
2023-03-14 15:38:56.0,108.39,15:38:56.0
2023-03-14 15:40:04.0,142.59,15:40:04.0
2023-03-14 15:41:11.0,175.33,15:41:11.0
2023-03-14 15:42:19.0,203.00,15:42:19.0
2023-03-14 15:43:26.0,224.94,15:43:26.0
2023-03-14 15:44:34.0,240.91,15:44:34.0
2023-03-14 15:45:41.0,252.39,15:45:41.0
2023-03-14 15:46:49.0,260.71,15:46:49.0
2023-03-14 15:47:56.0,262.29,15:47:56.0
2023-03-14 15:49:04.0,262.29,15:49:04.0
2023-03-14 15:50:11.0,262.29,15:50:11.0
2023-03-14 15:51:19.0,259.14,15:51:19.0
2023-03-14 15:52:26.0,233.60,15:52:26.0
2023-03-14 15:53:34.0,207.39,15:53:34.0
2023-03-14 15:54:41.0,183.99,15:54:41.0
2023-03-14 15:55:49.0,164.98,15:55:49.0
2023-03-14 15:56:57.0,149.00,15:56:57.0
2023-03-14 15:58:04.0,136.06,15:58:04.0
2023-03-14 15:59:12.0,125.94,15:59:12.0
2023-03-14 16:00:19.0,117.84,16:00:19.0
2023-03-14 16:01:27.0,111.43,16:01:27.0
2023-03-14 16:02:35.0,106.25,16:02:35.0
2023-03-14 16:03:42.0,102.31,16:03:42.0
2023-03-14 16:04:50.0,98.94,16:04:50.0
2023-03-14 16:05:57.0,96.35,16:05:57.0
2023-03-14 16:07:05.0,95.34,16:07:05.0
2023-03-14 16:08:12.0,117.84,16:08:12.0
2023-03-14 16:09:20.0,150.91,16:09:20.0
2023-03-14 16:10:27.0,183.09,16:10:27.0
2023-03-14 16:11:35.0,209.30,16:11:35.0
2023-03-14 16:12:42.0,229.33,16:12:42.0
2023-03-14 16:13:50.0,244.29,16:13:50.0
2023-03-14 16:14:57.0,255.65,16:14:57.0
2023-03-14 16:16:05.0,262.29,16:16:05.0
2023-03-14 16:17:13.0,262.29,16:17:13.0
2023-03-14 16:18:20.0,262.29,16:18:20.0
2023-03-14 16:19:28.0,262.29,16:19:28.0
2023-03-14 16:20:35.0,255.43,16:20:35.0
2023-03-14 16:21:43.0,229.44,16:21:43.0
2023-03-14 16:22:51.0,203.56,16:22:51.0
2023-03-14 16:23:58.0,181.06,16:23:58.0
2023-03-14 16:25:06.0,162.84,16:25:06.0
2023-03-14 16:26:13.0,147.54,16:26:13.0
2023-03-14 16:27:21.0,135.28,16:27:21.0
2023-03-14 16:28:28.0,125.60,16:28:28.0
2023-03-14 16:29:36.0,118.06,16:29:36.0
2023-03-14 16:30:43.0,111.76,16:30:43.0
2023-03-14 16:31:51.0,106.81,16:31:51.0
2023-03-14 16:32:59.0,102.88,16:32:59.0
2023-03-14 16:34:06.0,99.73,16:34:06.0
2023-03-14 16:35:14.0,97.25,16:35:14.0
2023-03-14 16:36:22.0,95.23,16:36:22.0
2023-03-14 16:37:29.0,93.54,16:37:29.0
2023-03-14 16:38:37.0,92.08,16:38:37.0
2023-03-14 16:39:44.0,90.84,16:39:44.0
2023-03-14 16:40:52.0,89.94,16:40:52.0
2023-03-14 16:42:00.0,89.04,16:42:00.0
2023-03-14 16:43:07.0,88.36,16:43:07.0
2023-03-14 16:44:15.0,87.80,16:44:15.0
2023-03-14 16:45:22.0,87.13,16:45:22.0
2023-03-14 16:46:30.0,86.68,16:46:30.0
2023-03-14 16:47:37.0,99.95,16:47:37.0
2023-03-14 16:48:45.0,132.58,16:48:45.0
2023-03-14 16:49:52.0,166.55,16:49:52.0
2023-03-14 16:51:00.0,195.58,16:51:00.0
2023-03-14 16:52:07.0,219.31,16:52:07.0
2023-03-14 16:53:15.0,236.86,16:53:15.0
2023-03-14 16:54:23.0,249.80,16:54:23.0
2023-03-14 16:55:30.0,259.93,16:55:30.0
2023-03-14 16:56:38.0,262.29,16:56:38.0
2023-03-14 16:57:45.0,262.29,16:57:45.0
2023-03-14 16:58:53.0,262.29,16:58:53.0
2023-03-14 17:00:00.0,262.29,17:00:00.0
2023-03-14 17:01:08.0,262.29,17:01:08.0
2023-03-14 17:02:15.0,262.29,17:02:15.0
2023-03-14 17:03:23.0,262.29,17:03:23.0
2023-03-14 17:04:31.0,262.29,17:04:31.0
2023-03-14 17:05:38.0,256.66,17:05:38.0
2023-03-14 17:06:46.0,229.10,17:06:46.0
2023-03-14 17:07:53.0,202.89,17:07:53.0
2023-03-14 17:09:01.0,180.28,17:09:01.0
2023-03-14 17:10:08.0,161.94,17:10:08.0
2023-03-14 17:11:16.0,147.09,17:11:16.0
2023-03-14 17:12:24.0,134.94,17:12:24.0
2023-03-14 17:13:31.0,125.38,17:13:31.0
2023-03-14 17:14:39.0,117.84,17:14:39.0
2023-03-14 17:15:46.0,111.88,17:15:46.0
2023-03-14 17:16:54.0,107.26,17:16:54.0
2023-03-14 17:18:02.0,103.33,17:18:02.0
2023-03-14 17:19:09.0,100.18,17:19:09.0
2023-03-14 17:20:17.0,97.70,17:20:17.0
2023-03-14 17:21:24.0,95.79,17:21:24.0
2023-03-14 17:22:32.0,94.10,17:22:32.0
2023-03-14 17:23:40.0,92.75,17:23:40.0
2023-03-14 17:24:47.0,91.74,17:24:47.0
2023-03-14 17:25:55.0,90.61,17:25:55.0
2023-03-14 17:27:02.0,89.83,17:27:02.0
2023-03-14 17:28:10.0,89.04,17:28:10.0
2023-03-14 17:29:17.0,88.59,17:29:17.0
2023-03-14 17:30:25.0,88.03,17:30:25.0
2023-03-14 17:31:32.0,87.69,17:31:32.0
2023-03-14 17:32:40.0,87.24,17:32:40.0
2023-03-14 17:33:47.0,86.90,17:33:47.0
2023-03-14 17:34:55.0,86.56,17:34:55.0
2023-03-14 17:36:03.0,86.23,17:36:03.0
2023-03-14 17:37:10.0,85.89,17:37:10.0
2023-03-14 17:38:18.0,85.66,17:38:18.0
2023-03-14 17:39:25.0,85.44,17:39:25.0
2023-03-14 17:40:33.0,85.21,17:40:33.0
2023-03-14 17:41:40.0,85.10,17:41:40.0
2023-03-14 17:42:48.0,92.30,17:42:48.0
2023-03-14 17:43:55.0,121.89,17:43:55.0
2023-03-14 17:45:03.0,156.65,17:45:03.0
2023-03-14 17:46:11.0,187.48,17:46:11.0
2023-03-14 17:47:18.0,212.34,17:47:18.0
2023-03-14 17:48:26.0,231.24,17:48:26.0
2023-03-14 17:49:33.0,245.41,17:49:33.0

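The result I want is an array giving the start time and end time of each wave together with its duration, in the form the code already produces.

Result: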

2023-04-03 23:20:09 2023-04-03 23:29:09
0:09:00
2023-04-03 23:29:09 2023-04-03 23:48:17
0:19:08
2023-04-03 23:48:17 2023-04-04 00:06:19
-1 day, 0:18:02
2023-04-04 00:06:19 2023-04-04 00:58:07
0:51:48
2023-04-04 00:58:07 2023-04-04 01:16:08
0:18:01
2023-04-04 01:16:08 2023-04-04 01:30:47
0:14:39
2023-04-04 01:30:47 2023-04-04 01:42:07
0:11:20
2023-04-04 01:42:07 2023-04-04 01:59:01
0:16:54
2023-04-04 01:59:01 2023-04-04 02:21:32
0:22:31
2023-04-04 02:21:32 2023-04-04 02:30:33
0:09:01
2023-04-04 02:30:33 2023-04-04 02:40:42
0:10:09
2023-04-04 02:40:42 2023-04-04 03:02:07
0:21:25
2023-04-04 03:02:07 2023-04-04 03:34:47
0:32:40
2023-04-04 03:34:47 2023-04-04 03:44:54
0:10:07
2023-04-04 03:44:54 2023-04-04 04:01:48
0:16:54
2023-04-04 04:01:48 2023-04-04 04:15:18
0:13:30
2023-04-04 04:15:18 2023-04-04 04:41:12
0:25:54
2023-04-04 04:41:12 2023-04-04 04:59:14
0:18:02
2023-04-04 04:59:14 2023-04-04 05:19:30
0:20:16
2023-04-04 05:19:30 2023-04-04 05:36:24
0:16:54
2023-04-04 05:36:24 2023-04-04 06:15:49
0:39:25
2023-04-04 06:15:49 2023-04-04 06:33:49
0:18:00
2023-04-04 06:33:49 2023-04-04 06:46:13
0:12:24
2023-04-04 06:46:13 2023-04-04 07:09:52
0:23:39
2023-04-04 07:09:52 2023-04-04 07:27:53
0:18:01
2023-04-04 07:27:53 2023-04-04 07:47:02
0:19:09
2023-04-04 07:47:02 2023-04-04 08:03:56
0:16:54
2023-04-04 08:03:56 2023-04-04 08:14:03
0:10:07
2023-04-04 08:14:03 2023-04-04 08:27:34
0:13:31
2023-04-04 08:27:34 2023-04-04 08:48:58
0:21:24
2023-04-04 08:48:58 2023-04-04 09:07:00
0:18:02
2023-04-04 09:07:00 2023-04-04 09:43:15
0:36:15
2023-04-04 09:43:15 2023-04-04 10:04:38
0:21:23
2023-04-04 10:04:38 2023-04-04 10:24:59
0:20:21
2023-04-04 10:24:59 2023-04-04 10:40:45
0:15:46
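
A side note on the `-1 day, 0:18:02` entry in the listing above: `time_difference` parses only the time-of-day strings, so a wave that crosses midnight comes out negative. A minimal sketch of one way around this, assuming the full timestamps are available as datetimes (the three timestamps below are copied from the listing; the `troughs` and `waves` names are only illustrative):

```python
import pandas as pd

# Trough timestamps spanning midnight, copied from the result listing
troughs = pd.Series(pd.to_datetime([
    "2023-04-03 23:29:09",
    "2023-04-03 23:48:17",
    "2023-04-04 00:06:19",
]))

# Subtracting full timestamps keeps the date, so the midnight crossing is handled
durations = troughs.diff().dropna()

# One row per wave: start, end and duration
waves = pd.DataFrame({
    "start": troughs[:-1].to_numpy(),
    "end": troughs[1:].to_numpy(),
    "duration": durations.to_numpy(),
})
print(waves)
```

With full timestamps the second wave lasts 18 minutes and 2 seconds rather than a negative interval.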

uqcuzwp8 #1

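I suggest you take a look at the scipy library, which is great for signal processing, especially its signal module.
First, it will let you shrink and simplify your code considerably, for example by using the find_peaks function (https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html) to find local minima and maxima.
Here is what you can get from the data sample provided in the question: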

import pandas as pd
from matplotlib import pyplot as plt
from scipy.signal import find_peaks

fig, ax = plt.subplots()
# Named df rather than input to avoid shadowing the built-in input()
df = pd.read_csv("test.csv")

# List indices of signal peaks
peaks, _ = find_peaks(df["Column2"].values, height=0)
# Troughs are peaks of the reciprocal signal (assumes all values are positive)
neg_peaks, _ = find_peaks(1/df["Column2"].values, height=0)

# Create columns in pandas DataFrame specifying which points are peaks
# (relies on the default integer index produced by read_csv)
df["peaks"] = df.index.isin(peaks)
df["neg_peaks"] = df.index.isin(neg_peaks)

# Plotting: the signal as a line, peaks and troughs as markers
df["Column2"].plot(ax=ax)
df.loc[df["peaks"], "Column2"].plot(ax=ax, linestyle="", marker="o")
df.loc[df["neg_peaks"], "Column2"].plot(ax=ax, linestyle="", marker="o")
plt.show()

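Second, you can tune the detection sensitivity with the find_peaks parameters, *or* you can simply apply a filtering method to the signal first, such as savgol_filter or gaussian_filter.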
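
As a rough sketch of the parameter-tuning route, the example below uses the `prominence` and `distance` arguments of `find_peaks` on the negated signal so that only major troughs survive. The synthetic series, the `prominence=50` and `distance=30` values, and the variable names are all assumptions chosen for illustration; they would need tuning on the real data.

```python
import numpy as np
import pandas as pd
from scipy.signal import find_peaks

# Synthetic stand-in for the temperature series: a slow wave plus a small ripple
t = pd.date_range("2023-03-14 14:00", periods=400, freq="min")
n = np.arange(400)
temp = 175 + 85 * np.sin(2 * np.pi * n / 100) + 2 * np.sin(2 * np.pi * n / 7)

# Troughs are peaks of the negated signal
all_troughs, _ = find_peaks(-temp)

# prominence discards shallow ripple troughs; distance enforces a minimum spacing in samples
major_troughs, _ = find_peaks(-temp, prominence=50, distance=30)

# Duration of each wave = time between consecutive major troughs
trough_times = pd.Series(t[major_troughs])
durations = trough_times.diff().dropna()

print(len(all_troughs), "troughs without filtering,", len(major_troughs), "major troughs")
print(durations)
```

Prominence measures how far a trough stands out from its surroundings, so it tends to be a more robust filter than an absolute height threshold when the baseline drifts.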


carvr3hs #2

I believe this should be most of what you want:

import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt

# Read the data from the CSV file
df = pd.read_csv('data.csv')

# Convert the first column to datetime format
df['Column1'] = pd.to_datetime(df['Column1'])

# Convert the second column to numeric type
df['Column2'] = df['Column2'].astype(int)

time_data = df['Time']

def normalize(arr):
    copy = np.array(arr, dtype=np.float64)
    copy -= copy.mean()
    copy /= copy.max()
    return copy

col1 = np.array(df['Column1']).astype(np.int64)
col2 = normalize(df['Column2'])
first_derivative = normalize(np.diff(col2)/np.diff(col1))
sigma = 7e-2
peaks = np.where((col2[:-1]>0) & (first_derivative < sigma) & (first_derivative > -sigma))
valleys = np.where((col2[:-1]<0) & (first_derivative < sigma) & (first_derivative > -sigma))

plt.figure(figsize=(16,9))
plt.plot(col1,normalize(df['Column2']),'k')
plt.plot(col1[:-1] , first_derivative)
plt.plot(col1[peaks], col2[peaks], 'r*')
plt.plot(col1[valleys], col2[valleys], 'g*')
plt.show()

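There is one more step, which I will add later because I don't have time right now: you need to group the peaks and valleys so that each section contains only one peak or valley. Then you can compute metrics such as duration and frequency from that data.
Result of running this code on the CSV you provided:

Edit: here is a second version of this code: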

col1 = np.array(df['Column1']).astype(np.int64)
col2 = normalize(df['Column2'])
# We can use the second derivative to find peaks. If there is a peak in the second derivative,
# we have a valley in the original data, and if there is a valley in the second derivative,
# there is a peak in the original data. Since the second derivative should be fairly clean,
# we can use percentiles to find peaks and valleys in the second derivative. This approach
# is not ideal but should still be fairly good for normalized data.
first_derivative = normalize(np.diff(col2)/np.diff(col1))
second_derivative = normalize(np.diff(first_derivative)/np.diff(col1[:-1]))

# I found 3 by trial and error, so it's not a perfect value
percentile_threshold=3
peak_percentile = np.percentile(second_derivative,percentile_threshold)
valley_percentile = np.percentile(second_derivative,100-percentile_threshold)
peaks = np.where(second_derivative<peak_percentile)
valleys = np.where(second_derivative>valley_percentile)

# Since there are still multiple peaks and valleys, we can choose to only select the first
# or last detected peak/valley (here I choose the last one). "gap" determines the minimum gap
# required between two peaks/valleys. Here I choose a gap of 10 indices, but you can enforce this
# via time or other metrics. The hstack is a quick and dirty hack to make sure the last
# peak/valley is not missed

gap = 10
peaks = np.hstack((peaks[0],peaks[0][-1]+gap+1))
peaks = np.array([peaks[i] for i in range(len(peaks)-1) if peaks[i+1]-peaks[i]>gap])
valleys = np.hstack((valleys[0],valleys[0][-1]+gap+1))
valleys = np.array([valleys[i] for i in range(len(valleys)-1) if valleys[i+1]-valleys[i]>gap])

plt.figure(figsize=(16,9))
plt.plot(col1,col2,'k')
plt.plot(col1[:-2] , second_derivative)
plt.plot(col1[peaks], col2[peaks], 'r*')
plt.plot(col1[valleys], col2[valleys], 'g*')
plt.plot(col1,[peak_percentile for _ in range(len(col1))], 'r')
plt.plot(col1,[valley_percentile for _ in range(len(col1))], 'g')
plt.show()

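Final result:

This solution works even when the data has an upward or downward trend (but not both), that is, when the points drift up or down. To demonstrate this, I can add a linear trend:

Or even a quadratic trend: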

As you can see, the positions of the detected peaks and valleys stay the same.
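
The reason a trend does not move the detections can be checked numerically. The sketch below assumes evenly spaced samples (the real data is only approximately evenly spaced) and uses a made-up sine signal: a linear trend vanishes from the second difference, and a quadratic trend only adds a constant, which shifts every value equally and so leaves percentile-based selection unchanged.

```python
import numpy as np

n = np.arange(200)
signal = np.sin(2 * np.pi * n / 50)            # hypothetical stand-in for the data
with_linear = signal + 0.05 * n + 3.0           # same signal plus a linear trend
with_quadratic = signal + 0.001 * n**2          # same signal plus a quadratic trend

# Second differences (discrete analogue of the second derivative)
d2_plain = np.diff(signal, n=2)
d2_linear = np.diff(with_linear, n=2)
d2_quadratic = np.diff(with_quadratic, n=2)

# The linear trend drops out entirely
print(np.allclose(d2_plain, d2_linear))

# The quadratic trend adds the same constant (2 * 0.001) to every value
print(np.allclose(d2_quadratic - d2_plain, 0.002))
```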
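All that said, this is a hacky solution I came up with in less than an hour (not to mention that I'm not good at math). I would not trust it or use it in production code, but if you are just playing around, it should work.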
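
To get from detected valley indices to the start/end/duration table the question asks for, something along these lines may work. It is only a sketch: `group_indices` is a hypothetical helper, the timestamps, signal and candidate indices are made up, and it keeps the lowest point of each cluster rather than the last one as the code above does.

```python
import numpy as np
import pandas as pd

def group_indices(indices, values, gap=10):
    """Split sorted indices into clusters separated by more than `gap`
    and keep, from each cluster, the index with the lowest value.
    Assumes at least one index is given."""
    indices = np.asarray(indices)
    breaks = np.where(np.diff(indices) > gap)[0] + 1
    clusters = np.split(indices, breaks)
    return np.array([c[np.argmin(values[c])] for c in clusters])

# Hypothetical inputs: one-minute timestamps, a signal, and noisy valley candidates
times = pd.date_range("2023-03-14 14:00", periods=120, freq="min")
values = 100 + 50 * np.cos(2 * np.pi * np.arange(120) / 40)
candidates = np.array([18, 19, 20, 21, 59, 60, 62, 99, 100, 101])

# One valley per cluster, then one row per wave between consecutive valleys
valleys = group_indices(candidates, values)
starts, ends = times[valleys[:-1]], times[valleys[1:]]
waves = pd.DataFrame({"start": starts, "end": ends, "duration": ends - starts})
print(waves)
```

Because the durations come from full timestamps, waves that cross midnight are handled without special cases.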
