pandas: How to split a time series into clusters by different patterns?

Posted by nwo49xxi on 2023-02-02 in Other

This is an example from a larger dataset that contains many DataFrames similar to the one below (df_final):

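import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# First segment: signal fluctuating around a constant level
df1 = pd.DataFrame({"DEPTH (m)": np.arange(0, 2000, 2),
                    "SIGNAL": np.random.uniform(low=-6, high=10, size=(1000,))})

# Second segment: amplitude grows with the row number (vectorized, no iterrows loop)
df2 = pd.DataFrame({"DEPTH (m)": np.arange(2000, 3000, 2),
                    "SIGNAL": np.random.uniform(low=0, high=5, size=(500,))})
df2["SIGNAL"] = df2["SIGNAL"] * (df2.index.to_numpy() / 100)

df_final = pd.concat([df1, df2])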

You can see that this signal has two patterns (one "constant" and one increasing):

plt.figure()
plt.plot(df_final["SIGNAL"], df_final["DEPTH (m)"], linewidth=0.5)

plt.ylim(df_final["DEPTH (m)"].max(), df_final["DEPTH (m)"].min())

plt.xlabel("SIGNAL")
plt.ylabel("DEPTH")

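Is there a way to automatically create a flag/cluster that splits this signal? In this example, I would create one cluster before depth 2000 and another one after depth 2000.
A further problem is that, in my project, other DataFrames will have more than two signal patterns, and I cannot set the split manually for each DataFrame because there are too many of them.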

pandas

Source: https://stackoverflow.com/questions/75299245/how-to-split-time-series-in-clusters-by-different-patterns

2 Answers

km0tfn4u #1

One possibility is to use a rolling standard deviation:

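# Forward-looking rolling std over 20 rows (reverse, roll, reverse back)
s1 = df_final.loc[::-1, 'SIGNAL'].rolling(20).std()[::-1]
# Row-to-row change of that rolling std
s2 = s1.diff()

N = 2 # number of groups
# Flag roughly the N-1 sharpest drops in the rolling std
m = s2.lt(s2.quantile((N-1)/len(df_final)))

# A new group starts at each flagged row that follows an unflagged row
groups = (m&~m.shift(fill_value=False)).cumsum()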

f, (ax, ax1, ax2) = plt.subplots(ncols=3, sharey=True)

for k, g in df_final.groupby(groups):
    g.plot(x='SIGNAL', y='DEPTH (m)', ax=ax, lw=0.5, label=f'group {k+1}')

ax1.plot(s1, df_final['DEPTH (m)'])
ax2.plot(s2, df_final['DEPTH (m)'])
    
ax.invert_yaxis()

ax.set_title('data')
ax1.set_title('rolling std')
ax2.set_title('diff')

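Output: [figure not reproduced: three side-by-side panels titled "data", "rolling std", and "diff", sharing the depth axis]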
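
The snippet above reacts only to drops in the rolling std, and nearby flagged rows can each open a new group. For DataFrames with more than two patterns, one possible generalization is sketched below. It is not part of the original answer: the helper name `split_by_rolling_std`, the `window` default, the log-ratio score, and the `2 * window` minimum gap are all assumptions made for illustration. It also presumes that the patterns differ mainly in local spread and that every segment is longer than `2 * window` rows.

```python
import numpy as np
import pandas as pd


def split_by_rolling_std(signal, n_groups, window=20):
    """Label rows 0..n_groups-1 by cutting at the largest changes in rolling std.

    Hypothetical helper for illustration; not a pandas function.
    """
    values = pd.Series(np.asarray(signal, dtype=float))
    eps = 1e-9  # assumption: guards against log(0) on flat stretches

    # Std of the `window` rows ending at each row (backward-looking) ...
    back = values.rolling(window).std()
    # ... and of the `window` rows starting right after it (forward-looking).
    fwd = values[::-1].rolling(window).std()[::-1].shift(-1)

    # Score each row by the relative change in spread across it.
    # Rows without a full window on both sides get a score of 0.
    score = np.log((fwd + eps) / (back + eps)).abs().fillna(0.0).to_numpy()

    min_gap = 2 * window  # assumption: keeps one break per transition
    breaks = []
    for pos in np.argsort(score)[::-1]:  # strongest candidates first
        if len(breaks) == n_groups - 1:
            break
        if all(abs(pos - b) >= min_gap for b in breaks):
            breaks.append(int(pos))

    # Every row after a break belongs to the next group.
    labels = np.zeros(len(values), dtype=int)
    for b in sorted(breaks):
        labels[b + 1:] += 1
    return labels
```

Under these assumptions, `df_final["GROUP"] = split_by_rolling_std(df_final["SIGNAL"], n_groups=2)` would attach a group label to each row, and the same call could be repeated for every DataFrame in a collection. The number of groups still has to be supplied; the sketch does not estimate it.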

Posted 2023-02-02

1szpjjfi #2

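In this case, to cluster the time series into different patterns, you can use a clustering algorithm such as KMeans or DBSCAN. Here is an example of how to do this with KMeans: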

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Scale the data
scaler = StandardScaler()
df_final_scaled = scaler.fit_transform(df_final[["SIGNAL"]])

# Fit KMeans
kmeans = KMeans(n_clusters=2)
kmeans.fit(df_final_scaled)

# Predict the clusters
df_final_clusters = kmeans.predict(df_final_scaled)

# Visualize the results
plt.figure()
plt.scatter(df_final["SIGNAL"], df_final["DEPTH (m)"], c=df_final_clusters, cmap="viridis")
plt.xlabel("SIGNAL")
plt.ylabel("DEPTH")
plt.show()
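
Note that the code above clusters on the raw SIGNAL value alone, so rows are grouped by amplitude regardless of their depth. A variant that may track patterns more closely is to cluster on local statistics instead. The sketch below is an assumption-laden illustration, not part of the original answer: `cluster_rolling_features`, the choice of rolling mean and std as features, and the `window` default are all hypothetical. A segment whose level drifts strongly (like the increasing part of df_final) may itself be split across clusters.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_rolling_features(signal, n_clusters=2, window=50):
    """Cluster rows on their local (rolling) mean and std, not on raw values.

    Hypothetical helper; the feature choice and `window` are assumptions.
    """
    values = pd.Series(np.asarray(signal, dtype=float))
    # Centered window so each row is described by its neighbourhood.
    roll = values.rolling(window, center=True, min_periods=1)
    features = pd.DataFrame({"mean": roll.mean(), "std": roll.std()})
    # The std of a one-row window is NaN; borrow the nearest valid value.
    features = features.bfill().ffill()

    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return model.fit_predict(scaled)
```

The returned array can be passed as `c=` to the scatter plot above in place of `df_final_clusters`. KMeans cluster numbers are arbitrary, so label 0 is not necessarily the shallowest segment.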

Posted 2023-02-02
