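I am looking for a way to flag runs of consecutive constant values (at least n in a row, say) in a pd.DataFrame (e.g. df).
I have written some code that flags a value if its difference from each of the n/2 next and n/2 previous data points is zero.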
import numpy as np
import pandas as pd

n = 5  # the minimum number of sequential constant values

# create an example dataframe
df = pd.DataFrame(np.random.randn(25), index=pd.date_range(start='2010-1-1', end='2010-1-2', freq='H'), columns=['value'])
# modify the dataframe to contain several runs of constant values
df[1:10] = 23
df[20:26] = 10
df[15:17] = 15

# collect the absolute differences, one column per offset
df_diff = pd.DataFrame(index=df.index)
for i in np.arange(1, int(n/2)):
    # calculate the difference between each value and the i-th previous value
    df_diff['delta_' + str(i)] = (df['value'].diff(periods=i)).abs()
    # calculate the difference between each value and the i-th next value
    df_diff['delta_' + str(-i)] = (df['value'].diff(periods=-i)).abs()

# filter the results (e.g. as a boolean)
result_1 = (df_diff[:] <= 0).all(axis=1)
result_2 = (df_diff[:] <= 0).any(axis=1)
Neither result_1 nor result_2 gives the correct answer in this example.
What I expect is:
2010-01-01 00:00:00 False
2010-01-01 01:00:00 True
2010-01-01 02:00:00 True
2010-01-01 03:00:00 True
2010-01-01 04:00:00 True
2010-01-01 05:00:00 True
2010-01-01 06:00:00 True
2010-01-01 07:00:00 True
2010-01-01 08:00:00 True
2010-01-01 09:00:00 True
2010-01-01 10:00:00 False
2010-01-01 11:00:00 False
2010-01-01 12:00:00 False
2010-01-01 13:00:00 False
2010-01-01 14:00:00 False
2010-01-01 15:00:00 False
2010-01-01 16:00:00 False
2010-01-01 17:00:00 False
2010-01-01 18:00:00 False
2010-01-01 19:00:00 False
2010-01-01 20:00:00 True
2010-01-01 21:00:00 True
2010-01-01 22:00:00 True
2010-01-01 23:00:00 True
2010-01-02 00:00:00 True
2 Answers
xmq68pz91#
IIUC, use DataFrame.groupby, with a grouper built from Series.diff, then .ne(0), then .cumsum.
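A minimal sketch of the approach described above, assuming the `value` column and `n = 5` from the question. The example frame is rebuilt here with `periods=25` and a `pd.Timedelta` frequency, and the names `run_id` and `flagged` are introduced only for illustration:

```python
import numpy as np
import pandas as pd

n = 5  # minimum run length, as in the question

# Rebuild the question's example: 25 hourly rows with three constant runs.
idx = pd.date_range(start='2010-01-01', periods=25, freq=pd.Timedelta(hours=1))
df = pd.DataFrame(np.random.randn(25), index=idx, columns=['value'])
df.iloc[1:10] = 23
df.iloc[20:25] = 10
df.iloc[15:17] = 15

# A new group id starts whenever the value differs from the previous row.
run_id = df['value'].diff().ne(0).cumsum()

# Broadcast each run's length back onto its rows, then compare with n.
flagged = df.groupby(run_id)['value'].transform('size').ge(n)
print(flagged)
```

With this data, the run of nine 23s and the run of five 10s should be flagged, while the two 15s are shorter than n and stay False.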
Explanation
The series we group by labels each run of consecutive equal values as its own group.
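To make the grouping key concrete, here is a small sketch on a toy Series. The values are made up for illustration and are not taken from the question, and `run_id` and `run_len` are hypothetical names:

```python
import pandas as pd

s = pd.Series([7, 23, 23, 23, 4, 15, 15], name='value')

# diff() is non-zero (or NaN on the first row) wherever a new run starts,
# so the cumulative sum of that flag numbers the runs 1, 2, 3, ...
run_id = s.diff().ne(0).cumsum()
print(run_id.tolist())   # [1, 2, 2, 2, 3, 4, 4]

# transform('size') writes each run's length onto every row of that run.
run_len = s.groupby(run_id).transform('size')
print(run_len.tolist())  # [1, 3, 3, 3, 1, 2, 2]
```

Comparing `run_len` against a threshold then gives one boolean per original row.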
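When you use transform, which returns an object with the same shape as the original DataFrame, and aggregate with 'size', every row receives the length of the run it belongs to. From there, it is a simple comparison using Series.ge (>=) against your value n.
voase2hg2#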
See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.duplicated.html
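The answer gives only the link, so the following is just one guess at how `DataFrame.duplicated` might be applied, using made-up toy values. It assumes `keep=False`, and it also shows that `duplicated` flags a repeated value anywhere in the column, so adjacency has to be checked separately (here with `shift`, which is my own addition):

```python
import pandas as pd

df = pd.DataFrame({'value': [23, 23, 23, 4, 23, 15, 15]})

# keep=False marks every row whose value occurs more than once in the column,
# regardless of whether the repeats are next to each other.
dup_any = df.duplicated(subset='value', keep=False)
print(dup_any.tolist())  # [True, True, True, False, True, True, True]

# Adjacency check: a row is part of a run if it equals either neighbour.
s = df['value']
in_run = s.eq(s.shift()) | s.eq(s.shift(-1))
print(in_run.tolist())   # [True, True, True, False, False, True, True]
```

Neither mask enforces a minimum run length n, so on its own this would not reproduce the expected output from the question.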