Pandas: checking time series continuity

Posted by uxh89sit on 2023-02-11 in Other


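I have a DataFrame with a monthly index. I want to check whether the time index is continuous at monthly frequency and, if possible, find where it becomes discontinuous, i.e. where one or more months are missing between two adjacent entries of the index.
Example: the following time series data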

1964-07-31    100.00
1964-08-31     98.81
1964-09-30    101.21
1964-11-30    101.42
1964-12-31    101.45
1965-03-31     91.49
1965-04-30     90.33
1965-05-31     85.23
1965-06-30     86.10
1965-08-31     84.26

October 1964 and months [1, 2, 7] of 1965 are missing.

pandas

Source: https://stackoverflow.com/questions/54039062/pandas-check-time-series-continuity

4 Answers


nwsw7zdq1#

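Use asfreq with a monthly frequency to add the missing datetimes, filter them into a new Series and, if necessary, group by year to build a list of missing months per year: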

s = s.asfreq('M')  # month-end frequency; the alias is spelled 'ME' in pandas 2.2+
s1 = pd.Series(s[s.isnull()].index)
print (s1)
0   1964-10-31
1   1965-01-31
2   1965-02-28
3   1965-07-31
Name: 0, dtype: datetime64[ns]

out = s1.dt.month.groupby(s1.dt.year).apply(list)
print (out)
0
1964         [10]
1965    [1, 2, 7]
Name: 0, dtype: object

Setup:

s = pd.Series({pd.Timestamp('1964-07-31 00:00:00'): 100.0, 
               pd.Timestamp('1964-08-31 00:00:00'): 98.81, 
               pd.Timestamp('1964-09-30 00:00:00'): 101.21, 
               pd.Timestamp('1964-11-30 00:00:00'): 101.42, 
               pd.Timestamp('1964-12-31 00:00:00'): 101.45,
               pd.Timestamp('1965-03-31 00:00:00'): 91.49, 
               pd.Timestamp('1965-04-30 00:00:00'): 90.33, 
               pd.Timestamp('1965-05-31 00:00:00'): 85.23, 
               pd.Timestamp('1965-06-30 00:00:00'): 86.1, 
               pd.Timestamp('1965-08-31 00:00:00'): 84.26})

print (s)
1964-07-31    100.00
1964-08-31     98.81
1964-09-30    101.21
1964-11-30    101.42
1964-12-31    101.45
1965-03-31     91.49
1965-04-30     90.33
1965-05-31     85.23
1965-06-30     86.10
1965-08-31     84.26
dtype: float64

Edit:
If the datetimes are not always the last day of the month:

s = pd.Series({pd.Timestamp('1964-07-31 00:00:00'): 100.0, 
               pd.Timestamp('1964-08-31 00:00:00'): 98.81, 
               pd.Timestamp('1964-09-01 00:00:00'): 101.21, 
               pd.Timestamp('1964-11-02 00:00:00'): 101.42, 
               pd.Timestamp('1964-12-05 00:00:00'): 101.45,
               pd.Timestamp('1965-03-31 00:00:00'): 91.49, 
               pd.Timestamp('1965-04-30 00:00:00'): 90.33, 
               pd.Timestamp('1965-05-31 00:00:00'): 85.23, 
               pd.Timestamp('1965-06-30 00:00:00'): 86.1, 
               pd.Timestamp('1965-08-31 00:00:00'): 84.26})
print (s)
1964-07-31    100.00
1964-08-31     98.81
1964-09-01    101.21
1964-11-02    101.42
1964-12-05    101.45
1965-03-31     91.49
1965-04-30     90.33
1965-05-31     85.23
1965-06-30     86.10
1965-08-31     84.26
dtype: float64

#convert all months to first day
s.index = s.index.to_period('m').to_timestamp()
#MS is start month frequency
s = s.asfreq('MS')
s1 = pd.Series(s[s.isnull()].index)
print (s1)
0   1964-10-01
1   1965-01-01
2   1965-02-01
3   1965-07-01
dtype: datetime64[ns]
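A variation on the same idea, sketched here as an assumption rather than as part of the original answer: convert the index to monthly periods and compare it with the full period range. This sidesteps the month-end versus month-start question. The helper name `missing_months` is hypothetical.

```python
import pandas as pd

# Same data as in the edit above: dates are not always month-end.
s = pd.Series(
    [100.0, 98.81, 101.21, 101.42, 101.45, 91.49, 90.33, 85.23, 86.1, 84.26],
    index=pd.to_datetime([
        "1964-07-31", "1964-08-31", "1964-09-01", "1964-11-02", "1964-12-05",
        "1965-03-31", "1965-04-30", "1965-05-31", "1965-06-30", "1965-08-31",
    ]),
)

def missing_months(index):
    """Return the monthly periods absent between the first and last entry.

    Hypothetical helper, not a pandas function.
    """
    periods = index.to_period("M")
    expected = pd.period_range(periods.min(), periods.max(), freq="M")
    return expected.difference(periods)

gaps = missing_months(s.index)
is_continuous = len(gaps) == 0  # True only when no month is missing

print(is_continuous)
print([str(p) for p in gaps])
```

The boolean answers the "is it continuous" part of the question, and `gaps` lists where continuity breaks.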


ubof19bj2#

I often do this by computing the gap between consecutive index values.

times_gaps = df.index.to_series().diff()  # Index.shift needs a set freq, so diff the index as a Series

Then you can plot them:

times_gaps.plot()

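If there are gaps, you will quickly see where they are. If there are none, you will see a roughly flat horizontal line (month lengths differ by a few days).
You can also select the gaps that exceed a threshold, for example threshold = pd.Timedelta(days=31):

times_gaps[times_gaps > threshold]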
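As a minimal sketch of the steps above, applied to the data from the question. The 31-day threshold is an assumption that works here because every date is a month-end, so two adjacent months are never more than 31 days apart.

```python
import pandas as pd

# Month-end dates from the question.
idx = pd.to_datetime([
    "1964-07-31", "1964-08-31", "1964-09-30", "1964-11-30", "1964-12-31",
    "1965-03-31", "1965-04-30", "1965-05-31", "1965-06-30", "1965-08-31",
])
df = pd.DataFrame(
    {"value": [100.0, 98.81, 101.21, 101.42, 101.45,
               91.49, 90.33, 85.23, 86.1, 84.26]},
    index=idx,
)

# Time elapsed since the previous index entry (first entry is NaT).
times_gaps = df.index.to_series().diff()

# Assumed threshold: anything longer than 31 days skips at least one month.
threshold = pd.Timedelta(days=31)
after_gap = times_gaps[times_gaps > threshold]

print(after_gap)
```

Each row of `after_gap` is labelled with the first date after a gap, and its value is the length of that gap. If the dates can fall anywhere within a month, a fixed day threshold becomes unreliable and comparing monthly periods is safer.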


kyxcudwk3#

Assuming the DataFrame matches your input (the dates are in the first column), you can do the following:

# Renamed from `all` to avoid shadowing the Python builtin
all_months = pd.Series(data=pd.date_range(start=df[0].min(), end=df[0].max(), freq='M'))
mask = all_months.isin(df[0].values)
print(all_months[~mask])

Output:

3    1964-10-31
6    1965-01-31
7    1965-02-28
12   1965-07-31
dtype: datetime64[ns]

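The idea is to create a month-frequency date range from the first date to the last, then check which of those dates are absent from the first column.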


nue99wik4#

import pandas as pd

# Create a sample time-series data
dates = pd.date_range('2022-01-01', periods=12, freq='M')
data = range(12)
df = pd.DataFrame({'date': dates, 'value': data})

# Check if the time-series is continuous for every month
df_monthly = df.set_index('date').resample('M').mean()
if df_monthly.isnull().sum().sum() == 0:
    print("The time-series is continuous for every month.")
else:
    print("The time-series is NOT continuous for every month.")
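The sample data above is generated by `pd.date_range`, so it never has a gap. Below is a hedged sketch of the same resample check run on the data from the question, which does. It passes `pd.offsets.MonthEnd()` instead of a string alias, a choice made here only to avoid differences in alias spelling between pandas versions.

```python
import pandas as pd

# Data from the question, with several months missing.
dates = pd.to_datetime([
    "1964-07-31", "1964-08-31", "1964-09-30", "1964-11-30", "1964-12-31",
    "1965-03-31", "1965-04-30", "1965-05-31", "1965-06-30", "1965-08-31",
])
values = [100.0, 98.81, 101.21, 101.42, 101.45,
          91.49, 90.33, 85.23, 86.1, 84.26]
df = pd.DataFrame({"date": dates, "value": values})

# Resample to month-end bins; months without any rows get NaN as their mean.
df_monthly = df.set_index("date").resample(pd.offsets.MonthEnd()).mean()

# Labels of the empty bins, i.e. the missing months.
missing = df_monthly.index[df_monthly["value"].isna()]

print(missing.strftime("%Y-%m").tolist())
```

One caveat: a month whose only values are already NaN in the source data is also reported as missing, so this check assumes the value column itself has no NaNs.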

