Consider this:
import numpy
import pandas

# df1: three rows per day (every 8 hours), starting at midnight
periods = 5 * 3
df1 = pandas.DataFrame(dict(
    v1=numpy.arange(2, 2 + periods) * 2,
    v2=numpy.arange(3, 3 + periods) * 3),
    index=pandas.date_range('2023-01-01', periods=periods, freq='8H'))
print(df1)
# df2: one row every second day
periods = 3
df2 = pandas.DataFrame(dict(
    v3=numpy.arange(4, 4 + periods) * 4,
    v4=numpy.arange(5, 5 + periods) * 5),
    index=pandas.date_range('2023-01-02', periods=periods, freq='2D'))
print(df2)
# Attempt to copy df2's values onto the df1 rows that share the same date
df1.loc[df1.index.date, ['v3', 'v4']] = df2
print(df1)
Result:
                     v1  v2
2023-01-01 00:00:00   4   9
2023-01-01 08:00:00   6  12
2023-01-01 16:00:00   8  15
2023-01-02 00:00:00  10  18
2023-01-02 08:00:00  12  21
2023-01-02 16:00:00  14  24
2023-01-03 00:00:00  16  27
2023-01-03 08:00:00  18  30
2023-01-03 16:00:00  20  33
2023-01-04 00:00:00  22  36
2023-01-04 08:00:00  24  39
2023-01-04 16:00:00  26  42
2023-01-05 00:00:00  28  45
2023-01-05 08:00:00  30  48
2023-01-05 16:00:00  32  51
            v3  v4
2023-01-02  16  25
2023-01-04  20  30
2023-01-06  24  35
                     v1  v2    v3    v4
2023-01-01 00:00:00   4   9   NaN   NaN
2023-01-01 08:00:00   6  12   NaN   NaN
2023-01-01 16:00:00   8  15   NaN   NaN
2023-01-02 00:00:00  10  18  16.0  25.0
2023-01-02 08:00:00  12  21  16.0  25.0
2023-01-02 16:00:00  14  24  16.0  25.0
2023-01-03 00:00:00  16  27   NaN   NaN
2023-01-03 08:00:00  18  30   NaN   NaN
2023-01-03 16:00:00  20  33   NaN   NaN
2023-01-04 00:00:00  22  36  20.0  30.0
2023-01-04 08:00:00  24  39  20.0  30.0
2023-01-04 16:00:00  26  42  20.0  30.0
2023-01-05 00:00:00  28  45   NaN   NaN
2023-01-05 08:00:00  30  48   NaN   NaN
2023-01-05 16:00:00  32  51   NaN   NaN
where every value from df2 is copied into df1 whenever df1's date matches df2's date (i.e. ignoring the time component).
However, changing df1's index so that it has a time component (01:00 in this example), i.e.:
periods = 5 * 3
df1 = pandas.DataFrame(dict(
    v1=numpy.arange(2, 2 + periods) * 2,
    v2=numpy.arange(3, 3 + periods) * 3),
    index=pandas.date_range('2023-01-01 01:00', periods=periods, freq='8H'))
print(df1)
periods = 3
df2 = pandas.DataFrame(dict(
    v3=numpy.arange(4, 4 + periods) * 4,
    v4=numpy.arange(5, 5 + periods) * 5),
    index=pandas.date_range('2023-01-02', periods=periods, freq='2D'))
print(df2)
df1.loc[df1.index.date, ['v3', 'v4']] = df2
print(df1)
Result:
                     v1  v2
2023-01-01 01:00:00   4   9
2023-01-01 09:00:00   6  12
2023-01-01 17:00:00   8  15
2023-01-02 01:00:00  10  18
2023-01-02 09:00:00  12  21
2023-01-02 17:00:00  14  24
2023-01-03 01:00:00  16  27
2023-01-03 09:00:00  18  30
2023-01-03 17:00:00  20  33
2023-01-04 01:00:00  22  36
2023-01-04 09:00:00  24  39
2023-01-04 17:00:00  26  42
2023-01-05 01:00:00  28  45
2023-01-05 09:00:00  30  48
2023-01-05 17:00:00  32  51
            v3  v4
2023-01-02  16  25
2023-01-04  20  30
2023-01-06  24  35
...
KeyError: "None of [Index([2023-01-01, 2023-01-01, 2023-01-01, 2023-01-02, 2023-01-02, 2023-01-02,\n 2023-01-03, 2023-01-03, 2023-01-03, 2023-01-04, 2023-01-04, 2023-01-04,\n 2023-01-05, 2023-01-05, 2023-01-05],\n dtype='object')] are in the [index]"
Clearly:
df1.loc[df1.index.date, ['v3', 'v4']] = df2
is not an appropriate way to set values based on the date (i.e. ignoring the time).
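Questions:
- Why does it stop working when there is a time component?
- Given that it does not work with a time component, why does it work without one by matching *all* times (i.e. not just 00:00)?
- What is the correct way to accomplish my goal?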
1 Answer
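In the first example, you were lucky to have some datetimes at 00:00:00, so .loc had something to match; in the second example there are no such datetimes, so no rows can be selected. The correct approach is probably to normalize the index and then use merge.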
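A minimal sketch of that approach, using the data from the second example. The helper column name day and the variables left and out are only illustrative, and freq is passed as a pandas.Timedelta here simply to avoid depending on the spelling of frequency aliases; neither detail comes from the original post.

```python
import numpy as np
import pandas as pd

# Same data as the second example (index offset by 01:00).
periods = 5 * 3
df1 = pd.DataFrame(
    {"v1": np.arange(2, 2 + periods) * 2, "v2": np.arange(3, 3 + periods) * 3},
    index=pd.date_range("2023-01-01 01:00", periods=periods,
                        freq=pd.Timedelta(hours=8)),
)
periods = 3
df2 = pd.DataFrame(
    {"v3": np.arange(4, 4 + periods) * 4, "v4": np.arange(5, 5 + periods) * 5},
    index=pd.date_range("2023-01-02", periods=periods,
                        freq=pd.Timedelta(days=2)),
)

# normalize() resets every timestamp to midnight, so the helper column
# (hypothetical name "day") holds keys comparable with df2's index.
left = df1.assign(day=df1.index.normalize())

# A left merge keeps every row of df1 and its original index; rows whose
# day is absent from df2 get NaN in v3 and v4.
out = left.merge(df2, left_on="day", right_index=True, how="left")
out = out.drop(columns="day")
print(out)
```

Because the match is made on the normalized day rather than on the full timestamp, the time component of df1's index no longer matters.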
But why were rows with a non-00:00 time selected?
They were not: you reindexed the DataFrame, and the rows with a 00:00 time were duplicated:
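The duplication can be illustrated with a small sketch. It uses reindex with normalized Timestamp labels instead of the original df1.loc[df1.index.date] expression, on the assumption that both look the labels up at midnight; how date objects are matched against a DatetimeIndex may differ between pandas versions. The names midnights, dup, shifted and missing are illustrative.

```python
import numpy as np
import pandas as pd

periods = 5 * 3
df1 = pd.DataFrame(
    {"v1": np.arange(2, 2 + periods) * 2, "v2": np.arange(3, 3 + periods) * 3},
    index=pd.date_range("2023-01-01", periods=periods,
                        freq=pd.Timedelta(hours=8)),
)

# Fifteen labels, but only five distinct ones: each midnight appears three times.
midnights = df1.index.normalize()

# Every label resolves to that day's 00:00 row, so the row is repeated three
# times; the 08:00 and 16:00 rows are never selected.
dup = df1.reindex(midnights)
print(dup)

# With the index shifted to 01:00 no row sits at midnight, so nothing matches.
# reindex fills with NaN, whereas .loc raises a KeyError for missing labels.
shifted = df1.copy()
shifted.index = shifted.index + pd.Timedelta(hours=1)
missing = shifted.reindex(shifted.index.normalize())
print(missing)
```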