pandas: Why does rolling.apply report an error? How should I modify the code?

Posted by gojuced7 on 2023-09-29 in Other


I have a DataFrame like this:

   time_node  rate_node
0         30       1.67
1         60       1.82
2         90       2.13
3        180       2.53
4        270       2.68
5        360       2.71

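I want to use a window to perform the following calculation. Starting from the second row (the row with index 1), compute (time_node[index] * rate_node[index] - time_node[index - 1] * rate_node[index - 1]) / (time_node[index] - time_node[index - 1]).
For example, the second row should be calculated as (60 * 1.82 - 30 * 1.67) / (60 - 30) and the third row as (90 * 2.13 - 60 * 1.82) / (90 - 60). The results should go into a new column.
The code is as follows: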

import pandas as pd

nodes = [30, 60, 90, 180, 270, 360]
rate = [1.67000, 1.82000, 2.13000, 2.53000, 2.68000, 2.71000]

df = pd.DataFrame({'time_node': nodes, 'rate_node': rate})

def forward_rate(rate, time):
    # change in rate * time across the window, divided by the change in time
    numerator = rate[1] * time[1] - rate[0] * time[0]
    denominator = time[1] - time[0]
    return numerator / denominator

# this is the call that raises the error
df.rolling(2).apply(forward_rate, {'rate': 'rate_node', 'time': 'time_node'})

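I want to use rolling.apply for this calculation, but it raises an error. I have been reading the documentation for a long time and still cannot see what is wrong with the code. Please help.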

pandas

Source: https://stackoverflow.com/questions/77163405/why-does-rolling-apply-report-an-error-how-should-i-modify-the-code

1 Answer


oo7oh9g91#

Solution in the context of the question

I guess you are looking for something like this:

df['...'] = (df['rate_node']*df['time_node']).diff() / df['time_node'].diff()

The output for the test data is:

0     NaN
1    1.97
2    2.75
3    2.93
4    2.98
5    2.80
dtype: float64
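As a quick sanity check, the vectorized expression can be compared with the hand calculation given in the question. This is only a sketch: the variable names `vectorized` and `by_hand` are made up for illustration and do not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'time_node': [30, 60, 90, 180, 270, 360],
    'rate_node': [1.67, 1.82, 2.13, 2.53, 2.68, 2.71],
})

# Vectorized form: difference of the products divided by the difference of the times.
vectorized = (df['rate_node'] * df['time_node']).diff() / df['time_node'].diff()

# Hand calculation for the row with index 1, written exactly as in the question.
by_hand = (60 * 1.82 - 30 * 1.67) / (60 - 30)

# The two values should agree up to floating-point rounding.
print(vectorized.iloc[1], by_hand)
```

The first row has no predecessor, so `diff()` leaves it as NaN, which matches the requirement to start from the second row.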

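Note that when the apply method is used to aggregate data, the rolling window loops over each column separately, so the function only ever sees the values of one column at a time. The second positional argument, {'rate': 'rate_node', 'time': 'time_node'}, is interpreted as the raw parameter, which must be a boolean. That is why the error you see is probably ValueError: raw parameter must be True or False.
See the description of Rolling.apply.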
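Both points can be illustrated with a short sketch. It assumes a recent pandas version; the names `per_column` and `error_message` are hypothetical and used only for this demonstration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'time_node': [30, 60, 90, 180, 270, 360],
    'rate_node': [1.67, 1.82, 2.13, 2.53, 2.68, 2.71],
})

# With raw=True the function receives a 1-D NumPy array holding one window
# of ONE column, so it cannot combine time_node and rate_node.
per_column = df.rolling(2).apply(lambda w: w[1] - w[0], raw=True)

# A dict passed as the second positional argument lands in the `raw` slot,
# which pandas rejects because it is not a boolean.
try:
    df.rolling(2).apply(lambda w: w[1] - w[0],
                        {'rate': 'rate_node', 'time': 'time_node'})
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(per_column)
print(error_message)
```

Here `per_column` is simply the column-wise difference, the same values as `df.diff()`, which shows that each column is processed independently.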

General case

One solution for the general case is to use an outer for loop to generate the data. For example:

name = 'new column'
df[name] = float('nan')
for index, window in zip(df.index, df.rolling(2)):
    if len(window) == 2:
        rate = window['rate_node'].values
        time = window['time_node'].values
        df.loc[index, name] = forward_rate(rate, time)

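Here forward_rate is the original function. Note that we need to pass numpy ndarrays as the arguments of forward_rate for this to work. Otherwise the original index labels of the passed data get confused with positions inside the window.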
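The label-versus-position issue can be seen on a single window. The following is a minimal sketch, assuming the default integer index from the question; `windows`, `rate_series`, `rate_array` and `label_lookup_failed` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'time_node': [30, 60, 90, 180, 270, 360],
    'rate_node': [1.67, 1.82, 2.13, 2.53, 2.68, 2.71],
})

# Iterating over the rolling object yields one DataFrame per window.
windows = list(df.rolling(2))
window = windows[2]                  # covers the rows labelled 1 and 2

rate_series = window['rate_node']    # keeps the original labels [1, 2]

# On a Series with an integer index, [0] is a label lookup, and label 0
# is not part of this window.
try:
    rate_series[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# .values drops the index, so [0] is the first element of the window.
rate_array = rate_series.values
print(label_lookup_failed, rate_array[0])
```

Using `.iloc[0]` on the Series would also work, but converting to an array lets forward_rate stay unchanged.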
As an alternative, we can try a sliding window from numpy:

from numpy.lib.stride_tricks import sliding_window_view

data = sliding_window_view(df[['rate_node', 'time_node']], window_shape=2, axis=0)

rate = data[:,0,:].T
time = data[:,1,:].T

df.loc[df.index[1:], 'new_column'] = forward_rate(rate, time)

Here forward_rate is the original function.
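To see why the transposes are needed, it helps to look at the shapes involved. The sketch below uses a plain NumPy array instead of the DataFrame and inlines the forward-rate formula; the names `values` and `forward` are assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Columns: rate_node, time_node (same order as in the snippet above).
values = np.array([
    [1.67, 30], [1.82, 60], [2.13, 90],
    [2.53, 180], [2.68, 270], [2.71, 360],
])

# Sliding along axis 0 appends the window dimension at the end:
# (n_windows, n_columns, window_size) = (5, 2, 2).
data = sliding_window_view(values, window_shape=2, axis=0)

# After transposing, rate[0] / rate[1] hold the first / second element
# of every window, each as an array of length 5.
rate = data[:, 0, :].T
time = data[:, 1, :].T

# The same formula as forward_rate, applied to all windows at once.
forward = (rate[1] * time[1] - rate[0] * time[0]) / (time[1] - time[0])
print(data.shape, rate.shape)
print(forward.round(2))
```

Because the arithmetic is element-wise, the scalar formula from forward_rate works on whole arrays without any change, producing one value per window.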

P.S. Rolling window with the Numba engine

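If Numba is installed, we can use rolling with method='table' and apply with engine='numba' to run the window operation over all columns at once. In this case the aggregating function should work with a 2-D array and return a result that can be broadcast along the columns. The result is a new DataFrame with the same structure as the original one. So if the function returns a single number, we have to restrict the answer to any one column of the result. For example: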

df['something_new'] = (
    df[['rate_node','time_node']]     # get columns in the right order
    .rolling(2, method='table')       # pass method='table' to work with all columns at once
    .apply(lambda x: (x[1,1]*x[1,0]-x[0,1]*x[0,0])/(x[1,1]-x[0,1]),   
           raw=True, engine='numba')  # x is indexed as a numpy.ndarray when engine='numba'
).iloc[:, 0]                          # use only one of the returned columns
