使用Scipy的LowLevelCallable和numba cfunc优化一个基于时间的代码,该代码将多个数组作为输入

cqoc49vn  于 2023-01-26  发布在  其他
关注(0)|答案(1)|浏览(145)

bounty将在17小时后过期。回答此问题可获得+50声望奖励。Rafnus希望引起更多人对此问题的关注:我并不关心任何具体的集成方法,我只是想把下面给出的代码加速一个数量级。这应该是绝对可能的,我只是不知道怎么做。任何人谁管理的加速水平是有资格的赏金。

我想通过结合Numba的cfunc和Scipy的LowLevelCallable功能来优化基于时间的ODE系统的求解,以大大提高odeint()的速度。但是,我很难找到正确解的确切语法。
我的(未优化的)代码看起来像这样:

import numpy as np
import numba as nb
from numba import types
from matplotlib import pyplot as plt
import scipy.integrate as si

# hours in the simulation
T_sim = 72 
# simulate fluctuating outside temp for 3 days
T_out = np.sin(np.linspace(0, T_sim, T_sim)/3.8) * 8 + 18 
# heat capacity of the air and floor
cap_air, cap_flr = 1, 10

def func(T, t):
    T_air, T_flr= T[0], T[1]
    t_ = int(t)
    H_airOut = 4 * U_vent[t_] * (T_air - T_out[t_])
    H_blowAir = U_blow[t_]
    H_airFlr = (T_air - T_flr)
    dT_air = ((H_blowAir 
               - H_airOut
               - H_airFlr) / cap_air)
    dT_flr = H_airFlr / cap_flr
    return dT_air, dT_flr

T0 = [16, 18] # starting temperature for T_air and T_flr
t_eval = np.linspace(0, T_sim-1, T_sim-1)
U_blow = np.full(T_sim, 0.1) # control for the heating
U_vent = np.full(T_sim, 0.3) # control for the ventilation
result = si.odeint(func,T0,t_eval)

# this piece of code is a standin for the real code, where func() is run 
# tens of thousands of times. All variables are constant, except for the 
# two arrays U_blow and U_vent
for _ in range(10):
    U_blow = np.random.rand(T_sim)
    U_vent = np.random.rand(T_sim)
    result = si.odeint(func,T0,t_eval)

plt.plot(result[:, 0], label='T_air')
plt.plot(result[:, 1], label='T_flr')
plt.plot(T_out[:-1], label='T_out')
plt.legend()

根据其他SO帖子,我非常确定解决方案应该看起来像下面的代码,但我自己不能让它工作。如果只有一些变量在运行之间改变,问题会容易得多,但这不是事实,因为我需要能够在运行之间改变数组U_blowU_vent

import scipy as sp
from numba.types import float64, CPointer, intc

def jit_integrand_function(integrand_function):
    jitted_function = nb.jit(integrand_function, nopython=True)

    @nb.cfunc(float64(intc, CPointer(float64)))
    def wrapped(n, xx):
        return jitted_function(xx[0], xx[1])
    return sp.LowLevelCallable(wrapped.ctypes)

@jit_integrand_function
def func(T, t):
    T_air, T_flr= T[0], T[1]
    t_ = int(t)
    H_airOut = 4 * U_vent[t_] * (T_air - T_out[t_])
    H_blowAir = U_blow[t_]
    H_airFlr = (T_air - T_flr)
    dT_air = ((H_blowAir 
               - H_airOut
               - H_airFlr) / cap_air)
    dT_flr = H_airFlr / cap_flr
    return dT_air, dT_flr

T0 = [16, 18] # starting temperature for T_air and T_flr
t_eval = np.linspace(0, T_sim-1, T_sim-1)

for _ in range(10):
    U_blow = np.random.rand(T_sim)
    U_vent = np.random.rand(T_sim)
    result = si.odeint(func,T0,t_eval, (U_blow, U_vent))

plt.plot(result[:, 0], label='T_air')
plt.plot(result[:, 1], label='T_flr')
plt.plot(T_out[:-1], label='T_out')
plt.legend()

顺便说一句,我的完整模型在数值上不够稳定,欧拉前向积分法无法工作,所以这不是一个选择。

iyfamqjs

iyfamqjs1#

我试图回答如下,工作的这个完整的例子:

import scipy as sp
import scipy.integrate as si
import numpy as np

# hours in the simulation
T_sim = 72
# simulate fluctuating outside temp for 3 days
T_out = np.sin(np.linspace(0, T_sim, T_sim) / 3.8) * 8 + 18
# heat capacity of the air and floor
cap_air, cap_flr = 1, 10

U_blow = np.random.rand(T_sim)
U_vent = np.random.rand(T_sim)

def example_rhs(T, t, U_blow, U_vent):
    T_air, T_flr = T[0], T[1]
    t_ = int(t)
    H_airOut = 4 * U_vent[t_] * (T_air - T_out[t_])
    H_blowAir = U_blow[t_]
    H_airFlr = (T_air - T_flr)
    dT_air = ((H_blowAir - H_airOut - H_airFlr) / cap_air)
    dT_flr = H_airFlr / cap_flr

    return dT_air, dT_flr

# starting temperature for T_air and T_flr
T0 = [16, 18]
t_eval = np.linspace(0,
                     T_sim - 1,
                     T_sim - 1)

# Repeat the integration a couple of times to increase accuracy
for i in range(100):
    si.odeint(example_rhs, T0, t_eval, args=(U_blow, U_vent))

要分析任何python代码的性能,我将通过python -mcProfile -s cumtime test.py使用cProfile
这给了我以下内容(节略):

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
[...]
      100    0.001    0.000    2.021    0.020 _odepack_py.py:28(odeint)
      100    0.843    0.008    2.017    0.020 {built-in method scipy.integrate._odepack.odeint}

[...]
   973000    1.174    0.000    1.174    0.000 test.py:16(example_rhs)

正如我们所看到的,代码在odeint函数中花费了大约2秒,这是一个built-in函数,它调用Fortran代码来进行求解。在这段时间内,基于Python的rhs被调用了973000次,即每次积分9730次,尽管这对于不同的输入可能会有所不同。然而,只有大约一半~1.174秒花费在右边。2所以,即使你的回调优化到运行时间为零,你仍然需要~0.8秒来完成整个操作。3因为这个时间是花费在Fortran代码中的,我不认为有什么方法可以加快这部分的速度,所以,我不认为一个数量级是可能的,至少对于这个特定的示例来说是这样(但也许你的原始示例的行为不同?)
遗憾的是LowLevelCallable s显然不支持在scipy中集成ODE,这似乎是一个明显的补充(尽管可能有什么东西阻止了这一点?!)

相关问题