SciPy interp1d replacement for a pandas DataFrame

5w9g7ksd  posted on 2023-11-19  in Other

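The SciPy documentation clearly indicates that interp1d is being deprecated and will likely not be included in future releases. I already have interp1d implemented in production, but I am looking for possible improvements as we upgrade. My problem is that I don't know which direction to take to replace my interp1d implementation. I have tried some of the other interpolation algorithms in the SciPy library, and I even tried the numpy.interp routine, but I ran into difficulties with the implementation. I need help identifying an appropriate replacement for interp1d for this particular interpolation problem, so that I can focus on solving the implementation issues.
Here is my setup (Python test setup listed below): a pandas DataFrame with two columns. The column headers are dates. The vertical index is a simple integer scale from 0 to 9. The values are floats, and the values in the two existing columns should differ substantially so that the interpolation is easy to visualize. Now I insert a blank column between the two existing columns. The header of this new column is a date that is noticeably closer to the date in the first column. The interpolation needs to be weighted by this distance. In this case, I expect the interpolated values to be closer to the values in column 1 than to the values in column 3.
I am also open to implementing a manual routine that does not depend on any third-party library, given the basic nature of this interpolation.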
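For reference, a manual routine of the kind described above could look roughly like the following. This is only a minimal sketch using the standard library; the function name `interpolate_by_date` and its parameters are hypothetical and not part of the original setup.

```python
from datetime import date


def interpolate_by_date(left_date, left_values, right_date, right_values, target_date):
    """Linearly interpolate each pair of values, weighted by date distance."""
    total_days = (right_date - left_date).days
    if total_days == 0:
        raise ValueError("left_date and right_date must differ")
    # Fraction of the way from left_date to target_date (0 -> left, 1 -> right).
    weight = (target_date - left_date).days / total_days
    return [lo + weight * (hi - lo) for lo, hi in zip(left_values, right_values)]


# Same data as the test setup: columns 1 and 3, with the missing date in between.
left_values = [i * 1.45 for i in range(10)]
right_values = [i * 0.23 for i in range(10)]
missing_values = interpolate_by_date(
    date(2023, 1, 15), left_values,
    date(2023, 12, 15), right_values,
    date(2023, 2, 15),
)
print(missing_values)
```

Because 2023-02-15 is only 31 of the 334 days away from 2023-01-15, the weight is small and the results stay close to the column 1 values.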

Python test setup

import numpy as np
import pandas as pd
import scipy.interpolate as interp

# create the data for the first column
col1_data = np.arange(10) * 1.45 

# create the data for the second column
col2_data = np.full(10, np.nan)

# create the data for the third column
col3_data = np.arange(10) * 0.23

# create the dataframe
df = pd.DataFrame({'col1': col1_data, 'col2': col2_data, 'col3': col3_data}, index=np.arange(10))

# set the column names to the specified dates
df.columns = ['2023-01-15', '2023-02-15', '2023-12-15']

#print(df)

# make a deep copy of df for later use with the replacement interpolation routine
df_numpy_copy = df.copy(deep=True)

# Gather the dates
left_date = pd.to_datetime(df.columns[0])
missing_date = pd.to_datetime(df.columns[1])
right_date = pd.to_datetime(df.columns[2])

print(f"left_date: {left_date}")
print(f"missing_date: {missing_date}")
print(f"right_date: {right_date}")

# calculate distances between dates
left_distance = (missing_date - left_date).days
right_distance = (right_date - missing_date).days
total_distance = left_distance + right_distance

# normalize the distances
left_distance_normalized = left_distance / total_distance
right_distance_normalized = right_distance / total_distance
print(f"left_distance_normalized: {left_distance_normalized}")
print(f"right_distance_normalized: {right_distance_normalized}")

#gather the values of the first column
left_col_values = df.iloc[:, 0].to_numpy()
print(f"left_col_values: {left_col_values}")

#gather the values of the third column
right_col_values = df.iloc[:, 2].to_numpy()
print(f"right_col_values: {right_col_values}")

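# interpolate the missing values using the values from the left and right columns;
# the evaluation point below lies a fraction left_distance / total_distance
# of the way from the first x value (left_distance) to the second (total_distance)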
interp_func = interp.interp1d([left_distance, total_distance], [left_col_values, right_col_values], axis=0)
missing_col_values = interp_func(left_distance + right_distance_normalized * left_distance)

# fill in the missing values
df.iloc[:, 1] = missing_col_values

# and finally, print the dataframe
print(df)

#  
#-------------------------------------------------------
#
#   enter code here for the replacement interpolation method
#
# ...
# missing_col_values_replacement_method = ...
# fill in the missing values
#df_numpy_copy.iloc[:, 1] = missing_col_values_replacement_method

# and finally, print the dataframe displaying replacement values of the new interp method
print(df_numpy_copy)



Answer 1 (pdtvr36n)

You can write your own linear interpolation function and use the vectorization of pandas DataFrames to apply it to all rows at once.

import numpy as np
import pandas as pd

col1_data = np.arange(10) * 1.45 
col2_data = np.full(10, np.nan)
col3_data = np.arange(10) * 0.23

df = pd.DataFrame({'col1': col1_data, 'col2': col2_data, 'col3': col3_data}, 
                  index=np.arange(10))

df.columns = ['2023-01-15', '2023-02-15', '2023-12-15']
left_date = pd.to_datetime(df.columns[0])
missing_date = pd.to_datetime(df.columns[1])
right_date = pd.to_datetime(df.columns[2])

def lerp(x, x1, y1, x2, y2):
    return y1*(x - x2)/(x1 - x2) + y2*(x - x1)/(x2 - x1)

missing_col_values = lerp(x=missing_date,
                          x1=left_date, 
                          y1=df.iloc[:,0], 
                          x2=right_date, 
                          y2=df.iloc[:,2])

df.iloc[:, 1] = missing_col_values
print(df)

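The same idea can also be expressed with numpy.interp, which the question mentions having trouble with. The sketch below is one possible arrangement, not the answerer's method: since numpy.interp only handles 1-D data, it is used once to compute the blend weight from the dates, and that weight is then applied to both columns. The variable names `days` and `weight` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "2023-01-15": np.arange(10) * 1.45,
        "2023-02-15": np.full(10, np.nan),
        "2023-12-15": np.arange(10) * 0.23,
    }
)

# Convert the column labels to day offsets so np.interp gets plain floats.
dates = pd.to_datetime(df.columns)
days = (dates - dates[0]).days.to_numpy(dtype=float)  # offsets 0, 31, 334

# np.interp is 1-D only, so use it once to get the blend weight
# (0 at the left date, 1 at the right date) ...
weight = np.interp(days[1], [days[0], days[2]], [0.0, 1.0])

# ... and apply that weight to both columns in one vectorized step.
left = df.iloc[:, 0].to_numpy()
right = df.iloc[:, 2].to_numpy()
df.iloc[:, 1] = (1.0 - weight) * left + weight * right
print(df)
```

Computing the weight separately keeps the row-wise work in plain NumPy arithmetic, so no loop over rows is needed.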


Answer 2 (3npbholx)

interp1d is legacy, not deprecated. That means there are no plans to remove it.
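If you still want to move off interp1d, one candidate for array-valued data is scipy.interpolate.make_interp_spline with k=1 (piecewise linear). The following is a minimal sketch under that assumption, not something taken from this thread; the day offsets 0, 31 and 334 are counted from 2023-01-15 for the dates in the question.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

left_col_values = np.arange(10) * 1.45
right_col_values = np.arange(10) * 0.23

# x holds the day offsets of the two known columns (2023-01-15 and 2023-12-15).
x = np.array([0.0, 334.0])
# y stacks the known columns along axis 0, giving shape (2, 10).
y = np.vstack([left_col_values, right_col_values])

# k=1 gives a linear spline; axis=0 interpolates along the stacked dates.
spline = make_interp_spline(x, y, k=1, axis=0)

# Evaluate at the offset of the missing date (2023-02-15 is day 31).
missing_col_values = spline(31.0)
print(missing_col_values)
```

With only two known dates this produces the same weighted average as a hand-written lerp, but the call pattern stays close to the existing interp1d code.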
