pandas 如何将一个 Dataframe 的列值与查找表匹配,并设置带有偏移量的值?

bogh5gae  于 2023-03-06  发布在  其他
关注(0)|答案(1)|浏览(112)

我有以下 Dataframe :

# Creating Data_One_df
data_one = {
    "Date": ["12/1/2022"] * 7,
    "Time": ["6:30", "6:35", "6:40", "6:45", "6:50", "6:55", "7:00"],
    "Bars_left": [6, 5, 4, 3, 2, 1, 0],
    "6:30": [""] * 7,
    "6:35": [""] * 7,
    "6:40": [""] * 7,
    "6:45": [""] * 7,
    "6:50": [""] * 7,
    "6:55": [""] * 7,
    "7:00": [""] * 7,
}
Data_One_df = pd.DataFrame(data_one)

# Creating Data_Two_df
data_two = {
    "Time": ["6:30", "6:35", "6:40", "6:45", "6:50", "6:55", "7:00"],
    "Fcst": [1, 2, 1, 1, 2, 1, 1],
}
Data_Two_df = pd.DataFrame(data_two)

我想填充Data_One_df(输入表)的列,方法是将"Time"列的行值与Data_Two_df(查找表)中的"Time"列匹配,并将"Fcst"的累计和的偏移量(向下1行)返回到"Input table"的偏移量(向右1列)。
例如:对于Data_One_df中的行6:30,我想在Data_Two_df('lookup table')中找到对应的行(即6:30),然后偏移(即6:35)。接下来,计算从6:35开始的累计和,即8。将结果放入'input table'的列(6:35)中。然后,我想在所有列中重复此值。
以下是预期结果:

data_three = {
    "Date": ["12/1/2022"] * 7,
    "Time": ["6:30", "6:35", "6:40", "6:45", "6:50", "6:55", "7:00"],
    "Bars_left": [6, 5, 4, 3, 2, 1, 0],
    "6:30": ["NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN"],
    "6:35": [8, "NaN", "NaN", "NaN", "NaN", "NaN", "NaN"],
    "6:40": [8, 6, "NaN", "NaN", "NaN", "NaN", "NaN"],
    "6:45": [8, 6, 5, "NaN", "NaN", "NaN", "NaN"],
    "6:50": [8, 6, 5, 4, "NaN", "NaN", "NaN"],
    "6:55": [8, 6, 5, 4, 2, "NaN", "NaN"],
    "7:00": [8, 6, 5, 4, 2, 1, "NaN"],
}
Desired_df = pd.DataFrame(data_three)

我尝试了许多方法,但都无济于事。最新的代码如下:

# Get a list of time columns
time_cols = Data_One_df.columns[3:]

# Loop through the time columns and update the values
# in Data_One_df based on Data_Two_df
for i, col in enumerate(time_cols):
    # Get the corresponding forecast value from Data_Two_df
    fcst = Data_Two_df["Fcst"][i]
    # Update the values in Data_One_df based on the forecast
    Data_One_df[col] = [np.nan] * fcst + list(Data_One_df[col][fcst:])
1sbrub3j

1sbrub3j1#

下面是使用Pandas shiftcumsum执行此操作的一种方法:

# Add necessary infos to Data_Two_df
Data_Two_df["Left_Time"] = Data_Two_df["Time"].shift(-1)
Data_Two_df["Fcst_cumsum"] = Data_Two_df["Fcst"][::-1].cumsum()[::-1].shift(-1)
Data_Two_df = Data_Two_df.dropna()

# Add values to Data_One_df
Data_One_df = Data_One_df.replace("", pd.NA)
for t, left_t, val in zip(
    Data_Two_df["Time"], Data_Two_df["Left_Time"], Data_Two_df["Fcst_cumsum"]
):
    Data_One_df.loc[Data_One_df["Time"] == t, left_t] = val
Data_One_df.loc[:, "6:30":] = Data_One_df.loc[:, "6:30":].fillna(method="ffill", axis=1)

然后:

print(Data_One_df)
# Output
        Date  Time  Bars_left  6:30  6:35  6:40  6:45  6:50  6:55  7:00
0  12/1/2022  6:30          6  <NA>   8.0   8.0   8.0   8.0   8.0   8.0
1  12/1/2022  6:35          5  <NA>  <NA>   6.0   6.0   6.0   6.0   6.0
2  12/1/2022  6:40          4  <NA>  <NA>  <NA>   5.0   5.0   5.0   5.0
3  12/1/2022  6:45          3  <NA>  <NA>  <NA>  <NA>   4.0   4.0   4.0
4  12/1/2022  6:50          2  <NA>  <NA>  <NA>  <NA>  <NA>   2.0   2.0
5  12/1/2022  6:55          1  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>   1.0
6  12/1/2022  7:00          0  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>

相关问题