pandas: How do I insert each row of a DataFrame before the matching rows in a different DataFrame?

y0u0uwnf  posted on 2023-03-16 in Other

I have a small DataFrame and a large one.
The small one:

             WS      period shortCode identifier
6        197.78  2023-03-10   TC2-FFA       spot
7        196.79  2023-03-10   TC5-FFA       spot
8        253.13  2023-03-10   TC6-FFA       spot
9        198.13  2023-03-13  TC12-FFA       spot
10       166.67  2023-03-10  TC14-FFA       spot
11       217.86  2023-03-10  TC17-FFA       spot
18        97.00  2023-03-10   TD3-FFA       spot
19       172.19  2023-03-10   TD7-FFA       spot
20       205.71  2023-03-13   TD8-FFA       spot
21       175.63  2023-03-10  TD19-FFA       spot
22       115.45  2023-03-10  TD20-FFA       spot
23  11350000.00  2023-03-10  TD22-FFA       spot
24       232.14  2023-03-10  TD25-FFA       spot

The large one, which has a MultiIndex:

datumUnit                      $/mt       WS
identifier period shortCode                 
TC2BALMO   Mar 23 TC2-FFA    39.376  228.930
TC2CURMON  Mar 23 TC2-FFA    35.946  208.988
TC2+1_M    Apr 23 TC2-FFA    38.444  223.512
TC2+2_M    May 23 TC2-FFA    37.786  219.686
TC2+3_M    Jun 23 TC2-FFA    36.613  212.866
                            ...      ...
TD25+3Q    Q4 23  TD25-FFA   42.909  185.432
TD25+4Q    Q1 24  TD25-FFA   39.000      NaN
TD25+5Q    Q2 24  TD25-FFA   32.421      NaN
TD25+1CAL  Cal 24 TD25-FFA   34.250      NaN
TD25+2CAL  Cal 25 TD25-FFA   33.955      NaN

This is its MultiIndex:

MultiIndex([( 'TC2BALMO', 'Mar 23',  'TC2-FFA'),
            ('TC2CURMON', 'Mar 23',  'TC2-FFA'),
            (  'TC2+1_M', 'Apr 23',  'TC2-FFA'),
                  ...
            (  'TD25+4Q',  'Q1 24', 'TD25-FFA'),
            (  'TD25+5Q',  'Q2 24', 'TD25-FFA'),
            ('TD25+1CAL', 'Cal 24', 'TD25-FFA'),
            ('TD25+2CAL', 'Cal 25', 'TD25-FFA')],
           names=['identifier', 'period', 'shortCode'], length=198)

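I want to insert the "spot" rows from the small DataFrame at the top of each shortCode group in the large DataFrame, without changing the order of the large DataFrame.
Expected result: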

datumUnit                      $/mt       WS
identifier period shortCode                 
spot     23-03-10 TC2-FFA      NaN   197.78        
TC2BALMO   Mar 23 TC2-FFA    39.376  228.930
TC2CURMON  Mar 23 TC2-FFA    35.946  208.988
TC2+1_M    Apr 23 TC2-FFA    38.444  223.512
TC2+2_M    May 23 TC2-FFA    37.786  219.686
TC2+3_M    Jun 23 TC2-FFA    36.613  212.866
                            ...      ...
spot     23-03-10 TD25-FFA      NaN   232.14  
TD25BALMO  Mar 23 TD25-FFA   48.902  211.331
TD25CURMON Mar 23 TD25-FFA   53.254  230.138
TD25+1_M   Apr 23 TD25-FFA   46.815  202.312
TD25+2_M   May 23 TD25-FFA   43.717  188.924
TD25+3_M   Jun 23 TD25-FFA   41.571  179.650
TD25+4_M   Jul 23 TD25-FFA   40.776  176.214
TD25+5_M   Aug 23 TD25-FFA   40.281  174.075
TD25CURQ   Q1 23  TD25-FFA   46.668  201.677
TD25+1Q    Q2 23  TD25-FFA   44.035  190.298
TD25+2Q    Q3 23  TD25-FFA   40.367  174.447
TD25+3Q    Q4 23  TD25-FFA   42.909  185.432
TD25+4Q    Q1 24  TD25-FFA   39.000      NaN
TD25+5Q    Q2 24  TD25-FFA   32.421      NaN
TD25+1CAL  Cal 24 TD25-FFA   34.250      NaN
TD25+2CAL  Cal 25 TD25-FFA   33.955      NaN

kwvwclae  #1

You can reformat the date so that it matches the period format of the large DataFrame, and then .merge:

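import pandas as pd

# spots: the small DataFrame; df: the large DataFrame with the MultiIndex
new_rows = (
    spots.assign(
        # reformat e.g. "2023-03-10" as "Mar 23" to match the "period" level of df
        period=pd.to_datetime(spots["period"]).dt.strftime("%b %y"),
        old_period=spots["period"],  # keep the original date to restore it later
    )
    .merge(
       df.reset_index() # reset MultiIndex
         .reset_index() # generate "index" column with row number
       [["index", "period", "shortCode"]])
    .drop_duplicates(subset=["period", "shortCode"]) # keep only the first match per group
    .set_index("index")
)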

new_rows = (
   new_rows
   .drop(columns="period")
   .rename(columns={"old_period": "period"})
)

new_rows = new_rows[["identifier", "period", "WS", "shortCode"]]
>>> new_rows
      identifier      period      WS shortCode
index                                         
0           spot  2023-03-10  197.78   TC2-FFA
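
The merge above only finds matches because the spot dates are first rewritten in the same "Mon YY" form as the period level of the large DataFrame. A minimal sketch of that step in isolation, using two dates from the question (the variable names are only for illustration, and %b depends on the locale, so an English locale is assumed):

```python
import pandas as pd

# Two spot dates from the small DataFrame in the question.
periods = pd.Series(["2023-03-10", "2023-03-13"])

# Parse the strings as dates, then format them as abbreviated month + 2-digit year.
# Assumption: an English (or C) locale, so %b yields "Mar" rather than a localized name.
as_keys = pd.to_datetime(periods).dt.strftime("%b %y")

print(as_keys.tolist())
```

Both dates map to the same key, which is why a spot row dated 2023-03-13 can still line up with the "Mar 23" rows of its shortCode.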

Then combine the new rows with the large DataFrame using .concat, and restore the order with .sort_index.
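
The whole approach, including the .concat and .sort_index step, might look like the sketch below. It is not the answerer's exact code: the two frames are cut down to a few rows taken from the question, the name flat is made up for illustration, and a stable sort (mergesort) is assumed so that each spot row, which is listed first in the concat and shares a row number with the first row of its group, ends up directly before that row.

```python
import pandas as pd

# Reduced stand-ins for the two frames in the question (values copied from it).
spots = pd.DataFrame({
    "WS": [197.78, 232.14],
    "period": ["2023-03-10", "2023-03-10"],
    "shortCode": ["TC2-FFA", "TD25-FFA"],
    "identifier": ["spot", "spot"],
})
df = pd.DataFrame(
    {"$/mt": [39.376, 35.946, 48.902, 53.254],
     "WS": [228.930, 208.988, 211.331, 230.138]},
    index=pd.MultiIndex.from_tuples(
        [("TC2BALMO", "Mar 23", "TC2-FFA"),
         ("TC2CURMON", "Mar 23", "TC2-FFA"),
         ("TD25BALMO", "Mar 23", "TD25-FFA"),
         ("TD25CURMON", "Mar 23", "TD25-FFA")],
        names=["identifier", "period", "shortCode"],
    ),
)

# Flatten the MultiIndex; the new RangeIndex is the row number in the large frame.
flat = df.reset_index()

# Give every spot row the row number of the first matching row in the large frame.
new_rows = (
    spots.assign(
        period=pd.to_datetime(spots["period"]).dt.strftime("%b %y"),
        old_period=spots["period"],
    )
    .merge(flat.reset_index()[["index", "period", "shortCode"]],
           on=["period", "shortCode"])
    .sort_values("index")  # make "first match" explicit before dropping duplicates
    .drop_duplicates(subset=["period", "shortCode"])
    .set_index("index")
    .drop(columns="period")
    .rename(columns={"old_period": "period"})
)

# Spot rows go first in the concat; a stable sort keeps them ahead of the
# existing row that has the same row number.
result = (
    pd.concat([new_rows, flat])
    .sort_index(kind="mergesort")
    .set_index(["identifier", "period", "shortCode"])
    [["$/mt", "WS"]]
)
print(result)
```

The spot rows have no $/mt value, so that column is NaN for them, as in the expected result. Note that the spot rows keep their original date string (2023-03-10) in the period level.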
