pandas 使用另一个panda Dataframe Map替换列值

pn9klfpd  于 2023-03-28  发布在  其他
关注(0)|答案(1)|浏览(109)

假设我有以下不同国家的员工合同的脏数据(df1):

ID  Country  Name     Job        Date         Grade
1   CZ       John     Office     2021-01-01   Senior
1   SK       John     .          2021-01-01   Assistant
2   AE       Peter    Carpinter  2000-05-03 
3   PE       Marcia   Cleaner    1989-11-11   ERROR!
3   FR       Marcia   Assistant  1978-01-05   High
3   FR       Marcia              1999-01-01   Senior

我需要查看LOVMap表,在该表中,每个国家/地区都有不同(或相同)的LOV列,这些列将替换代码提供的值。对于每个国家/地区,它将检查该列是否在该国家/地区的LOVMap中,如果值存在于“值”列中,则替换为相应的代码。如果不存在,则保留相同的值。
所以使用这个Map(df2):

Country Field   Values      Code
US      Job     Back        BA
US      Job     Front       FR
US      Job     Office      OFF
CZ      Job     Office      CZ_OFF
CZ      Job     Field       CZ_Fil
SK      Job     All         ALL
FR      Job     Assistant   AST
AE      Job     Carpinter   CAR
AE      Job     Carpinter   CAR
CZ      Grade   Senior      S
CZ      Grade   Junior      J
SK      Grade   M1          M1
FR      Grade   Low         L
FR      Grade   Mid         M1
FR      Grade   High        H

将导致以下 Dataframe :

ID  Country     Name    Job     Date        Grade
1   CZ          John    CZ_OFF  2021-01-01  S
1   SK          John    .       2021-01-01  M1
2   AE          Peter   CAR     2000-05-03  
3   PE          Marcia  Cleaner 1989-11-11  ERROR!
3   FR          Marcia  AST     1978-01-05  H
3   FR          Marcia          1999-01-01  Senior

非常感谢您的支持!

fykwrbwg

fykwrbwg1#

使用melt + merge + pivot + combine_first

cols = df1.columns.difference(df2['Field'].unique())

out = (
 df1.melt(cols, var_name='Field', value_name='Values', ignore_index=False)
    .reset_index()
    .merge(df2.drop_duplicates())
    .pivot(index=cols.union(['index']), columns='Field', values='Code')
    .reset_index(list(cols))
    .combine_first(df1)[df1.columns]
)

输出:

ID Country    Name      Job        Date      Grade
0   1      CZ    John   CZ_OFF  2021-01-01          S
1   1      SK    John        .  2021-01-01  Assistant
2   2      AE   Peter      CAR  2000-05-03        NaN
3   3      PE  Marcia  Cleaner  1989-11-11     ERROR!
4   3      FR  Marcia      AST  1978-01-05          H
5   3      FR  Marcia      NaN  1999-01-01     Senior

替代使用循环(在df1的地方修改)和groupby,以便于选择merge的行:

g = df2.groupby('Field')

for c in df1.columns.intersection(df2['Field'].unique()):
    df1[c] = df1.merge(g.get_group(c).drop(columns='Field')
                        .rename(columns={'Values': c}), how='left'
                       )['Code'].fillna(df1[c])

相关问题