pandas 在数据框中搜索LOV列，并替换为使用其他数据框的代码

bkhjykvo 于 2023-03-28 发布在其他

关注(0)|答案(1)|浏览(129)

想象一下，我有一个肮脏的数据框架，里面有员工的ID和每个国家的合同相关信息。
此数据框的某些列是LOV列（取决于国家/地区，某些列仅是一个国家/地区的LOV，其他列是其中的部分或全部），某些LOV列是强制性的，而有些则不是（仅用于了解是否接受空白值）。
我们需要使用另一个Map Dataframe 进行检查：

如果所提供的值存在于Map Dataframe 中，
如果是，则用 Dataframe 上的对应代码替换所提供的值。

如果提供的值不在列表中，则在主 Dataframe 上创建一个名为“Errors”的新列，其中它表示出错的列的名称（如果多于1列，则可以将名称保存在该列的列表中）。
从这个dataframe：

ID  Country Contract      Type
1   CZ      Permanent     BOFF
1   ES      Fixed-term    .
2   CZ      Contractor    Front-Office
3   PT      Permanent   
4   PT      2022-01-01    Employee
4   PT      Fixed-term    Office
4   ES                    Employee
5   SK      Permanent     Employee

使用此Map：

Country   Field       Values          Code      Mandadory
CZ        Contract    Permanent       PE        Yes
CZ        Contract    Fixed-term      FX        Yes
CZ        Contract    Contractor      CT        Yes
ES        Contract    Permanent       PERMA     No
SK        Contract    Permanent       PER-01    Yes
SK        Contract    Fixed-term      FIX-01    Yes
ES        Type        Office          OFF       Yes
CZ        Type        Back-Office     BOFF      Yes 
CZ        Type        Front-Office    FOFF      Yes 
PT        Type        Employee        EMP       No
PT        Type        Front-Office    FRONT     No

会导致这个dataframe：

ID  Country Contract    Type       Errors
1   CZ      PE          BOFF       ['Type']
1   ES      Fixed-term  .          ['Contract','Type']
2   CZ      CT          FOFF    
3   PT      Permanent       
4   PT      2022-01-01  FRONT      ['Type']
4   PT      Fixed-term  Office     ['Type']
4   ES      Employee               ['Contract','Type']
5   SK      PER-01      Employee

非常感谢您的支持！

pandas

来源：https://stackoverflow.com/questions/75833826/search-for-lov-columns-in-dataframe-and-replace-with-codes-using-other-dataframe

1条答案

按热度按时间

jtoj6r0c1#

使用您提供的 Dataframe ：

import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 1, 2, 3, 4, 4, 4, 5],
        "Country": ["CZ", "ES", "CZ", "PT", "PT", "PT", "ES", "SK"],
        "Contract": [
            "Permanent",
            "Fixed-term",
            "Contractor",
            "Permanent",
            "2022-01-01",
            "Fixed-term",
            "",
            "Permanent",
        ],
        "Type": [
            "BOFF",
            ".",
            "Front-Office",
            "",
            "Employee",
            "Office",
            "Employee",
            "Employee",
        ],
    }
)

other = pd.DataFrame(
    {
        "Country": ["CZ", "CZ", "CZ", "ES", "SK", "SK", "ES", "CZ", "CZ", "PT", "PT"],
        "Field": [
            "Contract",
            "Contract",
            "Contract",
            "Contract",
            "Contract",
            "Contract",
            "Type",
            "Type",
            "Type",
            "Type",
            "Type",
        ],
        "Values": [
            "Permanent",
            "Fixed-term",
            "Contractor",
            "Permanent",
            "Permanent",
            "Fixed-term",
            "Office",
            "Back-Office",
            "Front-Office",
            "Employee",
            "Front-Office",
        ],
        "Code": [
            "PE",
            "FX",
            "CT",
            "PERMA",
            "PER-01",
            "FIX-01",
            "OFF",
            "BOFF",
            "FOFF",
            "EMP",
            "FRONT",
        ],
        "Mandadory": [
            "Yes",
            "Yes",
            "Yes",
            "No",
            "Yes",
            "Yes",
            "Yes",
            "Yes",
            "Yes",
            "No",
            "No",
        ],
    }
)

下面是使用Pandas merge和apply执行此操作的一种方法：

# Merge dataframes on relevant columns with filter on "Contract"
new_df = pd.merge(
    left=df,
    right=other.loc[other["Field"] == "Contract", ["Country", "Values", "Code"]],
    how="left",
    left_on=["Country", "Contract"],
    right_on=["Country", "Values"],
)

# Update "Contract" column
new_df["Contract"] = new_df.apply(
    lambda x: x["Code"] if not pd.isna(x["Code"]) else x["Contract"], axis=1
)

# Add new "Errors" column
new_df["Errors"] = new_df.apply(
    lambda x: [] if not pd.isna(x["Code"]) else ["Contract"], axis=1
)
new_df = new_df.drop(columns=["Values", "Code"])

# Merge dataframes on relevant columns with filter on "Type"
new_df = pd.merge(
    left=new_df,
    right=other.loc[other["Field"] == "Type", ["Country", "Values", "Code"]],
    how="left",
    left_on=["Country", "Type"],
    right_on=["Country", "Values"],
)

# Update "Type" column
new_df["Type"] = new_df.apply(
    lambda x: x["Code"] if not pd.isna(x["Code"]) else x["Type"], axis=1
)

# Update "Errors" column
new_df["Errors"] = new_df.apply(
    lambda x: x["Errors"] if not pd.isna(x["Code"]) else x["Errors"] + ["Type"], axis=1
)
new_df = new_df.drop(columns=["Values", "Code"])

然后：

print(new_df)
# Output

   ID Country    Contract      Type            Errors
0   1      CZ          PE      BOFF            [Type]
1   1      ES  Fixed-term         .  [Contract, Type]
2   2      CZ          CT      FOFF                []
3   3      PT   Permanent            [Contract, Type]
4   4      PT  2022-01-01       EMP        [Contract]
5   4      PT  Fixed-term    Office  [Contract, Type]
6   4      ES              Employee  [Contract, Type]
7   5      SK      PER-01  Employee            [Type]

赞(0）回复(0）举报 2023-03-28

我来回答

pandas 在数据框中搜索LOV列，并替换为使用其他数据框的代码

1条答案

相关问题

热门标签

最新问答