python 根据B列中是否存在A列值来设置值

xa9qqrwz  于 2022-12-10  发布在  Python
关注(0)|答案(2)|浏览(144)

我有一个包含多个列的 Dataframe ,我正在设置一个lambda函数,根据一系列逻辑来设置另一个列。
Dataframe 如下所示:
| 供应商|所属部门|方案1|备选办法2|
| - -|- -|- -|- -|
| 阿克梅|阿克梅|星期一|星期二|
| 阿克梅|阿克梅|星期三||
| 阿克梅|人力资源|星期一|星期三|
| 阿克梅|人力资源|星期二|星期三|
| 阿克梅|人力资源|星期二|星期四|
等等。
我想添加一个列与文本保持某一行或不。

full_quote_df['Keep Row'] = full_quote_df.apply(
                lambda item:
                   'Keep Row' if item.Supplier == item.Department and pd.notna(item['Option 1']) else \ #If supplier and department match, and the have an option, keep
                   'Do Not Keep Row' if item.Supplier == item.Department and pd.isna(item['Option 1'])  else \ #If they leave it blank, exclude it
                   'Keep Row' if item.Supplier != item.Department and item['Option 2'].isin(list(item['Option 1'].values()))  else \ #If the supplier does not match the department, but the second option exists in the first option, keep row
                   'Do Not Keep Row' if item.Supplier != item.Department and item['Option 2'].notin(item['Option 1'])  else \#If the supplier does not match the department, but the second option does not exist in the first option, remove row
                   None,
                axis=1
            )

因此,我的最终 Dataframe 应该如下所示:
| 供应商|所属部门|方案1|备选办法2|保留行|
| - -|- -|- -|- -|- -|
| 阿克梅|阿克梅|星期一|星期二|保留|
| 阿克梅|阿克梅|星期三||保留|
| 阿克梅|人力资源|星期一|星期三|保留|
| 阿克梅|人力资源|星期二|星期三|保留|
| 阿克梅|人力资源|星期二|星期四|不保留|
使用上面的函数,它给了我一个错误'str'对象没有属性'isin'。我不想把选项2列变成一个列表,因为我需要相对于整个列来评估特定的行,但是我不知道下一步该怎么做。

wi3ka0sx

wi3ka0sx1#

可以将numpy.wherepandas.Series.isin一起使用:

m = (df["Option 2"].isna()) | (df["Option 2"].isin(df["Option 1"]))
    
df["Keep Row"] = np.where(m, "Keep", "Do Not Keep")
#输出:
print(df)
  Supplier Department   Option 1   Option 2     Keep Row
0     Acme       Acme     Monday    Tuesday         Keep
1     Acme       Acme  Wednesday        NaN         Keep
2     Acme         HR     Monday  Wednesday         Keep
3     Acme         HR    Tuesday  Wednesday         Keep
4     Acme         HR    Tuesday   Thursday  Do Not Keep
dtcbnfnu

dtcbnfnu2#

如果您转而考虑将列(系列)作为一个整体而不是单个元素进行操作,那么使用pandas会有更好的效果。
您可以构建保存中间结果的Series,这样就不会每次都重新计算它们,而且它还可以使逻辑更易于阅读。

supplier_dept_match = full_quote_df['Supplier'] == full_quote_df['Department']
opt1_is_na = full_quote_df['Option 1'].isna()

# Note: the following logic is a bit unclear from your question.
# This approach will look for the option 2 values in the entire option 1 column.
# If you want to look for a certain option 1 subset, then you should clarify how this should behave.
opt2_in_any_opt1 = full_quote_df['Option 2'].isin(full_quote_df['Option 1'])

# Then, you can build up your logic all at once with boolean expressions:
keep = (supplier_dept_match & ~opt1_is_na) | (~supplier_dept_match & opt2_in_any_opt1)
full_quote_df['Keep Row'] = keep.map({True: "Keep", False: "Do Not Keep"})

希望这能有所帮助!

相关问题