pandas: iterate over rows and write to a new column if a condition is met (Python)

j8ag8udp posted on 2023-02-02 in Python

I have two separate dfs to compare:
f1:

P53-Malat1
Neat1-Malat1
Gap1-Malat1

and f2:

intA,intB
P53-Malat1,Neat1-Malat1
Gap1-Malat1,Malat1-Pias3

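I want to iterate over the rows of each column in f2 and check whether the value is in f1. If it is, write the value + "_found"; if not, write the value + "_not_found", in a separate column.
The same goes for the second column in f2.
I tried the following, but it doesn't work. Am I missing something?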

import pandas as pd

with open("f1.txt","r") as f1:
    content = f1.read().splitlines()
    #print(content)

f2 = pd.read_csv("f2.csv")

f2["col1_search"] = f2.apply(lambda x: x["intA"]+"_found" if x in content else x["intA"]+"_not_found", axis=1)
f2["col2_search"] = f2.apply(lambda x: x["intB"]+"_found" if x in content else x["intB"]+"_not_found", axis=1)

So the desired output is f2 in the following format:

col1_search,col2_search
P53-Malat1_found,Neat1-Malat1_found
Gap1-Malat1_found,Malat1-Pias3_not_found

Thank you.
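A likely reason the attempt above fails: with `axis=1`, the lambda receives the whole row as a Series, so `x in content` compares each string in the list with that Series. The comparison yields a boolean Series whose truth value is ambiguous, and pandas raises a ValueError. A minimal sketch of the fix, assuming `content` is the list of lines from f1.txt (inlined here so the example runs on its own), is to test the cell value `x["intA"]` instead of the row `x`:

```python
import pandas as pd

# Inlined stand-ins for f1.txt (as a list of lines) and f2.csv
content = ["P53-Malat1", "Neat1-Malat1", "Gap1-Malat1"]
f2 = pd.DataFrame({"intA": ["P53-Malat1", "Gap1-Malat1"],
                   "intB": ["Neat1-Malat1", "Malat1-Pias3"]})

# Original version: x is the whole row (a Series), so `x in content`
# compares each string with the Series and raises ValueError
try:
    f2.apply(lambda x: x["intA"] + "_found" if x in content else x["intA"] + "_not_found", axis=1)
except ValueError as err:
    print("original version fails:", err)

# Fixed version: test the cell value x["intA"] / x["intB"], not the row x
f2["col1_search"] = f2.apply(
    lambda x: x["intA"] + "_found" if x["intA"] in content else x["intA"] + "_not_found", axis=1)
f2["col2_search"] = f2.apply(
    lambda x: x["intB"] + "_found" if x["intB"] in content else x["intB"] + "_not_found", axis=1)
print(f2[["col1_search", "col2_search"]])
```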

Answer 1 (uplii1fm)

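If I understand correctly, content is a list rather than a DataFrame. If so, you can use .isin, which returns True or False for each row; you can then map those values to whatever suffix you want.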

import pandas as pd
content = ['P53-Malat1','Neat1-Malat1','Gap1-Malat1']

f2 = pd.DataFrame({'intA': {0: 'P53-Malat1', 1: 'Gap1-Malat1'},
                   'intB': {0: 'Neat1-Malat1', 1: 'Malat1-Pias3'}})

f2['col1_search'] = f2.intA + f2.intA.isin(content).map({True:'_found',False:'_not_found'})
f2['col2_search'] = f2.intB + f2.intB.isin(content).map({True:'_found',False:'_not_found'})

Output

intA          intB        col1_search             col2_search
0   P53-Malat1  Neat1-Malat1   P53-Malat1_found      Neat1-Malat1_found
1  Gap1-Malat1  Malat1-Pias3  Gap1-Malat1_found  Malat1-Pias3_not_found

Or, if you have many columns:

(f2 + f2.isin(content).replace({True:'_found',False:'_not_found'})).add_suffix('_search')

Output

intA_search             intB_search
0   P53-Malat1_found      Neat1-Malat1_found
1  Gap1-Malat1_found  Malat1-Pias3_not_found

You can then join the result back onto the original data:

pd.concat([f2,(f2 + f2.isin(content).replace({True:'_found',False:'_not_found'})).add_suffix('_search')], axis=1)

Output

intA          intB        intA_search             intB_search
0   P53-Malat1  Neat1-Malat1   P53-Malat1_found      Neat1-Malat1_found
1  Gap1-Malat1  Malat1-Pias3  Gap1-Malat1_found  Malat1-Pias3_not_found
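The one-liner above runs over every column of f2, so it assumes f2 holds only the columns to look up (if it already contained col1_search/col2_search, those would be processed too). A sketch that restricts the lookup to an explicit column list, here the hypothetical `cols`, and builds the suffixes with `DataFrame.where` instead of `replace`:

```python
import pandas as pd

content = ["P53-Malat1", "Neat1-Malat1", "Gap1-Malat1"]
f2 = pd.DataFrame({"intA": ["P53-Malat1", "Gap1-Malat1"],
                   "intB": ["Neat1-Malat1", "Malat1-Pias3"]})

cols = ["intA", "intB"]  # hypothetical: the columns to look up; extend as needed

# Start from "_found" everywhere, then keep it only where isin() is True
suffix = pd.DataFrame("_found", index=f2.index, columns=cols).where(
    f2[cols].isin(content), "_not_found")

# Concatenate value + suffix element-wise and attach the *_search columns
result = pd.concat([f2, (f2[cols] + suffix).add_suffix("_search")], axis=1)
print(result)
```

Other columns in f2 are carried through untouched, since only `f2[cols]` takes part in the lookup.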
Answer 2 (qfe3c7zg)

Here is an example of how to use np.where:

import numpy as np
import pandas as pd

data = {'Category' : ['First', 'Second', 'Third'],
        'First_Numbers' : [10, 10, 10],
        'Second_Numbers' : [20, 20, 20],
        'Third_Numbers' : [9, 21, 15]
       }
df = pd.DataFrame(data)
# 'found' where First_Numbers < Third_Numbers < Second_Numbers, else 'not found'
comp_column = np.where((df['Third_Numbers'] < df['Second_Numbers']) & (df['Third_Numbers'] > df['First_Numbers']), 'found', 'not found')
df['check'] = comp_column
print(df)

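I inserted some sample data; you should be able to replace it with your own. I now see that you want to compare two different dfs, so I suggest merging them so that you only work with one df. This is the best documentation on merging/joining/concatenating pandas dfs: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html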
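Applied to the question's data, the merge-then-np.where idea might look like the sketch below. It assumes f1 is loaded as a one-column DataFrame with a hypothetical column name `pair`. A left merge with `indicator=True` adds a `_merge` column that is "both" when the key was found in f1 and "left_only" otherwise, and np.where turns that into the suffixes:

```python
import numpy as np
import pandas as pd

# Assumed inputs mirroring the question; "pair" is a hypothetical column name
f1 = pd.DataFrame({"pair": ["P53-Malat1", "Neat1-Malat1", "Gap1-Malat1"]})
f2 = pd.DataFrame({"intA": ["P53-Malat1", "Gap1-Malat1"],
                   "intB": ["Neat1-Malat1", "Malat1-Pias3"]})

for col in ["intA", "intB"]:
    # drop_duplicates() keeps the left merge from adding rows, so
    # `merged` has the same length and row order as f2
    merged = f2[[col]].merge(f1.drop_duplicates(), left_on=col, right_on="pair",
                             how="left", indicator=True)
    f2[col + "_search"] = np.where(merged["_merge"] == "both",
                                   merged[col] + "_found",
                                   merged[col] + "_not_found")
print(f2)
```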

Answer 3 (xdnvmnnf)

import pandas as pd

f2 = pd.read_csv("f2.csv")

def transform(path: str, x: str) -> str:
    # Re-reads the lookup file (one entry per line) on every call
    with open(path, "r") as f1:
        content = f1.read().splitlines()
    if x in content:
        return f"{x}_found"
    return f"{x}_not_found"

# Series.apply passes each cell value (a string) to the function; it takes no axis argument
f2["col1_search"] = f2['intA'].apply(lambda x: transform("f1.txt", x))
f2["col2_search"] = f2['intB'].apply(lambda x: transform("f1.txt", x))
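Because `transform` opens the file once per cell, a variant that reads f1 only once and passes the lookup in through `Series.apply`'s `args` parameter may scale better. In this sketch the two files are replaced by in-memory `io.StringIO` stand-ins so it runs on its own; with real files you would open "f1.txt" and read "f2.csv" instead.

```python
import io

import pandas as pd

# In-memory stand-ins for f1.txt and f2.csv
f1_file = io.StringIO("P53-Malat1\nNeat1-Malat1\nGap1-Malat1\n")
f2 = pd.read_csv(io.StringIO("intA,intB\nP53-Malat1,Neat1-Malat1\nGap1-Malat1,Malat1-Pias3\n"))

# Read the lookup list once; a set gives fast membership tests
content = set(f1_file.read().splitlines())

def transform(x: str, lookup: set) -> str:
    return f"{x}_found" if x in lookup else f"{x}_not_found"

# Series.apply forwards extra positional arguments via `args`
f2["col1_search"] = f2["intA"].apply(transform, args=(content,))
f2["col2_search"] = f2["intB"].apply(transform, args=(content,))
print(f2)
```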
