Validating multiple columns against multiple lists of values in pandas/Python

ioekq8ef  posted on 2023-05-15 in Python

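I need to validate 3 different pandas columns against 3 different lists of values in pandas/Python. The problem is that each check overwrites the status set by the previous one.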

import pandas as pd

df = pd.DataFrame({'freq': ['Dly', 'ad', 'Weekly  ', 'XXX'],
                   'typ': ['TEST', 'A', 'TEST2', 'YYYY'],
                   'category': ['ACC', 'T', 'TEST3', 'ZZZZ'],
                   'id': ['1', '2', '3', '4']})

3 lists:
freq_list = ['ad','annex','Qtr','Weekly']
type_list = ['A','B']
catry_list = ['T','C','ACC']

Code:
df['status'] = df.apply(lambda x: x.typ in type_list, axis=1)
df['status'] = df.apply(lambda x: x.freq in freq_list, axis=1)
df['status'] = df.apply(lambda x: x.category in catry_list, axis=1)
This does not give the combined result, because each assignment overwrites the status produced by the previous column's check.

Expected output:
freq      typ     category  id   status
Dly       TEST    ACC       1    freq and typ do not exist in list
ad        A       T         2
Weekly    TEST2   TEST3     3    typ and category do not exist in list
XXX       YYYY    ZZZZ      4    freq, typ and category do not exist in list
axr492tv  #1

You can try something like this:

import pandas as pd
import numpy as np

data = pd.DataFrame(
    {
        "freq": ["Dly", "ad", "Weekly", "XXX"],
        "typ": ["TEST", "A", "TEST2", "YYYY"],
        "category": ["ACC", "T", "TEST3", "ZZZZ"],
        "id": ["1", "2", "3", "4"],
    }
)

# 3 lists:
freq_list = ["ad", "annex", "Qtr", "Weekly"]
type_list = ["A", "B"]
catry_list = ["T", "C", "ACC"]

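# Boolean frame: True where a value is NOT in the corresponding list.
# zip stops after the three lists, so only the first three columns are checked.
dfm = pd.concat(
    [
        ~data[c].isin(l)
        for c, l in zip(data.columns, [freq_list, type_list, catry_list])
    ],
    axis=1,
)

# Join the names of the failing columns per row; rows with no failures become NaN.
data["status"] = (
    dfm.dot(data.columns[:3] + ", ").str.strip(", ").replace("", np.nan)
    + " not exists in list"
)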

data

Output:

     freq    typ category id                                  status
0     Dly   TEST      ACC  1            freq, typ not exists in list
1      ad      A        T  2                                     NaN
2  Weekly  TEST2    TEST3  3        typ, category not exists in list
3     XXX   YYYY     ZZZZ  4  freq, typ, category not exists in list
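
Note that the data in this answer uses "Weekly" without the trailing spaces that appear in the question ("Weekly  "), so `isin` matches it directly. A minimal sketch of one way to handle the original data is shown here, assuming the values should be stripped before comparison. The `allowed` mapping, the `invalid` frame and the `describe` helper are hypothetical names introduced for illustration, and the status wording is arbitrary.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "freq": ["Dly", "ad", "Weekly  ", "XXX"],  # trailing spaces as in the question
        "typ": ["TEST", "A", "TEST2", "YYYY"],
        "category": ["ACC", "T", "TEST3", "ZZZZ"],
        "id": ["1", "2", "3", "4"],
    }
)

# Hypothetical explicit mapping: column name -> allowed values.
# This avoids relying on the column order of the DataFrame.
allowed = {
    "freq": ["ad", "annex", "Qtr", "Weekly"],
    "typ": ["A", "B"],
    "category": ["T", "C", "ACC"],
}

# True where the stripped value is NOT in the list for that column.
invalid = pd.DataFrame(
    {col: ~data[col].str.strip().isin(values) for col, values in allowed.items()}
)


def describe(flags):
    """Return a message naming the failing columns, or '' if none failed."""
    failed = [col for col, bad in zip(invalid.columns, flags) if bad]
    return ", ".join(failed) + " not in list" if failed else ""


data["status"] = [describe(flags) for flags in invalid.to_numpy()]
print(data)
```

With this sketch, rows that pass every check get an empty string rather than NaN, which is closer to the blank cell in the expected output.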
inn6fuwd  #2

You can define a function as follows:

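def func(row):
    # Collect the names of the columns whose stripped value is not in its list.
    missing = []
    if row['typ'].strip() not in type_list:
        missing.append('typ')
    if row['freq'].strip() not in freq_list:
        missing.append('freq')
    if row['category'].strip() not in catry_list:
        missing.append('category')
    # Return an empty string when every column passes.
    return ', '.join(missing) + ' not in lists' if missing else ''

df['status'] = df.apply(func, axis=1)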
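
As for why the code in the question loses results: each `df['status'] = ...` assignment replaces the whole column, so only the last check survives. The sketch below illustrates this on the question's data and shows one possible alternative, keeping one boolean column per check. It uses `isin` in place of the question's `apply` calls, and `checks`, `last_check_only` and `all_valid` are hypothetical names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'freq': ['Dly', 'ad', 'Weekly  ', 'XXX'],
                   'typ': ['TEST', 'A', 'TEST2', 'YYYY'],
                   'category': ['ACC', 'T', 'TEST3', 'ZZZZ'],
                   'id': ['1', '2', '3', '4']})

freq_list = ['ad', 'annex', 'Qtr', 'Weekly']
type_list = ['A', 'B']
catry_list = ['T', 'C', 'ACC']

# Each assignment replaces the 'status' column, so after these three
# lines only the result of the category check is left.
df['status'] = df['typ'].isin(type_list)
df['status'] = df['freq'].isin(freq_list)
df['status'] = df['category'].isin(catry_list)
last_check_only = df['status'].tolist()

# Keeping one boolean column per check preserves every result.
# freq is stripped first because the data contains 'Weekly  '.
checks = pd.DataFrame({
    'freq': df['freq'].str.strip().isin(freq_list),
    'typ': df['typ'].isin(type_list),
    'category': df['category'].isin(catry_list),
})

# A row is valid only if all three checks pass.
df['all_valid'] = checks.all(axis=1)
print(checks)
print(df[['id', 'all_valid']])
```

The per-column frame can then be turned into a message per row, either with a row-wise function like the one above or with a vectorized approach.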
