How to check a condition on multiple columns after using groupby on a pandas DataFrame?

Posted by nuypyhwy on 2023-08-01 in Other

I have the following DataFrame:

import pandas as pd

data = [
    ['a', 'one', 10],
    ['a', 'two', 10],
    ['b', 'one', 13],
    ['b', 'two', 100],
    ['c', 'two', 100],
    ['c', 'one', 100],
    ['d', 'one', 100],
    ['d', 'one', 10],
]
df = pd.DataFrame(data, columns=['key1', 'key2', 'key3'])

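I need to group by key1,

search for key2 == 'one' (exactly one occurrence gives result = OK; 0 or more than 1 gives NOK),
and additionally check that key3 is 10 or 100 (OK); any other value is NOK.

The result should be OK or NOK. How can I get output like the following?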

key1  result
a   ok
b   nok
c   ok
d   nok

I found a similar question that was answered here, but without the additional check on key3. What I have so far is the following:

test = df.groupby('key1')['key2'].apply(lambda x:(x=='one').sum()).reset_index(name='result')
test['result'].where(~(test['result'] > 1), other='nok', inplace=True)
test['result'].replace([0, 1], ['nok', 'ok'], inplace = True)

When I run this code, it gives the following output:

key1    result
0   a   ok
1   b   ok
2   c   ok
3   d   nok

How can I add the check for key3?

pandas

Source: https://stackoverflow.com/questions/76797105/how-to-check-condition-on-multiple-columns-after-using-groupby-on-pandas-data-fr

2 Answers

nle07wnf1#

IIUC, you can try:

def fn(x):
    # filter "one"
    x = x[x['key2'] == 'one']
    # there are zero or >1, return "nok"
    if len(x) == 0 or len(x) > 1:
        return 'nok'
    # is the single `key3` 10 or 100? If yes, return "ok"
    if x['key3'].iat[0] in [10, 100]:
        return 'ok'
    # otherwise "nok"
    return 'nok'
out = df.groupby('key1').apply(fn).reset_index(name="result")
print(out)

Prints:

key1 result
0    a     ok
1    b    nok
2    c     ok
3    d    nok
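
The same per-group rule can also be written without apply, by building row-level boolean masks and summing them per group. This is only a sketch and not part of the answer above: the names is_one, valid_one and counts are illustrative, and it assumes the df from the question. Like the function above, it checks key3 only on the 'one' row.

```python
import pandas as pd

data = [
    ['a', 'one', 10], ['a', 'two', 10],
    ['b', 'one', 13], ['b', 'two', 100],
    ['c', 'two', 100], ['c', 'one', 100],
    ['d', 'one', 100], ['d', 'one', 10],
]
df = pd.DataFrame(data, columns=['key1', 'key2', 'key3'])

# Row-level masks: is this a 'one' row, and is it a 'one' row with an allowed key3?
is_one = df['key2'].eq('one')
valid_one = is_one & df['key3'].isin([10, 100])

# Summing booleans per group counts the True rows in each key1 group.
counts = (
    pd.DataFrame({'n_one': is_one, 'n_valid': valid_one})
    .groupby(df['key1'])
    .sum()
)

# "ok" only if there is exactly one 'one' row and that row has an allowed key3.
ok = counts['n_one'].eq(1) & counts['n_valid'].eq(1)
out = ok.map({True: 'ok', False: 'nok'}).reset_index(name='result')
print(out)
```

Because the masks are computed on every row, a key1 group with no 'one' row still appears in the result, with a count of 0 and therefore 'nok'.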

2023-08-01

gblwokeq2#

A one-liner using:

(groupby('key1')['key2'].value_counts().xs('one', level='key2') == 1) to check key2
groupby('key1')['key3'].unique() to get the key3 values for each key1
isin([10, 100]).all(axis=1) to check the condition on key3

pd.DataFrame({"result": ((df.groupby("key1")["key2"].value_counts().xs("one", level="key2") == 1) & (pd.DataFrame.from_dict(dict(zip(df.groupby("key1")["key3"].unique().index, df.groupby("key1")["key3"].unique().values)), orient="index").isin([10, 100, np.nan]).all(axis=1))).replace([True, False], ["ok", "nok"]),})

This is about 2x slower than using groupby().apply(). See the comparison below:

def fast(): # from @Andrej Kesely (10035985)
    def fn(x):
        # filter 'one'
        x = x[x['key2'] == 'one']
        # there are zero or >1, return 'nok'
        if len(x) == 0 or len(x) > 1:
            return 'nok'
        # is the single `key3` 10 or 100? If yes, return 'ok'
        if x['key3'].iat[0] in [10, 100]:
            return 'ok'
        # otherwise 'nok'
        return 'nok'
    return df.groupby('key1').apply(fn).reset_index(name='result')
%timeit fast()
# 677 µs ± 13.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
def slow():
    return pd.DataFrame({"result": ((df.groupby("key1")["key2"].value_counts().xs("one", level="key2") == 1) & (pd.DataFrame.from_dict(dict(zip(df.groupby("key1")["key3"].unique().index, df.groupby("key1")["key3"].unique().values)), orient="index").isin([10, 100, np.nan]).all(axis=1))).replace([True, False], ["ok", "nok"]),})
%timeit slow()
# 1.42 ms ± 36.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
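
The %timeit lines above are IPython magics and only work in IPython or Jupyter. Outside of those, a similar measurement can be sketched with the standard-library timeit module. The function names with_apply and with_masks and the repeat count n are illustrative choices, not from the answers, and with_masks is an assumed mask-based variant rather than the one-liner timed above. Timings depend on the machine and pandas version, so none are quoted here.

```python
import timeit
import pandas as pd

data = [
    ['a', 'one', 10], ['a', 'two', 10],
    ['b', 'one', 13], ['b', 'two', 100],
    ['c', 'two', 100], ['c', 'one', 100],
    ['d', 'one', 100], ['d', 'one', 10],
]
df = pd.DataFrame(data, columns=['key1', 'key2', 'key3'])

def fn(x):
    # Keep only the 'one' rows of this group.
    x = x[x['key2'] == 'one']
    # Zero or more than one 'one' row: "nok".
    if len(x) != 1:
        return 'nok'
    # Exactly one 'one' row: "ok" only if its key3 is 10 or 100.
    return 'ok' if x['key3'].iat[0] in [10, 100] else 'nok'

def with_apply():
    # Selecting the columns explicitly keeps the grouping key out of fn's input.
    return df.groupby('key1')[['key2', 'key3']].apply(fn).reset_index(name='result')

def with_masks():
    # Assumed alternative: count matching rows per group with boolean masks.
    is_one = df['key2'].eq('one')
    valid = is_one & df['key3'].isin([10, 100])
    n_one = is_one.groupby(df['key1']).sum()
    n_valid = valid.groupby(df['key1']).sum()
    ok = n_one.eq(1) & n_valid.eq(1)
    return ok.map({True: 'ok', False: 'nok'}).reset_index(name='result')

n = 200  # number of calls per measurement
t_apply = timeit.timeit(with_apply, number=n) / n
t_masks = timeit.timeit(with_masks, number=n) / n
print(f"apply: {t_apply * 1e6:.0f} µs per call")
print(f"masks: {t_masks * 1e6:.0f} µs per call")
```

On an eight-row frame the numbers mostly reflect fixed overhead, so it is worth repeating the measurement on data of realistic size before drawing conclusions.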

Documentation for the methods used:

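value_counts() counts occurrences
xs() gets a cross-section of the DataFrame
unique() returns the unique values
isin() checks whether any key3 value is not in the "ok" list
all(axis=1) checks truthiness row by row
replace() replaces the booleans with "ok" and "nok"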
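
For readability, the two checks described in this answer can also be written as separate steps. The following is a sketch under stated assumptions, not the author's code: it uses grouped sum() and all() instead of value_counts(), xs() and unique(), and the names key2_ok and key3_ok are illustrative. As in this answer, the key3 condition is applied to every row of the group, not only to the 'one' row.

```python
import pandas as pd

data = [
    ['a', 'one', 10], ['a', 'two', 10],
    ['b', 'one', 13], ['b', 'two', 100],
    ['c', 'two', 100], ['c', 'one', 100],
    ['d', 'one', 100], ['d', 'one', 10],
]
df = pd.DataFrame(data, columns=['key1', 'key2', 'key3'])

# key2 check: exactly one 'one' row per key1 group.
key2_ok = df['key2'].eq('one').groupby(df['key1']).sum().eq(1)

# key3 check: every key3 value in the group is 10 or 100.
key3_ok = df['key3'].isin([10, 100]).groupby(df['key1']).all()

# Combine both checks and label the groups.
result = (key2_ok & key3_ok).map({True: 'ok', False: 'nok'}).rename('result')
out = result.reset_index()
print(out)
```

On the sample data both interpretations of the key3 rule give the same labels; they would differ for a group whose single 'one' row has an allowed key3 while another row in the group does not.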

2023-08-01
