How to check conditions on multiple columns after using groupby on a pandas DataFrame?

Posted by nuypyhwy on 2023-08-01 in Other

I have the following DataFrame:

    import pandas as pd

    data = [
        ['a', 'one', 10],
        ['a', 'two', 10],
        ['b', 'one', 13],
        ['b', 'two', 100],
        ['c', 'two', 100],
        ['c', 'one', 100],
        ['d', 'one', 100],
        ['d', 'one', 10],
    ]
    df = pd.DataFrame(data, columns=['key1', 'key2', 'key3'])

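I need to group by key1, then:

  • look for key2 == 'one' (exactly one occurrence = OK; zero or more than one = NOK),
  • additionally check key3: a value of 10 or 100 is OK, anything else is NOK.

The result should be OK or NOK. How can I get output like the following?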

    key1 result
    a    ok
    b    nok
    c    ok
    d    nok


I found a similar question that was answered here, but it has no additional check on key3. What I have so far is the following:

    test = df.groupby('key1')['key2'].apply(lambda x: (x == 'one').sum()).reset_index(name='result')
    test['result'].where(~(test['result'] > 1), other='nok', inplace=True)
    test['result'].replace([0, 1], ['nok', 'ok'], inplace=True)


When I run this code, it gives the following output:

      key1 result
    0    a     ok
    1    b     ok
    2    c     ok
    3    d    nok


How can I add the check on key3?

nle07wnf #1

IIUC, you can try:

    def fn(x):
        # filter "one"
        x = x[x['key2'] == 'one']

        # there are zero or >1, return "nok"
        if len(x) == 0 or len(x) > 1:
            return 'nok'

        # is the single `key3` 10 or 100? If yes, return "ok"
        if x['key3'].iat[0] in [10, 100]:
            return 'ok'

        # otherwise "nok"
        return 'nok'

    out = df.groupby('key1').apply(fn).reset_index(name="result")
    print(out)

Prints:

      key1 result
    0    a     ok
    1    b    nok
    2    c     ok
    3    d    nok
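
For comparison, the same rule can probably be expressed without apply(). This is only a sketch and not part of the answer above: it rebuilds the question's df so that it runs on its own, and the names ones, n_ones, key3_ok, passed and all_keys are made up for illustration. Like fn, it assumes that a key1 group with no 'one' row counts as nok.

```python
import pandas as pd

data = [
    ['a', 'one', 10],
    ['a', 'two', 10],
    ['b', 'one', 13],
    ['b', 'two', 100],
    ['c', 'two', 100],
    ['c', 'one', 100],
    ['d', 'one', 100],
    ['d', 'one', 10],
]
df = pd.DataFrame(data, columns=['key1', 'key2', 'key3'])

# Keep only the rows where key2 == 'one'.
ones = df[df['key2'] == 'one']

# Per key1: number of 'one' rows, and whether all of them have an allowed key3.
n_ones = ones.groupby('key1').size()
key3_ok = ones['key3'].isin([10, 100]).groupby(ones['key1']).all()

# A group passes only with exactly one 'one' row whose key3 is 10 or 100.
passed = (n_ones == 1) & key3_ok

# Groups without any 'one' row are absent from `passed`; reindexing marks them as failed.
all_keys = pd.Index(df['key1'].unique(), name='key1')
passed = passed.reindex(all_keys, fill_value=False)

out = passed.map({True: 'ok', False: 'nok'}).reset_index(name='result')
print(out)
```

On the sample data this should produce the same ok/nok column as fn.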

gblwokeq #2

A one-liner using:

  • (groupby('key1')['key2'].value_counts().xs('one', level='key2') == 1) to check key2
  • groupby('key1')['key3'].unique() to get the unique key3 values for each key1
  • isin([10, 100]).all(axis=1) to check the condition on key3

    pd.DataFrame({"result": ((df.groupby("key1")["key2"].value_counts().xs("one", level="key2") == 1) & (pd.DataFrame.from_dict(dict(zip(df.groupby("key1")["key3"].unique().index, df.groupby("key1")["key3"].unique().values)), orient="index").isin([10, 100, np.nan]).all(axis=1))).replace([True, False], ["ok", "nok"]),})

It is about 2x slower than using groupby().apply(). See the comparison below:

    def fast():  # from @Andrej Kesely (10035985)
        def fn(x):
            # filter 'one'
            x = x[x['key2'] == 'one']

            # there are zero or >1, return 'nok'
            if len(x) == 0 or len(x) > 1:
                return 'nok'

            # is the single `key3` 10 or 100? If yes, return 'ok'
            if x['key3'].iat[0] in [10, 100]:
                return 'ok'

            # otherwise 'nok'
            return 'nok'

        return df.groupby('key1').apply(fn).reset_index(name='result')

    %timeit fast()
    # 677 µs ± 13.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

    def slow():
        return pd.DataFrame({"result": ((df.groupby("key1")["key2"].value_counts().xs("one", level="key2") == 1) & (pd.DataFrame.from_dict(dict(zip(df.groupby("key1")["key3"].unique().index, df.groupby("key1")["key3"].unique().values)), orient="index").isin([10, 100, np.nan]).all(axis=1))).replace([True, False], ["ok", "nok"]),})

    %timeit slow()
    # 1.42 ms ± 36.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


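Documentation for the methods used:

  • value_counts() counts occurrences
  • xs() takes a cross-section of the DataFrame
  • unique() returns the unique values
  • isin() checks whether any key3 value is not in the "ok" list
  • all(axis=1) checks truthiness row by row
  • replace() replaces the booleans with "ok" and "nok"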
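
The one-liner may be easier to follow when split into the steps listed above. This is a sketch that assumes the question's df (rebuilt here so that the snippet runs on its own); the intermediate names such as one_counts and key3_table are only illustrative.

```python
import numpy as np
import pandas as pd

data = [
    ['a', 'one', 10],
    ['a', 'two', 10],
    ['b', 'one', 13],
    ['b', 'two', 100],
    ['c', 'two', 100],
    ['c', 'one', 100],
    ['d', 'one', 100],
    ['d', 'one', 10],
]
df = pd.DataFrame(data, columns=['key1', 'key2', 'key3'])

grouped = df.groupby('key1')

# Step 1: count each key2 value per key1, then take the cross-section for 'one'.
one_counts = grouped['key2'].value_counts().xs('one', level='key2')
exactly_one = one_counts == 1

# Step 2: unique key3 values per key1, laid out as one row per key1.
# Rows with fewer unique values are padded with NaN.
unique_key3 = grouped['key3'].unique()
key3_table = pd.DataFrame.from_dict(
    dict(zip(unique_key3.index, unique_key3.values)), orient='index'
)

# Step 3: every value in a row must be allowed; NaN is listed so padding does not fail a row.
key3_ok = key3_table.isin([10, 100, np.nan]).all(axis=1)

# Step 4: combine both checks and turn the booleans into labels.
result = (exactly_one & key3_ok).replace([True, False], ['ok', 'nok'])
print(result)
```

Note that this variant checks every key3 value within a key1 group, not only the one on the 'one' row, and a key1 with no 'one' row does not appear in the cross-section, so such groups would need separate handling.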