In Pandas, when grouping by multiple columns, how do I retrieve the rows that make up each group after aggregating and filtering?

wgxvkvu9  posted on 2023-03-28 in Other

This is a follow-up to this question.

import pandas as pd

df = pd.DataFrame(
    {
        'a': ['A', 'A', 'B', 'B', 'B', 'C'],
        'b': ['A', 'A', 'B', 'B', 'B', 'C'],
        'hole': [True, True, True, False, False, True]
    }
)

print(df)

groups = df.groupby(['a', 'b'])  # "A", "B", "C"
agg_groups = groups.agg({'hole':lambda x: all(x)}) # "A": True, "B": False, "C": True

original_index_filtered = agg_groups.index[agg_groups['hole']]
original_filtered = df[df[['a', 'b']].isin(original_index_filtered)]
print(original_filtered)

This now outputs:

   a  b   hole
0  A  A   True
1  A  A   True
2  B  B   True
3  B  B  False
4  B  B  False
5  C  C   True
     a    b  hole
0  NaN  NaN   NaN
1  NaN  NaN   NaN
2  NaN  NaN   NaN
3  NaN  NaN   NaN
4  NaN  NaN   NaN
5  NaN  NaN   NaN

It seems I am not doing this correctly when a MultiIndex is involved.

czq61nw1  #1

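If you want to check whether a pair of columns appears in an (n, 2) matrix, you can do it with NumPy broadcasting. DataFrame.isin is meant to check whether each individual element of the DataFrame is in an (n, 1) array, so it compares single cells rather than (a, b) pairs.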

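import numpy as np
# Compare every (a, b) row with every kept key; m has shape (n_keys, n_rows, 2)
m = df[['a', 'b']].to_numpy() == np.array([*original_index_filtered.to_numpy()])[:, None]
# Keep a row if both columns match (all) for at least one key (any)
original_filtered = df[m.all(axis=-1).any(axis=0)]
print(m)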

[[[ True  True]
  [ True  True]
  [False False]
  [False False]
  [False False]
  [False False]]

 [[False False]
  [False False]
  [False False]
  [False False]
  [False False]
  [ True  True]]]

print(original_filtered)

   a  b  hole
0  A  A  True
1  A  A  True
5  C  C  True
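
As a rough sketch of the difference between element-wise and row-wise membership, the example below rebuilds the question's data and then compares whole (a, b) pairs through a MultiIndex. This swaps the broadcasting step for `MultiIndex.isin`, which is a different technique from the one above; the names `kept_keys`, `row_keys`, `elementwise` and `mask` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': ['A', 'A', 'B', 'B', 'B', 'C'],
        'b': ['A', 'A', 'B', 'B', 'B', 'C'],
        'hole': [True, True, True, False, False, True],
    }
)

agg_groups = df.groupby(['a', 'b']).agg({'hole': 'all'})
# MultiIndex holding the (a, b) pairs of the groups to keep
kept_keys = agg_groups.index[agg_groups['hole'].to_numpy()]

# Element-wise check: each single cell ('A', 'B', ...) is compared against
# whole tuples such as ('A', 'A'), so no cell matches.
elementwise = df[['a', 'b']].isin(kept_keys)

# Row-wise check: turn each row's (a, b) pair into a MultiIndex entry first.
row_keys = pd.MultiIndex.from_frame(df[['a', 'b']])
mask = row_keys.isin(kept_keys)  # boolean NumPy array, one entry per row

original_filtered = df[mask]
print(original_filtered)
```

Because `mask` is a one-dimensional boolean array, indexing with it selects rows, whereas indexing with the boolean DataFrame `elementwise` would blank out cells instead.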

toiithl6  #2

You should use transform instead of agg:

df[groups['hole'].transform('all')]
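
To illustrate why transform fits here, this small self-contained sketch (variable names `per_group` and `per_row` are illustrative, and the data is copied from the question) contrasts the two results: agg returns one value per group, while transform repeats the group result on every original row, so it can be used directly as a row mask.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': ['A', 'A', 'B', 'B', 'B', 'C'],
        'b': ['A', 'A', 'B', 'B', 'B', 'C'],
        'hole': [True, True, True, False, False, True],
    }
)
groups = df.groupby(['a', 'b'])

# One boolean per group, indexed by the (a, b) MultiIndex
per_group = groups['hole'].agg('all')

# One boolean per original row, indexed like df
per_row = groups['hole'].transform('all')

result = df[per_row]
print(len(per_group), len(per_row))
print(result)
```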

That said, you can always use merge to align agg_groups with the original rows and build a mask for boolean indexing:

cols = list(agg_groups.index.names)
m = df[cols].merge(agg_groups, left_on=cols, right_index=True, how='left')['hole']
out = df[m]

Output:

   a  b  hole
0  A  A  True
1  A  A  True
5  C  C  True
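
If the aggregated frame is not needed for anything else, another possible route (not mentioned in the answers above, so treat it as a sketch) is `GroupBy.filter`, which keeps all rows of every group for which a function returns True. The data is copied from the question and the name `kept_rows` is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': ['A', 'A', 'B', 'B', 'B', 'C'],
        'b': ['A', 'A', 'B', 'B', 'B', 'C'],
        'hole': [True, True, True, False, False, True],
    }
)

# Keep every row belonging to a group whose 'hole' values are all True
kept_rows = df.groupby(['a', 'b']).filter(lambda g: g['hole'].all())
print(kept_rows)
```

The lambda runs once per group in Python, so on large frames the transform-based mask is likely the faster choice.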