Python: Is there a Pandas aggregate function that combines the features of "any" and "unique"?

Posted by mrphzbgm on 2023-01-12 in Python


I have a large dataset containing data like this:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(
...     {'A': ['one', 'two', 'two', 'one', 'one', 'three'],
...      'B': ['a', 'b', 'c', 'a', 'a', np.nan]})
>>> df
       A    B
0    one    a
1    two    b
2    two    c
3    one    a
4    one    a
5  three  NaN

There are two aggregate functions, "any" and "unique":

>>> df.groupby('A')['B'].any()
A
one       True
three    False
two       True
Name: B, dtype: bool

>>> df.groupby('A')['B'].unique()
A
one         [a]
three     [nan]
two      [b, c]
Name: B, dtype: object

But I would like to get the following result (or something close to it):

A
one           a
three     False
two        True

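I can do this with some complicated code, but I would prefer to find a suitable function in a Python package, or the simplest way to solve the problem. I would be grateful for any help.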

python

Source: https://stackoverflow.com/questions/75081356/is-there-pandas-aggregate-function-that-combines-features-of-any-and-unique

4 Answers


xj3cbfub1#

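You can aggregate with Series.nunique for the first output column and, for the second, collect the unique values with any missing values dropped: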

df1 = df.groupby('A').agg(count=('B', 'nunique'),
                          uniq_without_NaNs=('B', lambda x: x.dropna().unique()))
print(df1)
       count uniq_without_NaNs
A                             
one        1               [a]
three      0                []
two        2            [b, c]

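Then build a boolean Series that is True where count is greater than 1, and where count equals 1 replace the value with the single entry of uniq_without_NaNs: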

out = df1['count'].gt(1).mask(df1['count'].eq(1), df1['uniq_without_NaNs'].str[0])
print(out)
A
one          a
three    False
two       True
Name: count, dtype: object
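
The replacement step relies on how Series.mask chooses between two sources. Here is a minimal sketch on toy data; the names flags, labels and cond are hypothetical and are not taken from the answer:

```python
import pandas as pd

# Hypothetical toy data, chosen only to illustrate Series.mask
flags = pd.Series([False, True, False], index=["x", "y", "z"])
labels = pd.Series(["p", "q", "r"], index=["x", "y", "z"])
cond = pd.Series([True, False, False], index=["x", "y", "z"])

# Where cond is True, take the value from labels; elsewhere keep flags.
# The result is upcast to object dtype because it mixes bool and str.
result = flags.mask(cond, labels)
print(result)
```

In the answer above, cond corresponds to df1['count'].eq(1) and labels to the first element of each uniq_without_NaNs array.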

Answered 2023-01-12

hfsqlsce2#

>>> g = df.groupby("A")["B"].agg
>>> nun = g("nunique")
>>> pd.Series(np.select([nun > 1, nun == 1],
...                     [True, g("unique").str[0]],
...                     default=False),
...           index=nun.index)
A
one          a
three    False
two       True
dtype: object

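Get a handle on the groupby aggregator
Compute the number of unique values
If > 1, i.e. more than one unique value, put True
If == 1, i.e. only one unique value, put that unique value
Otherwise, i.e. no unique values (all NaN), put False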
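
The steps above can also be sketched as a single per-group function passed to agg. This is only an illustration: any_or_unique is a hypothetical helper name, not a pandas built-in, and calling a Python function per group is likely slower on large data than the vectorized np.select version.

```python
import numpy as np
import pandas as pd


def any_or_unique(values):
    """Hypothetical helper (not part of pandas): collapse one group to a scalar."""
    uniques = values.dropna().unique()  # distinct non-missing values
    if len(uniques) > 1:
        return True        # more than one unique value
    if len(uniques) == 1:
        return uniques[0]  # exactly one unique value: return it
    return False           # the group contains only NaN


df = pd.DataFrame(
    {'A': ['one', 'two', 'two', 'one', 'one', 'three'],
     'B': ['a', 'b', 'c', 'a', 'a', np.nan]})

out = df.groupby('A')['B'].agg(any_or_unique)
print(out)
```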

Answered 2023-01-12

gab6jxml3#

You can combine groupby with agg and use a boolean mask to select the correct output:

# Your code
agg = df.groupby('A')['B'].agg(['any', 'unique'])

# Boolean mask to choose between 'any' and 'unique' column
m = agg['unique'].str.len().eq(1) & agg['unique'].str[0].notna()

# Final output
out = agg['any'].mask(m, other=agg['unique'].str[0])

Output:

>>> out
A
one          a
three    False
two       True

>>> agg
         any  unique
A                   
one     True     [a]
three  False   [nan]
two     True  [b, c]

>>> m
A
one       True  # choose 'unique' column
three    False  # choose 'any' column
two      False  # choose 'any' column
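
The mask m depends on the .str accessor working element-wise on list-like cells, which may be less familiar than its use on strings. Here is a small sketch, using plain lists in place of the arrays returned by unique (an assumption made for brevity; the variable names are hypothetical):

```python
import numpy as np
import pandas as pd

# Stand-in for the 'unique' column: each cell holds a list of values
cells = pd.Series([["a"], [np.nan], ["b", "c"]],
                  index=["one", "three", "two"])

lengths = cells.str.len()  # number of elements in each cell
firsts = cells.str[0]      # first element of each cell

# True only where a cell holds exactly one value and that value is not NaN
single_non_missing = lengths.eq(1) & firsts.notna()
print(single_non_missing)
```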

Answered 2023-01-12

ehxuflar4#

new_df = df.groupby('A')['B'].apply(lambda x: x.notna().any())
new_df = new_df.reset_index()
new_df.columns = ['A', 'B']

This will give you:

       A      B
0    one   True
1  three  False
2    two   True

Now, if we want to find the values themselves, we can do:

df.groupby('A')['B'].apply(lambda x: x[x.notna()].unique()[0] if x.notna().any() else np.nan)

Which gives:

A
one        a
three    NaN
two        b

Answered 2023-01-12
