Applying multiple functions with arguments when grouping a DataFrame with pandas

Posted by yjghlzjz on 2023-03-28 in Other


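I am trying to do a groupby on a DataFrame using the value_counts function. I want the first column to hold counts that include NA values, and the second column to hold the normalized counts. I got this working by building two grouped DataFrames and concatenating them. However, this approach does not seem very clean, and there is probably a more Pythonic way to do it.
DataFrame: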

    store   item
0   Store1  table
1   Store2  chair
2   Store3  chair
3   Store2  table
4   Store2  chair
5   Store3  table
6   Store1  chair
7   Store1  table
8   Store3  None
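
For anyone who wants to run the snippets, the frame shown above can be rebuilt roughly as follows. This is a minimal sketch: the variable name df matches the code in the question, and the missing item is assumed to be a Python None, as printed.

```python
import pandas as pd

# Rebuild the sample frame from the question; the last item is missing (None).
df = pd.DataFrame({
    "store": ["Store1", "Store2", "Store3", "Store2", "Store2",
              "Store3", "Store1", "Store1", "Store3"],
    "item": ["table", "chair", "chair", "table", "chair",
             "table", "chair", "table", None],
})
print(df)
```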

Working code:

df1 = pd.DataFrame(df.groupby('store')['item'].apply(pd.value_counts,normalize=True,dropna=False)).rename(columns={"item":"normcount"})
df2 = pd.DataFrame(df.groupby('store')['item'].apply(pd.value_counts,dropna=False)).rename(columns={"item":"count"})
df3 = pd.concat([df1,df2],axis=1)
print(df3)

                normcount   count
store           
Store1  table   0.666667    2
        chair   0.333333    1
Store2  chair   0.666667    2
        table   0.333333    1
Store3  chair   0.333333    1
        table   0.333333    1
         NaN    0.333333    1

Non-working code that looks more Pythonic:

df.groupby(['store','item']).agg(count=('item',pd.value_counts),normcount=('item',pd.value_counts(normalize=True,dropna=False)))

df.groupby(['store','item']).agg(count=('item',pd.value_counts),normcount=('item',pd.Series.apply(pd.value_counts,normalize=True,dropna=False)))

df.groupby(['store','item']).agg(count=('item','value_counts'),normcount=('item','value_counts'))

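I have tried many variations of the first two snippets above without success; I get either a SyntaxError or a TypeError. The last one runs, but I cannot pass any arguments to it, so I do not get the result I want.
Does anyone know how to do this?
Thanks, everyone!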
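
A note on why the agg attempts fail, offered as a sketch rather than a definitive diagnosis: `pd.value_counts(normalize=True, dropna=False)` calls the function on the spot, without the data it needs, instead of handing a callable to agg. Named aggregation also generally expects one value per group, while value_counts returns a whole Series. In addition, grouping by ['store','item'] drops rows with a missing item by default. One way around all three is apply with a small helper that sets the arguments itself. The helper name count_table is hypothetical, and the frame is rebuilt from the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "store": ["Store1", "Store2", "Store3", "Store2", "Store2",
              "Store3", "Store1", "Store1", "Store3"],
    "item": ["table", "chair", "chair", "table", "chair",
             "table", "chair", "table", None],
})

def count_table(items):
    # Hypothetical helper: raw and normalized counts for one store's items.
    counts = items.value_counts(dropna=False)  # dropna=False keeps the missing item
    return pd.DataFrame({"count": counts, "normcount": counts / counts.sum()})

# apply() accepts a function that returns a frame per group and stacks the results,
# so both columns come out of a single pass over the groups.
out = df.groupby("store")["item"].apply(count_table)
print(out)
```

Because the arguments live inside the helper, nothing has to be forwarded through agg, and no second grouped frame has to be concatenated afterwards.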

pandas

Source: https://stackoverflow.com/questions/75823885/apply-multiple-function-with-args-when-grouping-dataframe-with-pandas

1 Answer


#1 mzaanser

I would use value_counts and concat directly:

# dropna=False keeps the row whose item is missing (the default would drop it)
s = df.value_counts(dropna=False)

out = pd.concat([
           s.rename('count'),
           s.div(s.groupby(level='store').transform('sum')).rename('normcount')],
          axis=1).sort_index()

Output:

              count  normcount
store  item                   
Store1 chair      1   0.333333
       table      2   0.666667
Store2 chair      2   0.666667
       table      1   0.333333
Store3 None       1   0.333333
       chair      1   0.333333
       table      1   0.333333

Or:

out = pd.DataFrame({
  'count': df.value_counts(sort=False, dropna=False),
  'normcount': df.groupby('store').value_counts(normalize=True, sort=False, dropna=False)
})
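
If the installed pandas is 1.1 or later, groupby itself accepts dropna=False, which gives another route to the same table. The following is only a sketch under that assumption, with the frame rebuilt from the question's data. It returns flat columns rather than a MultiIndex.

```python
import pandas as pd

df = pd.DataFrame({
    "store": ["Store1", "Store2", "Store3", "Store2", "Store2",
              "Store3", "Store1", "Store1", "Store3"],
    "item": ["table", "chair", "chair", "table", "chair",
             "table", "chair", "table", None],
})

# Count each (store, item) pair; dropna=False keeps the pair with a missing item.
out = (
    df.groupby(["store", "item"], dropna=False)
      .size()
      .rename("count")
      .reset_index()
)

# Divide each count by the total for its store to get the normalized count.
out["normcount"] = out["count"] / out.groupby("store")["count"].transform("sum")
print(out)
```

Working on flat columns avoids aligning two indexes that contain NaN. Call out.set_index(['store', 'item']) afterwards if the MultiIndex layout is preferred.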

Posted 2023-03-28
