Groupby a Pandas DataFrame and calculate the mean and standard deviation of one column

Posted by bqujaahr on 2023-01-28 in Other

I have a Pandas DataFrame as follows:

        a  b  c  d
0   Apple  3  5  7
1  Banana  4  4  8
2  Cherry  7  1  3
3   Apple  3  4  7

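I want to group the rows by column "a", replace the values in column "c" with the mean of the grouped rows, and add another column holding the standard deviation of the column "c" values that went into that mean. The values in columns "b" and "d" are constant across all rows in a group. The desired output is therefore: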

        a  b    c  d         e
0   Apple  3  4.5  7  0.707107
1  Banana  4    4  8         0
2  Cherry  7    1  3         0

What is the best way to achieve this?

pandas

Source: https://stackoverflow.com/questions/26599347/groupby-pandas-dataframe-and-calculate-mean-and-stdev-of-one-column

2 Answers

p8h8hvxi1#

You can use a groupby-agg operation:

In [38]: result = df.groupby(['a'], as_index=False).agg(
                      {'c':['mean','std'],'b':'first', 'd':'first'})

Then rename and reorder the columns:

In [39]: result.columns = ['a','c','e','b','d']
In [40]: result.reindex(columns=sorted(result.columns))
Out[40]: 
        a  b    c  d         e
0   Apple  3  4.5  7  0.707107
1  Banana  4  4.0  8       NaN
2  Cherry  7  1.0  3       NaN
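
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the question's data by hand (the `pd.DataFrame` construction is an assumption about how `df` was created), applies the same groupby-agg, and then fills the `NaN` standard deviations of single-row groups with 0 so the result matches the output requested in the question. The `fillna(0)` step is an addition for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question (construction is assumed).
df = pd.DataFrame({
    'a': ['Apple', 'Banana', 'Cherry', 'Apple'],
    'b': [3, 4, 7, 3],
    'c': [5, 4, 1, 4],
    'd': [7, 8, 3, 7],
})

# Mean and sample std of 'c' per group; 'b' and 'd' are constant
# within a group, so taking the first value is enough.
result = df.groupby(['a'], as_index=False).agg(
    {'c': ['mean', 'std'], 'b': 'first', 'd': 'first'})

# Flatten the two-level column index, then sort columns into a, b, c, d, e.
result.columns = ['a', 'c', 'e', 'b', 'd']
result = result.reindex(columns=sorted(result.columns))

# The sample std of a single value is undefined (NaN);
# the question's desired output shows 0 there instead.
result['e'] = result['e'].fillna(0)
print(result)
```

Note that `reindex` returns a new DataFrame, so the sketch assigns it back to `result` rather than relying on the displayed output.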

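By default, Pandas computes the sample standard deviation (ddof=1), which is why the single-row groups above show NaN. To compute the population standard deviation instead: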

def pop_std(x):
    return x.std(ddof=0)
result = df.groupby(['a'], as_index=False).agg({'c':['mean',pop_std],'b':'first', 'd':'first'})
result.columns = ['a','c','e','b','d']
result.reindex(columns=sorted(result.columns))

This yields:

        a  b    c  d    e
0   Apple  3  4.5  7  0.5
1  Banana  4  4.0  8  0.0
2  Cherry  7  1.0  3  0.0
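
To see where 0.707107 and 0.5 come from, the arithmetic for the Apple group can be checked by hand. The sketch below is only an illustration; the variable names are made up for this example, and the two `c` values are taken from the question's data.

```python
import math
import pandas as pd

apple_c = pd.Series([5, 4])                   # column 'c' for the two Apple rows
mean = apple_c.mean()                         # (5 + 4) / 2 = 4.5
squared_dev = ((apple_c - mean) ** 2).sum()   # 0.25 + 0.25 = 0.5

# Sample std divides by n - 1; population std divides by n.
sample_std = math.sqrt(squared_dev / (len(apple_c) - 1))
population_std = math.sqrt(squared_dev / len(apple_c))

print(sample_std, apple_c.std())              # both about 0.707107 (ddof=1)
print(population_std, apple_c.std(ddof=0))    # both 0.5
```

With a single row, the sample formula divides by zero, which is why Pandas reports NaN for Banana and Cherry unless ddof=0 is used.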

lokaqttq2#

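If the values in some columns are constant across all rows in a group (e.g. "b" and "d" in the OP), you can include those columns in the grouper and reorder the columns afterwards.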

new_df = (
    df.groupby(['a', 'b', 'd'])['c'].agg(['mean', 'std'])   # groupby operation
    .set_axis(['c', 'e'], axis=1)                           # rename columns
    .reset_index()                                          # make groupers into columns
    [['a', 'b', 'c', 'd', 'e']]                             # reorder columns
)

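You can also use named aggregation so that the groupby result comes back with custom column names. The (name, function) tuples passed to groupby.agg name the mean column 'c' and the std column 'e'.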

new_df = (
    df.groupby(['a', 'b', 'd'])['c'].agg([('c', 'mean'), ('e', 'std')])
    .reset_index()                                          # make groupers into columns
    [['a', 'b', 'c', 'd', 'e']]                             # reorder columns
)
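
As a related variation, not taken from the answer itself, Pandas 0.25 and later also accept keyword-style named aggregation, where each keyword is an output column and each value is a (column, function) pair. The sketch below assumes the question's data and groups by 'a' only, taking the first value of 'b' and 'd'; the name `named_df` is just for illustration.

```python
import pandas as pd

# Sample data copied from the question (construction is assumed).
df = pd.DataFrame({
    'a': ['Apple', 'Banana', 'Cherry', 'Apple'],
    'b': [3, 4, 7, 3],
    'c': [5, 4, 1, 4],
    'd': [7, 8, 3, 7],
})

# Keyword = output column name, value = (input column, aggregation).
# The keyword order determines the column order of the result.
named_df = df.groupby('a', as_index=False).agg(
    b=('b', 'first'),
    c=('c', 'mean'),
    d=('d', 'first'),
    e=('c', 'std'),
)
print(named_df)
```

Because the output columns are created in keyword order, no separate renaming or reordering step is needed in this form.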

You can also pass arguments to the aggregation functions used in groupby.agg. For example, if std() needs ddof=0, wrap the call in a lambda:

new_df = (
    df.groupby(['a', 'b', 'd'])['c'].agg([('c', 'mean'), ('e', lambda g: g.std(ddof=0))])
    .reset_index()[['a', 'b', 'c', 'd', 'e']]
)
