numpy: How can weights be taken into account in DataFrame.describe? [duplicate]

Posted by wwwo4jvm on 2023-08-05 in Other


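This question already has answers here:

pandas describe by - additional parameters (3 answers)
Closed 22 days ago.

I have a sample like this, with student scores and the population (number of students) at each score: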

import numpy as np
import pandas as pd

# Create the DataFrame: one row per score, with the number of students at that score
sample = pd.DataFrame({
    'score': [595, 594, 593, 592, 591, 590, 589, 588, 587, 586,
              585, 584, 583, 582, 581, 580, 579, 578, 577, 576],
    'population': [705, 745, 716, 742, 722, 746, 796, 750, 816, 809,
                   815, 821, 820, 865, 876, 886, 947, 949, 1018, 967],
})

I compute its weighted average score:

np.average(sample['score'], weights=sample['population'])
# 584.9062443219672
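
As a quick cross-check of what that call computes, the weighted mean can be written out by hand as sum(w * x) / sum(w). The sketch below rebuilds the same `sample` frame so that it runs on its own; the names `manual` and `builtin` are only illustrative.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "score": [595, 594, 593, 592, 591, 590, 589, 588, 587, 586,
              585, 584, 583, 582, 581, 580, 579, 578, 577, 576],
    "population": [705, 745, 716, 742, 722, 746, 796, 750, 816, 809,
                   815, 821, 820, 865, 876, 886, 947, 949, 1018, 967],
})

# Weighted mean written out explicitly: sum(w * x) / sum(w)
manual = (sample["score"] * sample["population"]).sum() / sample["population"].sum()

# The same quantity via NumPy
builtin = np.average(sample["score"], weights=sample["population"])

print(manual, builtin)
```

Both values should agree up to floating-point rounding.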

However, when I run sample.describe(), it does not take the weights into account:

sample.describe()
           score   population
count   20.00000    20.000000
mean   585.50000   825.550000
std      5.91608    91.465539
min    576.00000   705.000000
25%    580.75000   745.750000
50%    585.50000   815.500000
75%    590.25000   878.500000
max    595.00000  1018.000000

How can I get the weights taken into account in sample.describe()?

numpy

Source: https://stackoverflow.com/questions/76678275/how-could-get-weights-considered-in-dataframe-describe

1 Answer

vmdwslir1#

You need a custom function. Because np.average returns a scalar, the new row gets the same value in every column:

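def describe(df, stats):
    # Start from the regular, unweighted summary
    d = df.describe()
    # Add a row labelled `stats` holding the weighted average of 'score';
    # the scalar is broadcast across all columns
    d.loc[stats] = np.average(df['score'], weights=df['population'])
    return d

out = describe(sample, 'wa')
print(out)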
            score   population
count   20.000000    20.000000
mean   585.500000   825.550000
std      5.916080    91.465539
min    576.000000   705.000000
25%    580.750000   745.750000
50%    585.500000   815.500000
75%    590.250000   878.500000
max    595.000000  1018.000000
wa     584.906244   584.906244
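
If the goal is for every statistic (count, mean, std, quartiles) to reflect the weights, one possible approach, not part of the answer above, is to treat `population` as integer frequency counts and repeat each score that many times before calling `describe()`. The helper name `weighted_describe` below is hypothetical, and the sketch assumes the weights are non-negative integers and that the expanded series fits in memory.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "score": [595, 594, 593, 592, 591, 590, 589, 588, 587, 586,
              585, 584, 583, 582, 581, 580, 579, 578, 577, 576],
    "population": [705, 745, 716, 742, 722, 746, 796, 750, 816, 809,
                   815, 821, 820, 865, 876, 886, 947, 949, 1018, 967],
})

def weighted_describe(df, value_col, weight_col):
    """Hypothetical helper: describe() of value_col with frequency weights.

    Assumes weight_col holds non-negative integer counts, so each value
    can simply be repeated that many times.
    """
    # Repeat every value according to its count, then summarise as usual
    expanded = df[value_col].repeat(df[weight_col].to_numpy())
    return expanded.describe()

out = weighted_describe(sample, "score", "population")
print(out)
```

With this approach `count` becomes the total population rather than the number of rows, and `mean` matches the `np.average` result. For non-integer weights the repetition trick does not apply, and weighted quantiles would need an explicit definition.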


Posted on 2023-08-05
