numpy: How to take weights into account in DataFrame.describe? [duplicate]

wwwo4jvm  posted on 2023-08-05  in Other

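This question already has answers here:

pandas describe by - additional parameters (3 answers)
Closed 22 days ago.

I have a sample like this, with student scores and the population at each score: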

    import numpy as np
    import pandas as pd

    # Create the DataFrame
    sample = pd.DataFrame(
        {'score': [595, 594, 593, 592, 591, 590, 589, 588, 587, 586, 585, 584, 583, 582, 581, 580, 579, 578, 577, 576],
         'population': [705, 745, 716, 742, 722, 746, 796, 750, 816, 809, 815, 821, 820, 865, 876, 886, 947, 949, 1018, 967]})

I compute its weighted average score:

    np.average(sample['score'], weights=sample['population'])
    # 584.9062443219672
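For reference, `np.average` with `weights` is just the sum of score times population divided by the total population. A minimal sketch that checks this by hand on the same data (the variable name `manual` is mine, not from the question):

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame(
    {'score': [595, 594, 593, 592, 591, 590, 589, 588, 587, 586, 585, 584, 583, 582, 581, 580, 579, 578, 577, 576],
     'population': [705, 745, 716, 742, 722, 746, 796, 750, 816, 809, 815, 821, 820, 865, 876, 886, 947, 949, 1018, 967]})

# Weighted mean by hand: sum(score * population) / sum(population)
manual = (sample['score'] * sample['population']).sum() / sample['population'].sum()

# Same quantity via NumPy
via_numpy = np.average(sample['score'], weights=sample['population'])

print(manual, via_numpy)
```

Both values should agree up to floating-point rounding.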


However, when I run sample.describe(), it does not take the weights into account:

    sample.describe()

               score   population
    count   20.00000    20.000000
    mean   585.50000   825.550000
    std      5.91608    91.465539
    min    576.00000   705.000000
    25%    580.75000   745.750000
    50%    585.50000   815.500000
    75%    590.25000   878.500000
    max    595.00000  1018.000000


How can I get the weights taken into account in sample.describe()?

vmdwslir  #1

You need a custom function. Because the weighted average is a scalar, the new row gets the same value in every column:

    def describe(df, stats):
        d = df.describe()
        # Add a row labeled `stats`; the scalar is broadcast to every column
        d.loc[stats] = np.average(df['score'], weights=df['population'])
        return d

    out = describe(sample, 'wa')
    print(out)

                score   population
    count   20.000000    20.000000
    mean   585.500000   825.550000
    std      5.916080    91.465539
    min    576.000000   705.000000
    25%    580.750000   745.750000
    50%    585.500000   815.500000
    75%    590.250000   878.500000
    max    595.000000  1018.000000
    wa     584.906244   584.906244
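If you want every statistic for `score` weighted, not just one extra row, something like the following might work. This is only a sketch under my own assumptions: `weighted_describe` is a hypothetical helper (not a pandas API), the weights are treated as frequencies, the std has no bias correction, `count` is reported as the total weight, and the quantile is the first sorted value whose cumulative weight share reaches `q`, which is only one of several possible conventions.

```python
import numpy as np
import pandas as pd

def weighted_describe(df, value_col, weight_col):
    """Weighted summary of one column (hypothetical helper, not part of pandas)."""
    x = df[value_col].to_numpy(dtype=float)
    w = df[weight_col].to_numpy(dtype=float)

    mean = np.average(x, weights=w)
    # Weighted variance around the weighted mean (assumption: frequency weights, no bias correction)
    var = np.average((x - mean) ** 2, weights=w)

    # Sort values and build the cumulative share of total weight
    order = np.argsort(x)
    xs, ws = x[order], w[order]
    cum = np.cumsum(ws) / ws.sum()

    def quantile(q):
        # First sorted value whose cumulative weight share is >= q
        idx = min(np.searchsorted(cum, q), len(xs) - 1)
        return xs[idx]

    return pd.Series(
        {
            'count': w.sum(),  # assumption: total weight plays the role of the count
            'mean': mean,
            'std': np.sqrt(var),
            'min': x.min(),
            '25%': quantile(0.25),
            '50%': quantile(0.50),
            '75%': quantile(0.75),
            'max': x.max(),
        },
        name=value_col,
    )

sample = pd.DataFrame(
    {'score': [595, 594, 593, 592, 591, 590, 589, 588, 587, 586, 585, 584, 583, 582, 581, 580, 579, 578, 577, 576],
     'population': [705, 745, 716, 742, 722, 746, 796, 750, 816, 809, 815, 821, 820, 865, 876, 886, 947, 949, 1018, 967]})

result = weighted_describe(sample, 'score', 'population')
print(result)
```

The `mean` entry should match the `np.average` value from the question. The other rows depend on the conventions chosen above, so adjust them if your definition of a weighted std or quantile differs.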
