scipy: How to calculate the percentile value of a number in a DataFrame column grouped by index

Posted by kb5ga3dv on 2022-11-10 in Other


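I have a DataFrame that looks like this:

For each group in the data, I want to find the percentile rank of a score of 35 (i.e., which percentile of the group's data a score of 35 falls at).
I've tried several approaches, but none of them work.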

scipy.stats.percentileofscore(df['Score'], 35, kind='weak')
 --> This works, but it doesn't give me the percentile grouped by index

df.groupby('group')['Score'].percentileofscore()
 --> 'SeriesGroupBy' object has no attribute 'percentileofscore'

scipy.stats.percentileofscore(df.groupby('group')[['Score']], 35, kind='strict')
 --> TypeError: '<' not supported between instances of 'str' and 'int'

My ideal output looks like this:

df:
        Score Percentile
group 
  A       50
  C       33

Can anyone suggest what would work well here?

scipy

Source: https://stackoverflow.com/questions/73871161/how-to-calculate-percentile-value-of-number-in-dataframe-column-grouped-by-index

2 Answers


m1m5dgzv1#

The inverse quantile function of a series at a point X is the proportion of values in the series that are less than X, right? So:

In [158]: df["Score"].lt(35).groupby(df["group"]).mean().mul(100)
Out[158]:
group
A    50.000000
C    33.333333
Name: Score, dtype: float64

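Get a True/False series indicating whether each "Score" is < 35
Group this series by "group"
Take the mean
Since True == 1 and False == 0, the mean is effectively the proportion of values below 35
mul multiplies by 100 to turn the proportion into a percentage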
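The steps above can be sketched one at a time as follows. The question's DataFrame is not shown, so the sample data here is an assumption, picked only so that the result matches the output in this answer; the intermediate names `below` and `share` are illustrative.

```python
import pandas as pd

# Hypothetical sample data: the original DataFrame is not shown in the question.
df = pd.DataFrame(
    {
        "group": ["A", "A", "C", "C", "C"],
        "Score": [20, 40, 10, 50, 60],
    }
)

# Step 1: boolean Series, True where Score is strictly below 35.
below = df["Score"].lt(35)

# Steps 2-4: group the booleans by "group" and average them.
# True counts as 1 and False as 0, so the mean is the share of values below 35.
share = below.groupby(df["group"]).mean()

# Step 5: convert the share to a percentage.
percentile = share.mul(100)
print(percentile)
```

With this sample data, group A has one of two scores below 35 and group C has one of three.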


6ovsh4lw2#

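To answer this in a more general way: you need a custom aggregation over the groups, which pandas lets you do with the agg method.
You can define the function yourself or use one from a library: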

import pandas as pd

def percentileofscore(ser: pd.Series) -> float:
    # Percentage of values in the group that are strictly below the score of 35
    return 100 * (ser < 35).sum() / ser.size

df.groupby("group").agg(percentileofscore)

Output:

Score
group   
A     50.000000
C     33.333333
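As a sketch of the "use one from a library" option, scipy.stats.percentileofscore can be passed to agg through a small wrapper. The sample data below is hypothetical (the question's DataFrame is not shown), and kind="strict" is assumed here because it counts only values strictly below the score, which matches the "less than 35" definition used in this thread.

```python
import pandas as pd
from scipy import stats

# Hypothetical sample data: the original DataFrame is not shown in the question.
df = pd.DataFrame(
    {
        "group": ["A", "A", "C", "C", "C"],
        "Score": [20, 40, 10, 50, 60],
    }
)

# For each group, agg calls the lambda with that group's Score values.
# kind="strict" returns the percentage of values strictly less than the score.
result = df.groupby("group")["Score"].agg(
    lambda s: stats.percentileofscore(s, 35, kind="strict")
)
print(result)
```

Other values of kind ("rank", "weak", "mean") treat scores equal to 35 differently, so the choice matters when ties are possible.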
