How to make a pandas crosstab with percentages?

Posted by oiopk7p5 on 2023-11-15 in Other


Given a DataFrame with several categorical variables, how do I return a crosstab containing percentages instead of frequencies?

import numpy as np
import pandas as pd

df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 6,
                   'B' : ['A', 'B', 'C'] * 8,
                   'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
                   'D' : np.random.randn(24),
                   'E' : np.random.randn(24)})

pd.crosstab(df.A,df.B)

B       A    B    C
A               
one     4    4    4
three   2    2    2
two     2    2    2

Expected output:

B       A     B    C
A               
one     .33  .33  .33
three   .33  .33  .33
two     .33  .33  .33


pandas

Source: https://stackoverflow.com/questions/21247203/how-to-make-a-pandas-crosstab-with-percentages

6 Answers


i5desfxk1#

As of pandas 0.18.1, there is a normalize option:

In [1]: pd.crosstab(df.A,df.B, normalize='index')
Out[1]:

B              A           B           C
A           
one     0.333333    0.333333    0.333333
three   0.333333    0.333333    0.333333
two     0.333333    0.333333    0.333333

Here you can normalize over 'all', 'index' (rows), or 'columns'.
See the documentation for more details.
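
To make the difference between the three settings concrete, here is a minimal sketch that reuses only the A and B columns from the question (the random D and E columns play no part in the crosstab). The variable names by_row, by_col and overall are illustrative only.

```python
import pandas as pd

# Same A/B columns as in the question.
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8})

# Each row sums to 1: share of B values within each A value.
by_row = pd.crosstab(df.A, df.B, normalize='index')

# Each column sums to 1: share of A values within each B value.
by_col = pd.crosstab(df.A, df.B, normalize='columns')

# The whole table sums to 1: share of all 24 observations.
overall = pd.crosstab(df.A, df.B, normalize='all')

print(by_row, by_col, overall, sep='\n\n')
```

With this data, 'index' gives 1/3 everywhere, 'columns' gives 0.5 for 'one' and 0.25 for the other two rows, and 'all' gives 4/24 for 'one' and 2/24 elsewhere.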


ubof19bj2#

pd.crosstab(df.A, df.B).apply(lambda r: r/r.sum(), axis=1)

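Basically, you just write a function that computes row/row.sum() and use apply with axis=1 to apply it row by row.
(If you are doing this in Python 2, you should use from __future__ import division to make sure division always returns a float.)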
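
The same pattern should carry over to column percentages by switching to axis=0. A minimal sketch, again using only the question's A and B columns; counts and col_share are names chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8})

counts = pd.crosstab(df.A, df.B)

# axis=0 passes each column to the lambda, so every column
# is divided by its own total and sums to 1.
col_share = counts.apply(lambda c: c / c.sum(), axis=0)

print(col_share)
```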


zxlwwiss3#

We can express the result as percentages by multiplying by 100:

pd.crosstab(df.A,df.B, normalize='index')\
    .round(4)*100

B          A      B      C
A                         
one    33.33  33.33  33.33
three  33.33  33.33  33.33
two    33.33  33.33  33.33

I've rounded it for convenience.
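
If the goal is display rather than further arithmetic, one possible variation is to multiply first and round afterwards, or to format the shares as percent strings. This is a sketch under that assumption, not part of the answer above; share, pct and labels are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8})

share = pd.crosstab(df.A, df.B, normalize='index')

# Numeric percentages: scale to 0-100, then round to two decimals.
pct = share.mul(100).round(2)

# String labels such as '33.3%', applied column by column.
# The result holds strings, so it is only suitable for display.
labels = share.apply(lambda col: col.map('{:.1%}'.format))

print(pct)
print(labels)
```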


wwtsj6pe4#

If you want percentages of the grand total, you can divide by the len of the df instead of the row sum:

pd.crosstab(df.A, df.B).apply(lambda r: r/len(df), axis=1)
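
Dividing by len(df) matches the table total only when every row of df is counted, which would not hold if A or B contained missing values (assumed not to be the case here). The sketch below compares it with dividing by the table's own total and with normalize='all'; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8})

counts = pd.crosstab(df.A, df.B)

# Denominator taken from the number of rows in df.
share_len = counts / len(df)

# Denominator taken from the table itself, so it always sums to 1.
share_sum = counts / counts.values.sum()

# Built-in equivalent on pandas 0.18.1 or later.
share_norm = pd.crosstab(df.A, df.B, normalize='all')

print(share_len)
```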


6ljaweal5#

Normalizing by index is straightforward: pass normalize="index" to pd.crosstab().


4uqofj5v6#

Another option is to use div rather than apply:

In [11]: res = pd.crosstab(df.A, df.B)

Divide by the sum over the index:

In [12]: res.sum(axis=1)
Out[12]: 
A
one      12
three     6
two       6
dtype: int64

As above, you need to handle integer division (I use astype('float')):

In [13]: res.astype('float').div(res.sum(axis=1), axis=0)
Out[13]: 
B             A         B         C
A                                  
one    0.333333  0.333333  0.333333
three  0.333333  0.333333  0.333333
two    0.333333  0.333333  0.333333
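
On a current pandas running under Python 3, div already performs true division on integer counts, so the cast can likely be dropped. A minimal sketch under that assumption, with res and row_share as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8})

res = pd.crosstab(df.A, df.B)

# axis=0 aligns the row totals with the row index,
# so each row is divided by its own sum.
row_share = res.div(res.sum(axis=1), axis=0)

print(row_share.dtypes)  # float columns, without astype('float')
print(row_share)
```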
