Missing data in pandas crosstab

Posted by sdnqo3pr on 2023-01-24 in Other


I'm using pandas to build a cross-tabulation:

import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'dull', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)

pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])

b     one   two       
c    dull  dull  shiny
a                     
bar     1     1      0
foo     2     1      2

But what I actually want is the following:

b     one          two
c    dull  shiny  dull  shiny
a
bar     1      0     1      0
foo     2      0     1      2

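I found a workaround that adds new columns and sets the levels as a new MultiIndex, but it seems cumbersome...
Is there a way to pass a MultiIndex to the crosstab function to predefine the output columns?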

pandas

Source: https://stackoverflow.com/questions/17003034/missing-data-in-pandas-crosstab

2 Answers


6g8kf2rb1#

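The crosstab function has a parameter named dropna, which defaults to True. It controls whether columns with no entries (such as the one/shiny column) are dropped from the result.
I tried calling the function like this: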

pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'], dropna=False)

This is what I got:

b     one          two       
c    dull  shiny  dull  shiny
a                            
bar     1      0     1      0
foo     2      0     1      2

Hope that's still useful.
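To make the effect of dropna easier to check, here is a minimal self-contained sketch that builds the table both ways from the question's data. The variable names ct_default, ct_full, and missing are illustrative only and do not appear in the original post.

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'dull', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)

# Default (dropna=True): the (one, shiny) column has no entries and is dropped
ct_default = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])

# dropna=False: every (b, c) combination is kept, with empty cells counted as 0
ct_full = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'], dropna=False)

# Columns that only appear when dropna=False
missing = ct_full.columns.difference(ct_default.columns)
print(list(missing))
print(ct_full)
```

Comparing the two column indexes this way is a quick check that the only difference is the combination that never occurs in the data.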


h7wcgrx32#

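I don't think there is a way to do this. crosstab calls pivot_table in its source, and pivot_table doesn't seem to offer this either. I have raised it as an issue.
A hacky workaround (which may or may not be the same as the one you are already using...):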

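from itertools import product
ct = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
# Every (b, c) combination, including those that never occur in the data
a_x_b = list(product(np.unique(b), np.unique(c)))
a_x_b = pd.MultiIndex.from_tuples(a_x_b)

# reindex_axis was removed in pandas 1.0; reindex with fill_value=0 keeps integer counts
In [15]: ct.reindex(columns=a_x_b, fill_value=0)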
Out[15]:
      one          two
     dull  shiny  dull  shiny
a
bar     1      0     1      0
foo     2      0     1      2

If product is too slow, a numpy implementation of it can be used instead.
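As one possible numpy-based variant of the workaround (a sketch, not necessarily the implementation the answer had in mind), the Cartesian product can be built with np.meshgrid and passed to MultiIndex.from_arrays. The names b_grid, c_grid, full_columns, and ct_full are made up for this illustration.

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'dull', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)

ct = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])

# Cartesian product of the unique labels, built with numpy instead of itertools.
# indexing='ij' makes the first level (b) vary slowest, like itertools.product.
b_grid, c_grid = np.meshgrid(np.unique(b), np.unique(c), indexing='ij')
full_columns = pd.MultiIndex.from_arrays(
    [b_grid.ravel(), c_grid.ravel()], names=['b', 'c']
)

# Reindex the columns; combinations absent from the data get a count of 0
ct_full = ct.reindex(columns=full_columns, fill_value=0)
print(ct_full)
```

Passing names=['b', 'c'] keeps the level names in the column header. pd.MultiIndex.from_product([np.unique(b), np.unique(c)], names=['b', 'c']) should build the same index in one call if the explicit numpy step is not needed.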
