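I have two columns of comma-separated values, and I want to reshape the table into a cross-count of every fruit against every place. How can I do this with pandas?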
import pandas as pd

data = {
    "fruits": ["orange, apple, banana", "orange, apple, banana",
               "apple, banana", "orange, apple, banana", "others"],
    "places": ["New York, London, Boston", "New York, Manchester",
               "Tokyo", "Hong Kong, Boston", "London"],
}
df = pd.DataFrame(data)
                  fruits                    places
0  orange, apple, banana  New York, London, Boston
1  orange, apple, banana      New York, Manchester
2          apple, banana                     Tokyo
3  orange, apple, banana         Hong Kong, Boston
4                 others                    London
Expected output:
        New York  London  Boston  Hong Kong  Manchester  Tokyo
orange         2       1       2          1           1      0
apple          2       1       2          1           1      1
banana         2       1       2          1           1      1
others         0       1       0          0           0      0
4 Answers
Answer 1 (wgx48brx)
Let's take it step by step:
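One way the individual steps could look is sketched below. This is an assumption about a reasonable sequence (split, explode, count, unstack), not necessarily the answerer's original code, and the intermediate names `split`, `pairs`, `counts`, and `table` are chosen here for illustration. `ignore_index=True` requires pandas 1.1 or later.

```python
import pandas as pd

data = {
    "fruits": ["orange, apple, banana", "orange, apple, banana",
               "apple, banana", "orange, apple, banana", "others"],
    "places": ["New York, London, Boston", "New York, Manchester",
               "Tokyo", "Hong Kong, Boston", "London"],
}
df = pd.DataFrame(data)

# Step 1: turn each comma-separated string into a list of items.
split = df.apply(lambda col: col.str.split(", "))

# Step 2: explode one column at a time, so that every (fruit, place)
# pair from the same original row ends up on its own row.
pairs = (
    split.explode("fruits", ignore_index=True)
         .explode("places", ignore_index=True)
)

# Step 3: count how often each (fruit, place) pair occurs.
counts = pairs.groupby(["fruits", "places"]).size()

# Step 4: move the places into columns; pairs that never occur become 0.
table = counts.unstack(fill_value=0)
print(table)
```

The rows and columns of `table` come out in sorted label order, so a `reindex` would be needed to match the exact ordering shown in the question.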
Putting all of these steps together is left as an exercise for the reader :)
Answer 2 (2vuwiymt)
You can use pandas.crosstab on the split and exploded columns.
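A minimal sketch of that idea follows, assuming the items are always separated by a comma and a space, and assuming pandas 1.1 or later for `ignore_index`. The variable names `exploded` and `out` are illustrative.

```python
import pandas as pd

data = {
    "fruits": ["orange, apple, banana", "orange, apple, banana",
               "apple, banana", "orange, apple, banana", "others"],
    "places": ["New York, London, Boston", "New York, Manchester",
               "Tokyo", "Hong Kong, Boston", "London"],
}
df = pd.DataFrame(data)

# Split both columns into lists, then explode them one after the other
# so each row holds exactly one fruit and one place.
exploded = (
    df.assign(
        fruits=df["fruits"].str.split(", "),
        places=df["places"].str.split(", "),
    )
    .explode("fruits", ignore_index=True)
    .explode("places", ignore_index=True)
)

# Count the co-occurrences of each fruit with each place.
out = pd.crosstab(exploded["fruits"], exploded["places"])
print(out)
```

`pd.crosstab` returns its labels in sorted order; if the order from the question matters, `out.reindex(index=..., columns=...)` with explicit label lists would rearrange it.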
Answer 3 (l2osamch)
One approach is to use itertools.product to build the Cartesian product of the fruits and places in each row, then apply pd.Series.explode and pd.crosstab.
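The following is a sketch of how those three pieces might fit together, not the answerer's exact code. The helper names `pairs`, `exploded`, and `pair_df` are assumptions made for this example.

```python
from itertools import product

import pandas as pd

data = {
    "fruits": ["orange, apple, banana", "orange, apple, banana",
               "apple, banana", "orange, apple, banana", "others"],
    "places": ["New York, London, Boston", "New York, Manchester",
               "Tokyo", "Hong Kong, Boston", "London"],
}
df = pd.DataFrame(data)

# For each row, build every (fruit, place) combination as a list of tuples.
pairs = pd.Series(
    [
        list(product(fruits.split(", "), places.split(", ")))
        for fruits, places in zip(df["fruits"], df["places"])
    ]
)

# Explode the lists so that each element is a single (fruit, place) tuple.
exploded = pairs.explode()

# Unpack the tuples into two columns and cross-tabulate them.
pair_df = pd.DataFrame(exploded.tolist(), columns=["fruits", "places"])
out = pd.crosstab(pair_df["fruits"], pair_df["places"])
print(out)
```

Compared with exploding the two columns separately, this builds the pairs explicitly per row, which may be easier to follow when the pairing rule needs to change later.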
Answer 4 (xriantvc)