Reshape two string columns to get cross counts between them in Pandas

Posted by pbossiut on 2022-11-20 in Other


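I have two columns of comma-separated strings, and I want to reshape the table into a cross count of the items in one column against the items in the other. How can I do this with Pandas?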

import pandas as pd

data = {
    "fruits": ["orange, apple, banana", "orange, apple, banana", 
               "apple, banana", "orange, apple, banana", "others"],
    "places": ["New York, London, Boston", "New York, Manchester", 
               "Tokyo", "Hong Kong, Boston", "London"],
}
df = pd.DataFrame(data)

                   fruits                    places
0   orange, apple, banana  New York, London, Boston
1   orange, apple, banana      New York, Manchester
2           apple, banana                     Tokyo
3   orange, apple, banana         Hong Kong, Boston
4                  others                    London

Expected output:

        New York  London  Boston  Hong Kong  Manchester  Tokyo
orange         2       2       2          1           1      0
apple          2       1       2          1           1      1
banana         2       1       2          1           1      1
others         0       1       0          0           0      0

pandas

Source: https://stackoverflow.com/questions/69462940/reshape-two-string-columns-to-make-count-inbetween-in-pandas

4 Answers


wgx48brx1#

Let's go step by step:

df2 = df.copy()
df2["fruits"] = df["fruits"].str.split(", ")
df2["places"] = df["places"].str.split(", ")
df2

df3 = df2.explode("fruits").explode("places")
df3.head()

pd.pivot_table(df3, index="fruits", columns="places", aggfunc=len, fill_value=0)
# Or, using crosstab: 
# pd.crosstab(df3["fruits"], df3["places"])

Putting all these steps together is left as an exercise for the reader :)
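For readers who would rather see the steps combined, here is one possible sketch. The helper name `cross_count` and its parameters are hypothetical, introduced only for illustration; it assumes both columns hold strings separated by ", " and uses the `pd.crosstab` variant mentioned in the comment above.

```python
import pandas as pd

data = {
    "fruits": ["orange, apple, banana", "orange, apple, banana",
               "apple, banana", "orange, apple, banana", "others"],
    "places": ["New York, London, Boston", "New York, Manchester",
               "Tokyo", "Hong Kong, Boston", "London"],
}
df = pd.DataFrame(data)


def cross_count(frame, row_col="fruits", col_col="places", sep=", "):
    """Hypothetical helper: split, explode, then cross-tabulate two string columns."""
    exploded = (
        frame[[row_col, col_col]]
        .apply(lambda col: col.str.split(sep))  # strings -> lists of items
        .explode(row_col)                       # one row per item of row_col
        .explode(col_col)                       # one row per (row item, column item) pair
    )
    return pd.crosstab(exploded[row_col], exploded[col_col])


counts = cross_count(df)
print(counts)
```

Exploding the two columns one after the other is what produces every fruit/place combination within a row, so each pair is counted once per original row in which it occurs.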


2vuwiymt2#

You can use pandas.crosstab on the split/exploded columns:

df2 = (df.apply(lambda c: c.str.split(', '))  # split all columns
         .explode('fruits').explode('places') # explode to new rows
       )
pd.crosstab(df2['fruits'], df2['places'])     # compute crosstab

Output:

places  Boston  Hong Kong  London  Manchester  New York  Tokyo
fruits                                                        
apple        2          1       1           1         2      1
banana       2          1       1           1         2      1
orange       2          1       1           1         2      0
others       0          0       1           0         0      0
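Note that `pd.crosstab` sorts its row and column labels, so the layout differs from the one in the question. If the question's ordering matters, one option is to reindex the result. This is only a sketch: the two label lists below are copied by hand from the expected output in the question, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "fruits": ["orange, apple, banana", "orange, apple, banana",
               "apple, banana", "orange, apple, banana", "others"],
    "places": ["New York, London, Boston", "New York, Manchester",
               "Tokyo", "Hong Kong, Boston", "London"],
})

exploded = (df.apply(lambda c: c.str.split(", "))
              .explode("fruits")
              .explode("places"))
table = pd.crosstab(exploded["fruits"], exploded["places"])

# Label orders taken from the question's expected output (assumed, hard-coded).
row_order = ["orange", "apple", "banana", "others"]
col_order = ["New York", "London", "Boston", "Hong Kong", "Manchester", "Tokyo"]

# reindex only rearranges existing labels here; the counts are unchanged.
ordered = table.reindex(index=row_order, columns=col_order)
print(ordered)
```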


l2osamch3#

One approach is to build the Cartesian product of each row's items with itertools.product, then use pd.Series.explode and pd.crosstab:

from itertools import product

# Pair every place with every fruit in the same row.
# Split on ', ' (not ',') so the items carry no leading space.
f = lambda x: list(product(x['places'].split(', '), x['fruits'].split(', ')))
df['fruit_places'] = df.apply(f, axis=1)
ddf = pd.DataFrame.from_records(df['fruit_places'].explode().values, columns=['places', 'fruit'])

pd.crosstab(ddf['fruit'], ddf['places'])


xriantvc4#

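def function1(dd: pd.DataFrame):
    return pd.crosstab(dd.fruits, dd.places)

# Split on ', ' (not ',') so the items carry no leading space, and explode the
# columns one at a time: exploding both at once pairs items by position and
# fails when the two lists in a row differ in length.
(df[['fruits', 'places']]
   .apply(lambda c: c.str.split(', '))
   .explode('fruits').explode('places')
   .pipe(function1))

places  Boston  Hong Kong  London  Manchester  New York  Tokyo
fruits                                                        
apple        2          1       1           1         2      1
banana       2          1       1           1         2      1
orange       2          1       1           1         2      0
others       0          0       1           0         0      0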
