Reshape two string columns to get the counts between them in pandas

pbossiut  posted on 2022-11-20  in  Other

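I have two columns of comma-separated strings, and I want to reshape the table into cross counts: how many times each fruit appears together with each place. How can I do this with pandas?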

import pandas as pd

data = {
    "fruits": ["orange, apple, banana", "orange, apple, banana", 
               "apple, banana", "orange, apple, banana", "others"],
    "places": ["New York, London, Boston", "New York, Manchester", 
               "Tokyo", "Hong Kong, Boston", "London"],
}
df = pd.DataFrame(data)

                   fruits                    places
0   orange, apple, banana  New York, London, Boston
1   orange, apple, banana      New York, Manchester
2           apple, banana                     Tokyo
3   orange, apple, banana         Hong Kong, Boston
4                  others                    London

Expected output:

        New York  London  Boston  Hong Kong  Manchester  Tokyo
orange         2       1       2          1           1      0
apple          2       1       2          1           1      1
banana         2       1       2          1           1      1
others         0       1       0          0           0      0
wgx48brx #1

Let's go step by step:

df2 = df.copy()
df2["fruits"] = df["fruits"].str.split(", ")
df2["places"] = df["places"].str.split(", ")
df2

df3 = df2.explode("fruits").explode("places")
df3.head()

pd.pivot_table(df3, index="fruits", columns="places", aggfunc=len, fill_value=0)
# Or, using crosstab: 
# pd.crosstab(df3["fruits"], df3["places"])

Putting all of these steps together is left as an exercise for the reader :)

2vuwiymt #2

You can use pandas.crosstab on the split and exploded columns:

df2 = (df.apply(lambda c: c.str.split(', '))  # split all columns
         .explode('fruits').explode('places') # explode to new rows
       )
pd.crosstab(df2['fruits'], df2['places'])     # compute crosstab

Output:

places  Boston  Hong Kong  London  Manchester  New York  Tokyo
fruits
apple        2          1       1           1         2      1
banana       2          1       1           1         2      1
orange       2          1       1           1         2      0
others       0          0       1           0         0      0
l2osamch #3

One approach is to build the Cartesian product of each row's values with itertools.product, then use pd.Series.explode and pd.crosstab:

from itertools import product
# Split on ', ' (comma plus space) so that no value keeps a leading space
f = lambda x: list(product(x['places'].split(', '), x['fruits'].split(', ')))
df['fruit_places'] = df.apply(f, axis=1)
ddf = pd.DataFrame.from_records(df['fruit_places'].explode().tolist(), columns=['places', 'fruit'])

pd.crosstab(ddf['fruit'], ddf['places'])
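To see what the Cartesian product contributes, the short sketch below applies itertools.product to the values of the first row only, using plain Python lists. The variable names are illustrative and not taken from the answer.

```python
from itertools import product

# Values of row 0, split on a comma followed by a space.
places = "New York, London, Boston".split(", ")
fruits = "orange, apple, banana".split(", ")

# Every place is paired with every fruit: 3 x 3 = 9 tuples.
pairs = list(product(places, fruits))
print(len(pairs))
print(pairs[:3])
```

Each tuple becomes one row of the frame that is passed to pd.crosstab, so a row with 3 places and 3 fruits adds 9 observations to the counts.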
xriantvc #4

def function1(dd: pd.DataFrame):
    return pd.crosstab(dd.fruits, dd.places)
# Split on ", " and explode one column at a time so that every fruit/place pair is counted
df[["fruits", "places"]].apply(lambda c: c.str.split(", ")).explode("fruits").explode("places").pipe(function1)

places  Boston  Hong Kong  London  Manchester  New York  Tokyo
fruits
apple        2          1       1           1         2      1
banana       2          1       1           1         2      1
orange       2          1       1           1         2      0
others       0          0       1           0         0      0
