Common words in two different pandas DataFrames and columns

fnx2tebb  posted on 2023-05-21 in Other

A
| x | disc |
| --------------|--------------|
| a | 'tall','short','medium' |
| b | 'small','long','short' |
B
| y |
| --------------|
| 'tall','short' |
| 'short','long' |
| 'small','tall' |
Expected output:
| x | disc | tall short | short long |
| --------------|--------------|--------------|--------------|
| a | 'tall','short','medium' | 1 | 0 |
| b | 'small','long','short' | 0 | 1 |


hec6srdp 1#

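Convert the values to sets, then add one new column per row of B that marks whether all of that row's words occur in each row of A: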

for x in B['y']:
    # words required by this row of B
    s = set(x.split(', '))
    # 1 if the row of A contains every word in s, else 0
    A[x] = [int(set(y.split(', ')) >= s) for y in A['disc']]

If necessary, to drop the columns that contain only 0, add:

out = A.loc[:, A.ne(0).any()]
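
For reference, here is a minimal self-contained sketch of the loop approach above. The sample frames and the `', '` separator are assumptions about how the real data is stored, and the final step keeps only the matched indicator columns by name rather than filtering the whole frame with `A.ne(0).any()`:

```python
import pandas as pd

# Sample data; storing the words as ', '-separated strings is an assumption.
A = pd.DataFrame({
    "x": ["a", "b"],
    "disc": ["tall, short, medium", "small, long, short"],
})
B = pd.DataFrame({"y": ["tall, short", "short, long", "small, tall"]})

for words in B["y"]:
    required = set(words.split(", "))
    # >= on sets is the superset test: True when the row holds every required word
    A[words] = [int(set(row.split(", ")) >= required) for row in A["disc"]]

# Keep the original columns plus the indicator columns with at least one match
matched = [col for col in B["y"] if A[col].any()]
out = A[["x", "disc"] + matched]
print(out)
```

Note that a word set from B only counts as a match when every one of its words is present in the row of A; a partial overlap yields 0.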

6kkfgxo0 2#

You can use set comparison and numpy broadcasting (this assumes the cells of A['disc'] and B['y'] are lists of words):

out = A.join(pd.DataFrame((A['disc'].apply(set).to_numpy()[:,None]
                           >= B['y'].apply(set).to_numpy()).astype(int),
                          columns=B['y'].apply(' '.join), index=A.index)
             )

Output:

   x                   disc  tall short  short long  small tall
0  a  [tall, short, medium]           1           0           0
1  b   [small, long, short]           0           1           0

If you only want the columns that have a match:

tmp = pd.DataFrame((A['disc'].apply(set).to_numpy()[:,None]
                     >= B['y'].apply(set).to_numpy()),
                    columns=B['y'].apply(' '.join), index=A.index)
                   
out = A.join(tmp.loc[:, tmp.any()].astype(int))

Output:

   x                   disc  tall short  short long
0  a  [tall, short, medium]           1           0
1  b   [small, long, short]           0           1
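
To make the broadcasting step easier to follow, the sketch below splits the one-liner into named pieces. The sample frames are assumptions that mirror the question, with each cell holding a list of words, and the names `a_sets`, `b_sets`, and `mask` are only illustrative:

```python
import numpy as np
import pandas as pd

# Sample data; list-valued cells are an assumption about the real frames.
A = pd.DataFrame({
    "x": ["a", "b"],
    "disc": [["tall", "short", "medium"], ["small", "long", "short"]],
})
B = pd.DataFrame({"y": [["tall", "short"], ["short", "long"], ["small", "tall"]]})

a_sets = A["disc"].apply(set).to_numpy()  # object array, shape (2,)
b_sets = B["y"].apply(set).to_numpy()     # object array, shape (3,)

# (2, 1) compared with (3,) broadcasts to (2, 3);
# >= runs the superset test on every (row of A, row of B) pair
mask = (a_sets[:, None] >= b_sets).astype(bool)

tmp = pd.DataFrame(mask, columns=B["y"].apply(" ".join), index=A.index)
out = A.join(tmp.loc[:, tmp.any()].astype(int))
print(out)
```

Each row of `mask` corresponds to a row of A and each column to a row of B, so `tmp.any()` picks out the word sets from B that matched at least one row of A.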
