python-3.x 比较Pandasdf中的行并生成新列

hmtdttj4  于 2023-02-01  发布在  Python
关注(0)|答案(2)|浏览(116)

我有这个PandasDF

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6], 'name': ['steve', 'joey', 'nikolaj', 'buddy', 'chester', 'mateo']})

我想比较name列中的每一行,并生成如下所示的df:

id    name    second_name    second_id
1     steve   joey           2
1     steve   nikolaj        3
1     steve   buddy          4
1     steve   chester        5
1     steve   mateo          6
2     joey    steve.         1
3.    joey.   nikolaj.       3

基本上,df中的每个唯一行在每次与该列中的每个其他行进行比较时都会重复。

bq3bfh9z

bq3bfh9z1#

一个简单的方法是交叉merge与过滤:

out = df.merge(df.add_prefix('second_'), how='cross').query('id != second_id')

输出:

id     name  second_id second_name
1    1    steve          2        joey
2    1    steve          3     nikolaj
3    1    steve          4       buddy
4    1    steve          5     chester
5    1    steve          6       mateo
6    2     joey          1       steve
8    2     joey          3     nikolaj
9    2     joey          4       buddy
10   2     joey          5     chester
11   2     joey          6       mateo
12   3  nikolaj          1       steve
13   3  nikolaj          2        joey
15   3  nikolaj          4       buddy
16   3  nikolaj          5     chester
17   3  nikolaj          6       mateo
18   4    buddy          1       steve
19   4    buddy          2        joey
20   4    buddy          3     nikolaj
22   4    buddy          5     chester
23   4    buddy          6       mateo
24   5  chester          1       steve
25   5  chester          2        joey
26   5  chester          3     nikolaj
27   5  chester          4       buddy
29   5  chester          6       mateo
30   6    mateo          1       steve
31   6    mateo          2        joey
32   6    mateo          3     nikolaj
33   6    mateo          4       buddy
34   6    mateo          5     chester

泛化

如果你想要一个更通用的方法,你可以生成组合/排列/等,并处理一个以上的组合。
2个组合的示例:

from itertools import combinations

pd.concat([df.loc[idx].add_suffix(f'_{i}').reset_index(drop=True)
           for i, idx in enumerate(map(list, zip(*combinations(range(len(df)), r=2))), start=1)],
          axis=1)

输出:

id_1   name_1  id_2   name_2
0      1    steve     2     joey
1      1    steve     3  nikolaj
2      1    steve     4    buddy
3      1    steve     5  chester
4      1    steve     6    mateo
5      2     joey     3  nikolaj
6      2     joey     4    buddy
7      2     joey     5  chester
8      2     joey     6    mateo
9      3  nikolaj     4    buddy
10     3  nikolaj     5  chester
11     3  nikolaj     6    mateo
12     4    buddy     5  chester
13     4    buddy     6    mateo
14     5  chester     6    mateo

3的排列:

from itertools import permutations

print(pd.concat([df.loc[idx].add_suffix(f'_{i}').reset_index(drop=True)
           for i, idx in enumerate(map(list, zip(*permutations(range(len(df)), r=3))), start=1)],
          axis=1))

输出:

id_1 name_1  id_2   name_2  id_3   name_3
0       1  steve     2     joey     3  nikolaj
1       1  steve     2     joey     4    buddy
2       1  steve     2     joey     5  chester
3       1  steve     2     joey     6    mateo
4       1  steve     3  nikolaj     2     joey
..    ...    ...   ...      ...   ...      ...
115     6  mateo     4    buddy     5  chester
116     6  mateo     5  chester     1    steve
117     6  mateo     5  chester     2     joey
118     6  mateo     5  chester     3  nikolaj
119     6  mateo     5  chester     4    buddy

[120 rows x 6 columns]
yh2wf1be

yh2wf1be2#

使用带suffixes参数的交叉联接并按DataFrame.loc筛选:

df  = (df.merge(df, how='cross', suffixes=('','_second'))
         .loc[lambda x: x['id'].ne(x['id_second'])])
print (df)
    id     name  id_second name_second
1    1    steve          2        joey
2    1    steve          3     nikolaj
3    1    steve          4       buddy
4    1    steve          5     chester
5    1    steve          6       mateo
6    2     joey          1       steve
8    2     joey          3     nikolaj
9    2     joey          4       buddy
10   2     joey          5     chester
11   2     joey          6       mateo
12   3  nikolaj          1       steve
13   3  nikolaj          2        joey
15   3  nikolaj          4       buddy
16   3  nikolaj          5     chester
17   3  nikolaj          6       mateo
18   4    buddy          1       steve
19   4    buddy          2        joey
20   4    buddy          3     nikolaj
22   4    buddy          5     chester
23   4    buddy          6       mateo
24   5  chester          1       steve
25   5  chester          2        joey
26   5  chester          3     nikolaj
27   5  chester          4       buddy
29   5  chester          6       mateo
30   6    mateo          1       steve
31   6    mateo          2        joey
32   6    mateo          3     nikolaj
33   6    mateo          4       buddy
34   6    mateo          5     chester

相关问题