Pandas: convert comma-separated values to a new value

hmae6n7t  posted on 2023-03-28  in Other

I have a column in a pandas DataFrame (df) that looks like this:

specialty
0 1
1 2,5
2 2
3 6
4 missing
5 1
6 3
7 1,3,4
8 4
9 1

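I want to convert every entry that holds multiple values to 7, and every "missing" to 6. I know I can do df['specialty'].replace({'missing': 6}), but I am not sure how to convert the multi-value entries to 7. The expected output is: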

specialty
0 1
1 7
2 2
3 6
4 6
5 1
6 3
7 7
8 4
9 1

I have tried df['specialty'].str.contains(',') = 7, but this gives

SyntaxError: cannot assign to function call
c3frrgcw #1

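Use numpy.select or numpy.where. The two statements below are alternatives; run one of them, not both:

import numpy as np

# Option 1: numpy.select with one condition per replacement value
df['specialty'] = np.select([df['specialty'].str.contains(','),
                             df['specialty'].eq('missing')], [7, 6],
                            default=df['specialty'])

# Option 2: numpy.where, with replace handling the 'missing' entries
df['specialty'] = np.where(df['specialty'].str.contains(','),
                           7,
                           df['specialty'].replace({'missing': 6}))

print(df)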
  specialty
0         1
1         7
2         2
3         6
4         6
5         1
6         3
7         7
8         4
9         1
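
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the numpy.select variant. The sample DataFrame is reconstructed from the question, and the names has_comma, is_missing and recoded are only illustrative. It assumes every entry starts out as a string, and it adds a final astype(int) so the column ends up numeric instead of a mix of integers and strings.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question; every entry is a string.
df = pd.DataFrame({'specialty': ['1', '2,5', '2', '6', 'missing',
                                 '1', '3', '1,3,4', '4', '1']})

# Build both masks up front, while the column still holds only strings.
has_comma = df['specialty'].str.contains(',', regex=False)
is_missing = df['specialty'].eq('missing')

# np.select takes the first matching condition per row, else the default.
recoded = np.select([has_comma, is_missing], [7, 6], default=df['specialty'])

# The result mixes ints and digit strings; cast everything to int.
df['specialty'] = pd.Series(recoded, index=df.index).astype(int)
print(df['specialty'].tolist())
```

Computing the masks before any assignment avoids calling .str.contains on a column that already contains integers.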
7rfyedvj #2

Use boolean indexing with loc:

df.loc[df['specialty'].str.contains(','), 'specialty'] = 7
df.loc[df['specialty'].eq('missing'), 'specialty'] = 6

Output:

specialty
0         1
1         7
2         2
3         6
4         6
5         1
6         3
7         7
8         4
9         1
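
One caveat, as a hedged sketch: if the column can also hold real NaN values (an assumption, the question only shows the string 'missing'), str.contains returns NaN for those rows and boolean indexing typically rejects such a mask. Passing na=False keeps the mask purely boolean. The data below is a hypothetical variant of the question's column, created with dtype=object so that integers can be assigned into it.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data: one entry is a real NaN
# instead of the string 'missing'. dtype=object allows mixed values.
df = pd.DataFrame({'specialty': pd.Series(['1', '2,5', np.nan, 'missing', '1,3,4'],
                                          dtype=object)})

# na=False makes str.contains return False for NaN rows.
multi = df['specialty'].str.contains(',', regex=False, na=False)
# Treat both the literal string and real NaN as "missing".
missing = df['specialty'].eq('missing') | df['specialty'].isna()

df.loc[multi, 'specialty'] = 7
df.loc[missing, 'specialty'] = 6

# Remaining entries are digit strings; cast the whole column to int.
df['specialty'] = df['specialty'].astype(int)
print(df['specialty'].tolist())
```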

You might also want to compute the result from the actual values (2+5 -> 7):

df['specialty'] = [6 if s=='missing' else sum(map(int, s.split(',')))
                   for s in df['specialty']]

Or:

df['specialty'] = (df['specialty'].replace({'missing': '6'})
                   .str.split(',', expand=True).astype(float)
                   .sum(axis=1).astype(int)
                  )
flseospp #3

You can use:

df['spe2'] = (df['specialty'].astype(str).replace({'missing': '6'})
                             .str.split(',').explode().astype(int)
                             .groupby(level=0).sum())
print(df)

# Output
  specialty  spe2
0         1     1
1       2,5     7
2         2     2
3         6     6
4   missing     6
5         1     1
6         3     3
7     1,3,4     8
8         4     4
9         1     1
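
Note that summing the parts is not the same as the fixed code the question asks for: row 7 (1,3,4) becomes 8 above, not 7. A small sketch that puts the two interpretations side by side, on a shortened sample with the illustrative names coded and summed:

```python
import pandas as pd

s = pd.Series(['1', '2,5', 'missing', '1,3,4'])
cleaned = s.replace({'missing': '6'})

# Interpretation A (the question): any multi-value entry becomes 7.
coded = cleaned.mask(s.str.contains(',', regex=False), '7').astype(int)

# Interpretation B (summing): the parts of each entry are added up.
summed = (cleaned.str.split(',').explode().astype(int)
                 .groupby(level=0).sum())

print(coded.tolist())   # [1, 7, 6, 7]
print(summed.tolist())  # [1, 7, 6, 8]
```

The two only agree when the parts of an entry happen to add up to 7, as with 2,5.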
