Python: How do I remove a specific word from a column?

llycmphe  posted on 2022-10-30 in Python

I need to remove "km" from the distance column:

[Distance]
0     114 km
1     114 km
2     9.1 km
3    33.1 km
4     182 km
5    93.2 km
6    40.4 km
7        0.0
8        0.0
9    43.4 km
Name: distance, dtype: object

The result should look like this:

[Distance]
0     114
1     114
2     9.1
3    33.1
4     182
5    93.2
6    40.4
7        
8        
9    43.4

km0tfn4u  #1

Assuming the trailing substring to remove is always km, you can use:

df['distance'] = df['distance'].str.replace(r'\s*km$', '', regex=True)

A more generic approach is to extract the number:

df['distance'] = df['distance'].str.extract(r'(\d+(?:\.\d+)?)')

If you only want the number when "km" is present:

df['distance'] = df['distance'].str.extract(r'(\d+(?:\.\d+)?)\s*km')

And to convert the result to numeric values (NaN where there is no match):

df['distance'] = pd.to_numeric(df['distance'].str.extract(r'(\d+(?:\.\d+)?)\s*km', expand=False), errors='coerce')
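
To see what each of these patterns does to a single value, here is a minimal sketch using the standard `re` module instead of pandas. The sample strings are taken from the question; the helper `first_group` and the variable names are illustrative only and not part of the original answer.

```python
import re

samples = ["114 km", "9.1 km", "0.0"]


def first_group(pattern, text):
    """Return the first capture group of the first match, or None (hypothetical helper)."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Strip a trailing "km" and any whitespace before it; other strings pass through unchanged.
stripped = [re.sub(r"\s*km$", "", s) for s in samples]

# Take the first integer or decimal number, whether or not "km" follows.
any_number = [first_group(r"(\d+(?:\.\d+)?)", s) for s in samples]

# Take the number only when it is followed by "km".
km_only = [first_group(r"(\d+(?:\.\d+)?)\s*km", s) for s in samples]

print(stripped)    # ['114', '9.1', '0.0']
print(any_number)  # ['114', '9.1', '0.0']
print(km_only)     # ['114', '9.1', None]
```

The third pattern is the only one that returns nothing for "0.0", which is why the rows without "km" end up as NaN in pandas.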

Summary:

df['distance1'] = df['distance'].str.replace(r'\s*km$', '', regex=True)

df['distance2'] = df['distance'].str.extract(r'(\d+(?:\.\d+)?)')

df['distance3'] = df['distance'].str.extract(r'(\d+(?:\.\d+)?)\s*km')

df['distance4'] = pd.to_numeric(df['distance'].str.extract(r'(\d+(?:\.\d+)?)\s*km', expand=False), errors='coerce')

print(df.dtypes)

print(df)

Output:

distance      object
distance1     object
distance2     object
distance3     object
distance4    float64
dtype: object

  distance distance1 distance2 distance3  distance4
0   114 km       114       114       114      114.0
1   114 km       114       114       114      114.0
2   9.1 km       9.1       9.1       9.1        9.1
3  33.1 km      33.1      33.1      33.1       33.1
4   182 km       182       182       182      182.0
5  93.2 km      93.2      93.2      93.2       93.2
6  40.4 km      40.4      40.4      40.4       40.4
7      0.0       0.0       0.0       NaN        NaN
8      0.0       0.0       0.0       NaN        NaN
9  43.4 km      43.4      43.4      43.4       43.4
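
The post does not show how `df` was built, so the following is only a sketch under that assumption: it creates a small frame from the question's sample values and applies the "number followed by km" pattern, converting the result to floats. The column name `distance_km` is made up for illustration.

```python
import pandas as pd

# Sample values copied from the question; the real frame's construction is not shown there.
df = pd.DataFrame({
    "distance": ["114 km", "114 km", "9.1 km", "33.1 km", "182 km",
                 "93.2 km", "40.4 km", "0.0", "0.0", "43.4 km"]
})

# Capture the number only when "km" follows; rows without "km" yield NaN.
pattern = r"(\d+(?:\.\d+)?)\s*km"
extracted = df["distance"].str.extract(pattern, expand=False)

# Convert the captured strings to floats; anything unparseable also becomes NaN.
df["distance_km"] = pd.to_numeric(extracted, errors="coerce")

print(df)
```

Passing `expand=False` makes `str.extract` return a Series rather than a one-column DataFrame, which is what `pd.to_numeric` expects here.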

7fyelxc5  #2

Here is another approach, which removes "km" and then drops the observations that are 0:

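import numpy as np

# r'[^\d.]+' removes every character that is neither a digit nor a decimal point
# (a plain r'\D+' would also strip the ".", turning "9.1 km" into 91)
df['distance'] = df['distance'].str.replace(r'[^\d.]+', '', regex=True).astype('float')

# Treat 0 as missing, then drop those rows from the frame
df['distance'] = df['distance'].replace(0, np.nan)
df = df.dropna(subset=['distance'])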
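
One detail worth checking when stripping non-digit characters this way: a pattern like `\D+` also removes the decimal point, so "9.1 km" would become 91. A minimal sketch with the standard `re` module (the sample strings and variable names are illustrative) compares it with a character class that keeps the point:

```python
import re

samples = ["9.1 km", "114 km", "0.0"]

# \D+ deletes every run of non-digits, including the decimal point.
digits_only = [re.sub(r"\D+", "", s) for s in samples]

# [^\d.]+ deletes only characters that are neither digits nor ".".
keep_point = [re.sub(r"[^\d.]+", "", s) for s in samples]

print(digits_only)  # ['91', '114', '00']
print(keep_point)   # ['9.1', '114', '0.0']
```

The same patterns behave identically inside `Series.str.replace` when `regex=True` is passed.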
