I need to remove "km" from the distance column:
[Distance]
0 114 km
1 114 km
2 9.1 km
3 33.1 km
4 182 km
5 93.2 km
6 40.4 km
7 0.0
8 0.0
9 43.4 km
Name: distance, dtype: object
It should look like this:
0 114
1 114
2 9.1
3 33.1
4 182
5 93.2
6 40.4
7
8
9 43.4
Assuming the trailing substring to remove is always "km", you can use:
df['distance'] = df['distance'].str.replace(r'\s*km$', '', regex=True)
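A minimal self-contained check of the anchored pattern, using sample values assumed from the question (rows 7 and 8 are taken to be the string '0.0'); the value 'km 5' is a hypothetical extra row added only to show that, because of the `$` anchor, a "km" that is not at the end of the string is left alone.

```python
import pandas as pd

# Assumed sample values; 'km 5' is a hypothetical row for illustration only.
s = pd.Series(['114 km', '9.1 km', '0.0', '43.4 km', 'km 5'], name='distance')

# \s*km$ : optional whitespace followed by 'km' at the very end of the string.
stripped = s.str.replace(r'\s*km$', '', regex=True)
print(stripped.tolist())  # ['114', '9.1', '0.0', '43.4', 'km 5']
```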
A more general approach is to extract the number:
df['distance'] = df['distance'].str.extract(r'(\d+(?:\.\d+)?)')
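A small sketch of what the general extraction does, on assumed sample values ('approx. 12 km' and 'n/a' are hypothetical rows). It captures the first integer or decimal in each string, whether or not "km" follows, and yields a missing value when there is no number. Passing `expand=False` makes `str.extract` return a Series rather than a one-column DataFrame when the pattern has a single group.

```python
import pandas as pd

# Assumed sample values; the last two rows are hypothetical.
s = pd.Series(['114 km', '9.1 km', '0.0', 'approx. 12 km', 'n/a'], name='distance')

# First run of digits, optionally followed by '.' and more digits.
numbers = s.str.extract(r'(\d+(?:\.\d+)?)', expand=False)
print(numbers.tolist())  # ['114', '9.1', '0.0', '12', nan]
```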
If you only want the number when "km" is present:
df['distance'] = df['distance'].str.extract(r'(\d+(?:\.\d+)?)\s*km')
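A minimal sketch of the "km"-only variant on assumed sample values ('43.4km' without a space is a hypothetical row). Because the number must be followed by optional whitespace and "km", a value such as '0.0' produces a missing value instead of a number.

```python
import pandas as pd

# Assumed sample values; '43.4km' (no space) is hypothetical.
s = pd.Series(['114 km', '9.1 km', '0.0', '43.4km'], name='distance')

# Capture the number only when it is followed by optional whitespace and 'km'.
km_only = s.str.extract(r'(\d+(?:\.\d+)?)\s*km', expand=False)
print(km_only.tolist())  # ['114', '9.1', nan, '43.4']
```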
And to convert to numbers (NaN where there is no match):
df['distance'] = pd.to_numeric(df['distance'].str.extract(r'(\d+(?:\.\d+)?)\s*km', expand=False), errors='coerce')
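A short sketch, on assumed sample values, of why the numeric conversion is useful: the result has a float dtype, rows without "km" become NaN, and arithmetic such as `sum()` works directly (NaN is skipped by default).

```python
import pandas as pd

# Assumed sample values mirroring the question.
s = pd.Series(['114 km', '9.1 km', '0.0', '43.4 km'], name='distance')

km = pd.to_numeric(s.str.extract(r'(\d+(?:\.\d+)?)\s*km', expand=False),
                   errors='coerce')
print(km.dtype)         # a float dtype (decimals and NaN both need floats)
print(km.isna().sum())  # 1: '0.0' has no 'km', so it became NaN
print(km.sum())         # NaN is skipped by sum()
```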
import pandas as pd

df['distance1'] = df['distance'].str.replace(r'\s*km$', '', regex=True)
df['distance2'] = df['distance'].str.extract(r'(\d+(?:\.\d+)?)')
df['distance3'] = df['distance'].str.extract(r'(\d+(?:\.\d+)?)\s*km')
df['distance4'] = pd.to_numeric(df['distance'].str.extract(r'(\d+(?:\.\d+)?)\s*km', expand=False), errors='coerce')
print(df.dtypes)
print(df)
Output:
distance object
distance1 object
distance2 object
distance3 object
distance4 float64
dtype: object
distance distance1 distance2 distance3 distance4
0 114 km 114 114 114 114.0
1 114 km 114 114 114 114.0
2 9.1 km 9.1 9.1 9.1 9.1
3 33.1 km 33.1 33.1 33.1 33.1
4 182 km 182 182 182 182.0
5 93.2 km 93.2 93.2 93.2 93.2
6 40.4 km 40.4 40.4 40.4 40.4
7 0.0 0.0 0.0 NaN NaN
8 0.0 0.0 0.0 NaN NaN
9 43.4 km 43.4 43.4 43.4 43.4
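The target in the question shows blank cells for rows 7 and 8 rather than NaN. One way to get exactly that display, assuming a text column is acceptable (the values stay strings, not numbers), is to fill the missing values with an empty string after the "km"-only extraction. The sample data below is reconstructed from the question, with rows 7 and 8 assumed to be the string '0.0'.

```python
import pandas as pd

# Reconstructed from the question; rows 7 and 8 assumed to be the string '0.0'.
df = pd.DataFrame({'distance': ['114 km', '114 km', '9.1 km', '33.1 km', '182 km',
                                '93.2 km', '40.4 km', '0.0', '0.0', '43.4 km']})

# Keep the number only where 'km' is present, then show missing values as blanks.
df['distance'] = (df['distance']
                  .str.extract(r'(\d+(?:\.\d+)?)\s*km', expand=False)
                  .fillna(''))
print(df['distance'].tolist())
```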
Here is another approach, which removes "km" and simply drops the observations equal to 0:
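import numpy as np

# r'[^\d.]+' removes any character that is not a digit or '.' (r'\D+' would turn '9.1 km' into 91)
df['distance'] = df['distance'].str.replace(r'[^\d.]+', '', regex=True).astype('float')
df['distance'] = df['distance'].replace(0, np.nan)
# drop the rows on df itself; dropna(inplace=True) on the column leaves df unchanged
df = df.dropna(subset=['distance'])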
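When stripping characters before converting to float, the choice of character class matters for decimal values. A quick comparison on assumed sample values: `\D+` matches every non-digit, including the decimal point, while `[^\d.]+` keeps digits and dots.

```python
import pandas as pd

# Assumed sample values from the question.
s = pd.Series(['9.1 km', '114 km', '0.0'], name='distance')

# \D+ removes every non-digit, so the decimal point disappears too.
no_digits = s.str.replace(r'\D+', '', regex=True)
# [^\d.]+ removes everything except digits and '.', keeping decimals intact.
keep_dot = s.str.replace(r'[^\d.]+', '', regex=True)
print(no_digits.tolist())  # ['91', '114', '00']
print(keep_dot.tolist())   # ['9.1', '114', '0.0']
```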