How to clean a DataFrame column filled with names using Python?

Posted by crcmnpdw on 2021-09-08 in Java

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({'Name': ['Aadam', 'adam', 'AdAm', 'adammm', 'Adam.', 'Bethh', 'beth.', 'beht', 'Beeth', 'Beth']})

I want to clean the column so that it becomes:

df['Name Corrected'] = ['adam','adam','adam','adam','adam','beth','beth','beth','beth','beth']
df

The cleaned names come from the following reference table:

ref = pd.DataFrame({'Cleaned Names': ['adam', 'beth']})

I know about fuzzy matching, but I'm not sure whether it is the most efficient way to solve this problem.
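For data this small, fuzzy matching does not need a third-party package. Below is a minimal sketch using the standard library's `difflib.get_close_matches`. It assumes that lower-casing is enough normalization and that the default similarity cutoff of 0.6 is acceptable; both are choices made for this sketch, not part of the question, and `closest_name` is a hypothetical helper.

```python
import difflib

import pandas as pd

df = pd.DataFrame({'Name': ['Aadam', 'adam', 'AdAm', 'adammm', 'Adam.',
                            'Bethh', 'beth.', 'beht', 'Beeth', 'Beth']})
ref = pd.DataFrame({'Cleaned Names': ['adam', 'beth']})


def closest_name(name, choices, cutoff=0.6):
    """Hypothetical helper: return the most similar reference name, or None if none reaches `cutoff`."""
    # Lower-casing is an assumed normalization step; get_close_matches ranks by SequenceMatcher ratio
    matches = difflib.get_close_matches(name.lower(), choices, n=1, cutoff=cutoff)
    return matches[0] if matches else None


choices = ref['Cleaned Names'].tolist()
df['Name Corrected'] = df['Name'].map(lambda n: closest_name(n, choices))
print(df)
```

Names with no reference name above the cutoff come back as `None` instead of being forced onto some name, so they are easy to inspect afterwards.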

Answer #1 (by fykwrbwg)

You can try:

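import pandas as pd

lst = ['adam', 'beth']
out = pd.concat([df['Name'].str.contains(x, case=False).map({True: x}) for x in lst], axis=1)
df['Name corrected'] = out.bfill(axis=1).iloc[:, 0]

# Finally:

df['Name corrected'] = df['Name corrected'].ffill()

# Caveat: ffill() copies the nearest matched name from the rows above into each unmatched row, so it is only right when that row belongs to the same person (as happens for 'beht' and 'Beeth' here); with other row orders it gives wrong values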
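To see the caveat in action, here is a small sketch that reorders the input (the three-row `df` is hypothetical) so the misspelled 'beht' follows an 'adam' row instead of a 'beth' row:

```python
import pandas as pd

# Hypothetical input: 'beht' now comes right after an 'adam' row
df = pd.DataFrame({'Name': ['Adam', 'beht', 'Beth']})
lst = ['adam', 'beth']

out = pd.concat([df['Name'].str.contains(x, case=False).map({True: x}) for x in lst], axis=1)
df['Name corrected'] = out.bfill(axis=1).iloc[:, 0].ffill()
print(df['Name corrected'].tolist())  # ['adam', 'adam', 'beth']: 'beht' inherits 'adam'
```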

Explanation:

lst = ['adam', 'beth']

# Create a list of the reference (cleaned) names to look for

out = pd.concat([df['Name'].str.contains(x, case=False).map({True: x}) for x in lst], axis=1)

# For each name in the list, str.contains(x, case=False) checks whether that name occurs
# in the 'Name' column, ignoring case, and returns a boolean Series of True/False.
# .map({True: x}) then turns True into that name and False into NaN.
# Finally, pd.concat(..., axis=1) puts these Series side by side as the columns of a DataFrame.
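A small sketch of what `out` looks like for three of the names. The `keys=lst` argument is added here only to label the columns; without it, as in the answer, both columns are named 'Name'.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['AdAm', 'beth.', 'Beeth']})
lst = ['adam', 'beth']

# One Series per reference name: the name where it occurs as a substring, NaN elsewhere
parts = [df['Name'].str.contains(x, case=False).map({True: x}) for x in lst]
out = pd.concat(parts, axis=1, keys=lst)
print(out)
```

The 'Beeth' row is NaN in every column because 'beeth' does not contain 'beth' as a substring. Rows like this are the ones the later fill step has to handle.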

df['Name corrected'] = out.bfill(axis=1).iloc[:, 0]

# Backward-fill along each row (axis=1) and take the first column, i.e. the first listed name that matched; rows with no match stay NaN
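A minimal sketch of the `bfill(axis=1).iloc[:, 0]` step on its own, using a hand-built stand-in for `out` (the values are illustrative):

```python
import numpy as np
import pandas as pd

# Stand-in for `out`: one column per reference name, NaN where it did not match
out = pd.DataFrame({'adam': ['adam', np.nan, np.nan],
                    'beth': [np.nan, 'beth', np.nan]})

# bfill(axis=1) pulls values leftwards within each row, so column 0 ends up holding
# the row's first non-missing value (or NaN if the row has none)
first_match = out.bfill(axis=1).iloc[:, 0]
print(first_match.tolist())
```

If a name contained more than one reference name, this would pick whichever comes first in `lst`.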

# Finally:

df['Name corrected'] = df['Name corrected'].ffill()

# Forward-fill the rows that matched no name with the value from the nearest matched row above
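One way to avoid the order dependence of `ffill()` is to fall back to a similarity match only for the rows the substring step left empty. The sketch below does this with the standard library's `difflib.get_close_matches`. This fallback is a swapped-in technique, not part of the answer, and the `closest` helper and its 0.6 cutoff are assumptions.

```python
import difflib

import pandas as pd

df = pd.DataFrame({'Name': ['Aadam', 'adam', 'AdAm', 'adammm', 'Adam.',
                            'Bethh', 'beth.', 'beht', 'Beeth', 'Beth']})
lst = ['adam', 'beth']

# Substring step from the answer
out = pd.concat([df['Name'].str.contains(x, case=False).map({True: x}) for x in lst], axis=1)
df['Name corrected'] = out.bfill(axis=1).iloc[:, 0]


def closest(name, cutoff=0.6):
    """Hypothetical fallback: most similar name in `lst`, or None below `cutoff`."""
    matches = difflib.get_close_matches(name.lower(), lst, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Instead of ffill(), fill only the still-unmatched rows, each from its own spelling
missing = df['Name corrected'].isna()
df.loc[missing, 'Name corrected'] = df.loc[missing, 'Name'].map(closest)
print(df)
```

Because each unmatched row is resolved from its own spelling, the result no longer depends on the order of the rows.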
