Retrieve substrings based on pandas DataFrames

Posted by bogh5gae on 2022-12-28 in Other


I have the following pandas DataFrames:

print(df)

text_description     
ROME AND MILAN ARE AMAZING CITIES
NEW YORK AND LONDON REPRESENT GLOBAL FINANCE MARKETS
I LOVE MADRID 
BANGKOK IS AN AMAZING CITY
VAL D'ISERE IS A MAGIC PLACE

...

print(df_1)

City_List

PARIS
MILAN
ROME
NEW YORK
LONDON
MADRID
V. D'ISERE

I would like to filter the text in df["text_description"] so that only the city names contained in df_1["City_List"] are kept, giving two separate columns:

print(final_df)

text_description_0     text_description_1
ROME                   MILAN
NEW YORK               LONDON
MADRID                 na
VAL D'ISERE            na
...

How can I create final_df?
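To experiment with the answer, the two input frames printed above can be rebuilt as follows. This is a sketch: only the rows actually shown are included (the rows hidden behind "..." are unknown), and the trailing space after MADRID is dropped.

```python
import pandas as pd

# Texts to search (the question's df). Only the rows printed in the
# question are included; the rows hidden behind "..." are unknown.
df = pd.DataFrame({"text_description": [
    "ROME AND MILAN ARE AMAZING CITIES",
    "NEW YORK AND LONDON REPRESENT GLOBAL FINANCE MARKETS",
    "I LOVE MADRID",
    "BANGKOK IS AN AMAZING CITY",
    "VAL D'ISERE IS A MAGIC PLACE",
]})

# City names to look for (the question's df_1).
df_1 = pd.DataFrame({"City_List": [
    "PARIS", "MILAN", "ROME", "NEW YORK", "LONDON", "MADRID", "V. D'ISERE",
]})
```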

pandas

Source: https://stackoverflow.com/questions/61682984/retrieve-substrings-based-on-pandas-dataframes

1 answer


Answer 1# by 6pp0gazn

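You won't get VAL D'ISERE, because it does not appear in the city list. The list only contains its abbreviation, V. D'ISERE, which the program cannot recognize as the same place, so you would need some way to account for abbreviations. The code below only handles city names that appear verbatim in both columns: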

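from itertools import product
from collections import defaultdict

# df1 = the question's df (texts), df2 = the question's df_1 (city list)
df1, df2 = df, df_1

d = defaultdict(list)
# Build the Cartesian product of the two columns and keep each (text, city)
# pair where the city name occurs in the text. Cities are collected in
# City_List order, not text order, and "in" is a plain substring test,
# so "ROME" would also match inside "ROMEO".
for first, last in product(df1.text_description, df2.City_List):
    if last in first:
        d[first].append(last)

# Join the cities found in each text into one comma-separated string
d = {k: ','.join(v) for k, v in d.items()}

# Map the dictionary onto text_description and split it into two columns.
# This assumes the most cities found in any single text is exactly two;
# otherwise split() returns a different number of columns than ['city1', 'city2'].
df1[['city1', 'city2']] = df1.text_description.map(d).str.split(',', expand=True)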

df1
         text_description                               city1       city2
0   ROME AND MILAN ARE AMAZING CITIES                   MILAN       ROME
1   NEW YORK AND LONDON REPRESENT GLOBAL FINANCE M...   NEW YORK    LONDON
2   I LOVE MADRID                                       MADRID      None
3   BANGKOK IS AN AMAZING CITY                          NaN         NaN
4   VAL D'ISERE IS A MAGIC PLACE                        NaN         NaN
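A possible vectorized alternative (not part of the original answer) is Series.str.extractall with a regex alternation of all city names. It returns the cities in the order they appear in the text, uses \b word boundaries so that ROME does not match inside a longer word, and accepts a hand-written alias table for abbreviations such as V. D'ISERE. The ALIASES dict and the extract_cities helper are hypothetical names introduced for this sketch, and the alias entry is an assumption typed in by hand, not derived automatically.

```python
import re

import pandas as pd

# Hypothetical alias table (an assumption, not from the question): maps an
# abbreviated City_List entry to the spelling that occurs in the texts.
ALIASES = {"V. D'ISERE": "VAL D'ISERE"}


def extract_cities(texts: pd.Series, cities: pd.Series, aliases: dict) -> pd.DataFrame:
    """Return one column per city found in each text, in order of appearance."""
    # Candidate names: every City_List entry plus the spelled-out aliases.
    names = list(cities) + list(aliases.values())
    # Longest names first, so a longer name wins over a shorter one that
    # starts at the same position in the text.
    names.sort(key=len, reverse=True)
    # re.escape protects "." and other metacharacters; \b keeps "ROME"
    # from matching inside a longer word such as "ROMEO".
    pattern = r"\b(" + "|".join(re.escape(n) for n in names) + r")\b"
    # One row per match; the index is (original row, match number).
    found = texts.str.extractall(pattern)[0]
    # Pivot match numbers into columns; rows with no match come back as NaN.
    wide = found.unstack("match").reindex(texts.index)
    wide.columns = [f"text_description_{i}" for i in wide.columns]
    return wide


# A few rows from the question, typed inline so the sketch runs on its own.
texts = pd.Series([
    "ROME AND MILAN ARE AMAZING CITIES",
    "I LOVE MADRID",
    "BANGKOK IS AN AMAZING CITY",
    "VAL D'ISERE IS A MAGIC PLACE",
])
cities = pd.Series(["PARIS", "MILAN", "ROME", "MADRID", "V. D'ISERE"])
final_df = extract_cities(texts, cities, ALIASES)
print(final_df)
```

On the full frames the call would be extract_cities(df["text_description"], df_1["City_List"], ALIASES). Unlike the Cartesian-product loop, this creates as many text_description_N columns as the largest number of cities in any single text, so it does not assume exactly two.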

Answered 2022-12-28
