Remove punctuation from a pandas column but keep the original list-of-lists structure

Posted by bnlyeluc on 2023-04-10 in Other


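I know how to do this for a single list in a cell, but I need to keep the structure of multiple lists, e.g. [["I","need","to","remove","punctuations","."],[...],[...]] -> [["I","need","to","remove","punctuations"],[...],[...]]
Every method I know turns it into this instead -> ["I","need","to","remove","punctuations",...]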

data["clean_text"] = data["clean_text"].apply(lambda x: [', '.join([c for c in s if c not in string.punctuation]) for s in x])
data["clean_text"] = data["clean_text"].str.replace(r'[^\w\s]+', '')
...

What is the best way to do this?

pandas

Source: https://stackoverflow.com/questions/75934785/remove-punctuations-from-pandas-column-but-keep-original-list-of-lists-structure

2 answers


klr1opcd1#

Following your approach, I would just add a list comprehension with a helper function:

import string

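def clean_up(lst):
    # Rebuild each sublist without its punctuation tokens; the outer list keeps the nesting.
    return [[w for w in sublist if w not in string.punctuation] for sublist in lst]

# Apply the helper to every cell of the "text" column.
data["clean_text"] = [clean_up(x) for x in data["text"]]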

Output:

print(data) # -- with two different columns so we can see the difference

                                                                                                    text  \
0  [[I, need, to, remove, punctuations, .], [This, is, another, list, with, commas, ,, and, periods, .]]   

                                                                                     clean_text  
0  [[I, need, to, remove, punctuations], [This, is, another, list, with, commas, and, periods]]
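One thing to watch with the helper above: `w not in string.punctuation` is a substring test against the punctuation string, so multi-character tokens such as "..." or "?!" are kept, while an empty string is dropped. If that matters for your data, here is a minimal sketch of a variation that drops any token made up entirely of punctuation characters. The names `PUNCT`, `is_punct_token` and `clean_nested` are hypothetical and not part of the answer above.

```python
import string

# A set tests individual characters; `w in string.punctuation` would instead
# test whether w is a substring of the punctuation string.
PUNCT = set(string.punctuation)

def is_punct_token(token):
    """Return True when the token is non-empty and every character is punctuation."""
    return bool(token) and all(ch in PUNCT for ch in token)

def clean_nested(cell):
    """Drop punctuation-only tokens from a list of token lists, keeping the nesting."""
    return [[tok for tok in sublist if not is_punct_token(tok)] for sublist in cell]

sample = [["I", "need", "to", "remove", "punctuations", "."],
          ["Wait", "...", "really", "?!"]]
cleaned = clean_nested(sample)
print(cleaned)
```

On a DataFrame this could be applied per cell, for example with `data["text"].apply(clean_nested)`, assuming every cell holds a list of lists of strings.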

2023-04-10

carvr3hs2#

If your DataFrame is not very large, you can explode the lists into rows, filter out the rows that hold punctuation, and finally group the rows back together.

df_ = df[['clean_text']].copy()

out = (df_.assign(g1=range(len(df)))
       .explode('clean_text', ignore_index=True)
       .explode('clean_text')
       .loc[lambda d: ~d['clean_text'].isin([',', '.'])]  # remove possible punctuation
       .groupby(level=0).agg({'clean_text': list, 'g1': 'first'})
       .groupby('g1').agg({'clean_text': list}))

print(df_)

                                                   clean_text
0  [[I, need, to, remove, punctuations, .], [Play, games, .]]

print(out)

                                             clean_text
g1
0   [[I, need, to, remove, punctuations], [Play, games]]
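For anyone who wants to run the explode approach end to end, here is a self-contained sketch with two sample rows. The sample data is made up, and using `list(string.punctuation)` in place of the hard-coded `[',', '.']` is my own assumption about which tokens should be dropped.

```python
import string
import pandas as pd

# Hypothetical sample data: each cell holds a list of token lists.
df = pd.DataFrame({"clean_text": [
    [["I", "need", "to", "remove", "punctuations", "."], ["Play", "games", "."]],
    [["Hello", ",", "world", "!"]],
]})

# Assumption: drop every single-character ASCII punctuation token.
punct = list(string.punctuation)

out = (
    df[["clean_text"]]
    .assign(g1=range(len(df)))                 # remember the original row
    .explode("clean_text", ignore_index=True)  # one row per sublist, fresh index
    .explode("clean_text")                     # one row per token, index = sublist id
    .loc[lambda d: ~d["clean_text"].isin(punct)]
    .groupby(level=0).agg({"clean_text": list, "g1": "first"})  # rebuild sublists
    .groupby("g1").agg({"clean_text": list})   # rebuild the original rows
)

print(out["clean_text"].tolist())
```

Keep in mind that a sublist consisting only of punctuation disappears entirely with this approach, because no rows are left to regroup for it.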

2023-04-10
