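I know how to do this for a single list in a cell, but I need to keep the structure of multiple nested lists, e.g. [["I","need","to","remove","punctuations","."],[...],[...]]
-> [["I","need","to","remove","punctuations"],[...],[...]]
Every method I know flattens it into this -> ["I","need","to","remove","punctuations",...]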
data["clean_text"] = data["clean_text"].apply(lambda x: [', '.join([c for c in s if c not in string.punctuation]) for s in x])
data["clean_text"] = data["clean_text"].str.replace(r'[^\w\s]+', '')
...
What is the best way to do this?
2 answers
Following your approach, I would just add a list comprehension with a helper function:
Output:
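The answer's own code and output did not survive, so here is a minimal sketch of the idea, assuming each cell of `clean_text` holds a list of tokenized sentences. The helper `strip_punct`, the set `PUNCT`, and the sample data are hypothetical. The key point is that `apply` rebuilds the outer list and the helper filters each inner list separately, so the nesting is preserved instead of being flattened. The helper tests characters against a set, whereas `c not in string.punctuation` is a substring test on one long string.

```python
import string

import pandas as pd

PUNCT = set(string.punctuation)


def strip_punct(tokens):
    """Hypothetical helper: drop tokens made only of punctuation characters from one sentence."""
    # Empty strings are kept; "." or "?!" are dropped; "I" is kept.
    return [t for t in tokens if not (t and all(ch in PUNCT for ch in t))]


# Assumed sample data: each cell is a list of sentences, each sentence a list of tokens.
data = pd.DataFrame({
    "clean_text": [
        [["I", "need", "to", "remove", "punctuations", "."], ["Hello", ",", "world", "!"]],
        [["Keep", "the", "structure", "?"]],
    ]
})

# Clean every inner list; the outer list comprehension keeps the list-of-lists shape.
data["clean_text"] = data["clean_text"].apply(lambda doc: [strip_punct(sent) for sent in doc])

print(data["clean_text"].tolist())
```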
If your DataFrame is not very large, you can try to explode the lists into rows, filter out the rows that contain punctuation, and finally group the rows back together.
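A sketch of the explode/filter/group approach under the same assumption (each cell is a list of tokenized sentences). Because the data is nested two levels deep, it explodes twice and records a sentence number so the sentences can be rebuilt in order. The helper `is_punct`, the column names `row` and `sent`, and the sample data are hypothetical; `DataFrame.explode(..., ignore_index=True)` needs pandas 1.1 or newer.

```python
import string

import pandas as pd

PUNCT = set(string.punctuation)


def is_punct(token):
    """Hypothetical helper: True if the token is a non-empty string of punctuation only."""
    return isinstance(token, str) and len(token) > 0 and all(ch in PUNCT for ch in token)


# Assumed sample data: each cell is a list of sentences, each sentence a list of tokens.
data = pd.DataFrame({
    "clean_text": [
        [["I", "need", "to", "remove", "punctuations", "."], ["Hello", ",", "world", "!"]],
        [["Keep", "the", "structure", "?"]],
    ]
})

# Level 1: one row per sentence; the original row label goes into a "row" column.
sentences = data["clean_text"].explode().rename_axis("row").reset_index()
# Number the sentences inside each original row so their order can be restored.
sentences["sent"] = sentences.groupby("row").cumcount()

# Level 2: one row per token.
tokens = sentences.explode("clean_text", ignore_index=True)

# Filter out the punctuation tokens.
tokens = tokens[~tokens["clean_text"].map(is_punct)]

# Group back: tokens -> sentences, then sentences -> original rows.
per_sentence = tokens.groupby(["row", "sent"])["clean_text"].agg(list)
data["clean_text"] = per_sentence.groupby(level="row").agg(list)

print(data["clean_text"].tolist())
```

This sketch assumes every sentence keeps at least one non-punctuation token: a sentence made only of punctuation disappears entirely after filtering, and an empty inner list would explode to a NaN row. For large frames the double explode also copies a lot of data, which is why the answer limits it to DataFrames that are not very large.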