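I have a list of tweets that contain negation words such as "not, never, rarely".
I want to convert "not nice" into "not_nice" (joined with an underscore).
How can I join every "not" in the tweets to the word that follows it?
I tried the following, but it changes nothing; the sentences stay exactly the same: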
import re

def combine(negation_words, word_scan):
    if type(negation_words) != list:
        negation_words = [negation_words]
    n_index = []
    for i in negation_words:
        index_replace = [(m.end(0)) for m in re.finditer(i, word_scan)]
        n_index += index_replace
    for rep in n_index:
        letters = [x for x in word_scan]
        letters[rep] = "_"
        word_scan = "".join(letters)
    return word_scan

negation_words = ["no", "not"]
word_scan = df
combine(negation_words, word_scan)
df['clean'] = df['tweets'].apply(lambda x: combine(str(x), word_scan))
df
1 Answer
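You can use re.sub or Series.str.replace with a regex that finds any word from the negation_words list followed by whitespace, and replaces that whitespace with an underscore. To make the match case-insensitive, call Series.str.replace with case=False.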
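A minimal sketch of the re.sub variant, assuming the word list from the question. The compiled `pattern`, the word boundary `\b`, and the one-argument `combine` helper are my own illustrative choices, not something the thread specifies:

```python
import re

# Word list taken from the question; extend it as needed.
negation_words = ["no", "not"]

# \b stops "no" from matching inside words such as "know" or "nothing".
# The group captures the negation word so the replacement can keep it.
pattern = re.compile(
    r"\b(" + "|".join(map(re.escape, negation_words)) + r")\s+",
    flags=re.IGNORECASE,
)

def combine(text):
    """Replace the whitespace after each negation word with an underscore."""
    return pattern.sub(r"\1_", text)

print(combine("This is not nice"))  # This is not_nice
```

On a DataFrame, the same function could be applied with df['tweets'].apply(combine). Alternatively, passing the same regex string and the replacement r"\1_" to Series.str.replace with regex=True and case=False should give the same result.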