Python: unable to remove stopwords

Posted by bq8i3lrv on 2023-01-12 in Python


I have a list of stopwords, but somehow the program fails to remove them from the corpus.
The code I am using:

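import string
from Sastrawi.StopWordRemover.StopWordRemoverFactory import StopWordRemoverFactory

# Note: stemmer and std_word_replace are defined elsewhere (not shown in the question)
stop_factory = StopWordRemoverFactory()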
more_stopword = ['selamat','halo','hallo','hi']
dok_word = ['Dok','dok?', 'dok,', 'dok.', 'dok-', 'dok!', 'dok:', 'dok;', 'dok', 'dok.,','dok,.','dok?.',
            'Dokter','dokter?', 'dokter,', 'dokter.', 'dokter-', 'dokter!', 'dokter:', 'dokter;']
data = stop_factory.get_stop_words()+more_stopword+dok_word

# cleaning
def clean_text(text):
    new_text = []
    text = text.lower() # Lowercase
    # Loop each word in a sentence
    for kata in text.split(): 
        # Keep word not in slang or standard word
        if kata not in std_word_replace: 
            new_text.append(kata) 
        # Replace non-formal word with standard word
        elif kata in std_word_replace:
            new_text+=std_word_replace[kata].split() 
    # Join words without stopwords after stemming
    new_text = ' '.join(
        stemmer.stem(word) for word in new_text if word not in data
    )
    # Remove punctuations
    text = text.translate(str.maketrans('', '', string.punctuation))
    return new_text

I apply this code to my corpus with xtrain['question'].apply(lambda x: clean_text(x)). Taking the first row as an example:
Input: 'Dok,anak saya sudah imunisasi DPT'
Output: 'dok anak imunisasi dpt'
The word "dok" is still there. How can I fix this?


Source: https://stackoverflow.com/questions/75082754/not-be-able-to-remove-stopword

2 Answers


#1 xfb7svmp

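In your code you create dok_word, but you are not using it. You also need to double-check the tokenization: with text_dok = "dok,anak saya sepertinya", splitting only on whitespace yields the single token "dok,anak", which matches none of the entries in the list, so the stopwords still have no effect.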
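To illustrate the tokenization point, here is a minimal sketch that does not depend on Sastrawi. The STOPWORDS set is a hypothetical stand-in for the real stopword list, and remove_stopwords is an illustrative helper name rather than anything from the question. It replaces punctuation with spaces before splitting, so that "dok," becomes its own token and can be filtered out.

```python
import string

# Hypothetical stand-in for the Sastrawi stopword list plus the extra words
# from the question; the real list is much longer.
STOPWORDS = {"saya", "sudah", "selamat", "halo", "hallo", "hi", "dok", "dokter"}

# Translation table that maps every punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def remove_stopwords(text):
    """Lowercase, replace punctuation with spaces, split, then drop stopwords."""
    tokens = text.lower().translate(PUNCT_TO_SPACE).split()
    return " ".join(token for token in tokens if token not in STOPWORDS)


# Splitting on whitespace alone keeps "dok,anak" glued together as one token.
print("dok,anak saya sepertinya".split())

# With punctuation handled first, "dok" is a separate token and gets removed.
print(remove_stopwords("Dok,anak saya sudah imunisasi DPT"))
```

With punctuation handled up front, the many variants such as 'dok,' or 'dok?.' in dok_word should no longer be needed; plain 'dok' and 'dokter' would be enough.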

Answered 2023-01-12

#2 ilmyapht

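There is a mistake in the last line: remove() has to be called on the stopword remover and given the text to clean, i.e. stopwordexample.remove(text_dok), not the word list. The extra words also need to be in the remover's dictionary, and the punctuation has to be replaced first so that "dok," becomes the token "dok".

import string
from Sastrawi.Dictionary.ArrayDictionary import ArrayDictionary
from Sastrawi.StopWordRemover.StopWordRemover import StopWordRemover
from Sastrawi.StopWordRemover.StopWordRemoverFactory import StopWordRemoverFactory

stop_factory = StopWordRemoverFactory()
dok_word = ['dok', 'dokter']

# Build a remover whose dictionary holds the default stopwords plus the custom ones
stopwordexample = StopWordRemover(ArrayDictionary(stop_factory.get_stop_words() + dok_word))

text_dok = "dok,anak saya sepertinya"
# Replace punctuation with spaces so that "dok," is split off as "dok"
text_dok = text_dok.translate(str.maketrans(string.punctuation, ' ' * len(string.punctuation)))
text_dok = stopwordexample.remove(text_dok)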

Answered 2023-01-12
