NLTK's stopwords must first be downloaded with the NLTK data installer. This is a one-time setup, after which you can freely use from nltk.corpus import stopwords.

Posted by 7gyucuyw 7 months ago in Other

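To download the stopwords, start the Python interpreter by running python in the terminal of your choice, then enter:

import nltk
nltk.download("stopwords")

After that, you're good to go!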

Originally posted by @tomaarsen in #3063 (comment)

Source: https://github.com/nltk/nltk/issues/3107

9 answers

hts6caw3 (#1)

Even though I followed the NLTK library's instructions, I still run into the same problem.

jaxagkaj (#2)

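What kind of problem are you experiencing? What happens after you call nltk.download("stopwords")? Are you getting the error from #3092? If so, please look at that issue; it contains several suggestions that have already helped others.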

vaqhlq81 (#3)

import numpy as np
import pandas as pd
from sklearn.feature_extraction import text
from sklearn.metrics.pairwise import cosine_similarity

data = pd.read_csv("/Users/atatekeli/PycharmProjects/NetflixRecm/netflixData.csv")
print(data.head())
data.info()  # info() is a method and prints its own summary
print(data.isnull().sum())

data = data[["Title", "Description", "Content Type", "Genres"]]
print(data.head())
data = data.dropna()

import nltk
import re
nltk.download('stopwords')
stemmer = nltk.SnowballStemmer("english")
from nltk.corpus import stopwords
import string
stopword = set(stopwords.words('english'))

def clean(text):
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                # remove text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # remove URLs
    text = re.sub(r'<.*?>+', '', text)                 # remove HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)                     # remove newlines
    text = re.sub(r'\w*\d\w*', '', text)               # remove words containing digits
    text = [word for word in text.split(' ') if word not in stopword]
    text = " ".join(text)
    text = [stemmer.stem(word) for word in text.split(' ')]
    text = " ".join(text)
    return text

data["Title"] = data["Title"].apply(clean)
print(data.Title.sample(10))

qyyhg6bp (#4)

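This is an error message indicating that something went wrong while loading the stopwords resource. Please use the NLTK Downloader to obtain the resource: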

import nltk
nltk.download('stopwords')

For more information see: https://www.nltk.org/data.html

gmxoilav (#5)

This seems to be the problem:

[nltk_data] Error loading stopwords: <urlopen error [SSL:
[nltk_data] CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data] unable to get local issuer certificate (_ssl.c:1129)>

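I haven't seen this before. Perhaps this comment (or others in the same thread) will help solve your problem:
https://stackoverflow.com/a/45018725/17936326 . I suspect you might be using Python 3.6? That thread certainly has some useful tips.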
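
A workaround often suggested for the CERTIFICATE_VERIFY_FAILED error above is to switch off certificate verification just for the download. The sketch below is one way to do that; it relies on private hooks in Python's ssl module, and the helper names unverified_https and download_without_verification are made up for illustration. Skipping verification is insecure, so the sketch restores the normal behaviour as soon as the block exits. On macOS installs from python.org, running the bundled "Install Certificates.command" is usually suggested as the cleaner fix.

```python
import ssl
from contextlib import contextmanager


@contextmanager
def unverified_https():
    """Temporarily make urllib's default HTTPS context skip certificate checks."""
    original = ssl._create_default_https_context
    # WARNING: every HTTPS request made inside this block is unverified.
    ssl._create_default_https_context = ssl._create_unverified_context
    try:
        yield
    finally:
        # Restore normal verification even if the download fails.
        ssl._create_default_https_context = original


def download_without_verification(package_id="stopwords"):
    """Hypothetical helper: download one NLTK package with verification off."""
    import nltk  # imported lazily; assumes NLTK is installed

    with unverified_https():
        # nltk.download returns False on failure instead of raising by default.
        return nltk.download(package_id)
```

Calling download_without_verification("stopwords") once should be enough; later imports of nltk.corpus.stopwords read the local copy and need no network access.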

5lwkijsr (#6)

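My case is even worse: I downloaded it, but I think PyCharm cannot find the file in the home directory because of a permissions problem. Overall, this is a very strange design decision.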
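
If the home directory is the obstacle, one option is to keep the data inside the project and tell NLTK to look there. This is a minimal sketch, not a confirmed fix for the PyCharm case: the nltk_data folder under the project root and both helper names are assumptions, while download_dir and nltk.data.path are part of NLTK itself.

```python
from pathlib import Path


def project_data_dir(project_root):
    """Create and return <project_root>/nltk_data (a hypothetical location)."""
    data_dir = Path(project_root) / "nltk_data"
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir


def download_into_project(project_root, package_id="stopwords"):
    """Download a package into the project and make NLTK search there first."""
    import nltk  # imported lazily; assumes NLTK is installed

    data_dir = str(project_data_dir(project_root))
    if data_dir not in nltk.data.path:
        nltk.data.path.insert(0, data_dir)
    # nltk.download returns False on failure instead of raising by default.
    return nltk.download(package_id, download_dir=data_dir)
```

Setting the NLTK_DATA environment variable to the same directory in the PyCharm run configuration should have a similar effect without changing the code.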

mbjcgjjk (#7)

I get the same error on localhost on an M3 Mac with the following code:

import nltk
from nltk.tokenize import sent_tokenize

# By default nltk.download() reports failure by returning False rather than
# raising, so check the return value instead of wrapping it in try/except.
if not nltk.download('punkt'):
    print("Error downloading punkt resource")

text = """This is an example text. It has multiple sentences.
Let's see how NLTK tokenizes them. NLTK also provides functionalities
for other types of text processing tasks."""
print("Original text:")
print(text)

try:
    sentences = sent_tokenize(text)
    print("\nSentences after tokenization:")
    for sentence in sentences:
        print(sentence)
except Exception as e:
    # A LookupError lands here when the tokenizer data is missing.
    print(f"Error during sentence tokenization: {e}")

sdnqo3pr (#8)

I have the same problem on my M2 Mac with Python 12. Only this worked for me: https://stackoverflow.com/questions/41348621/ssl-error-downloading-nltk-data/42890688#42890688

j0pj023g (#9)

Hi @zmvmarina.

Thanks. I had found that post earlier but didn't have time to try it. I have now applied it, and it works for me too.

Thank you very much.
