Is it possible to use jieba's tokenizer in TfidfVectorizer [http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html]?
sklearn.feature_extraction.text.TfidfVectorizer accepts a callable for its tokenizer parameter, and I am wondering which function in jieba can be passed there.
I would like to cluster or classify some Chinese documents.
2 Answers
bkhjykvo 1#
Either jieba.cut() or jieba.cut_for_search() can be passed as the tokenizer; both take a string and return an iterable of tokens, which is what TfidfVectorizer expects from a callable tokenizer.
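To see how the two differ, here is a minimal sketch (the sample sentence is made up for illustration): jieba.cut performs standard segmentation, while jieba.cut_for_search additionally splits long words into shorter ones for search-style recall.

```python
import jieba

sentence = "小明硕士毕业于中国科学院计算所"  # hypothetical sample sentence

# Standard (accurate) mode: coarser tokens
print(list(jieba.cut(sentence)))

# Search-engine mode: long words are further split into shorter tokens
print(list(jieba.cut_for_search(sentence)))
```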
lp0sw83n 2#
tfidf_vectorizer = TfidfVectorizer(tokenizer=jieba.cut, lowercase=False, stop_words=stopwords)
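In the line above, stopwords is assumed to be a list of Chinese stopword strings defined elsewhere (the entries must match the tokens jieba produces), and lowercase=False simply skips the lowercasing step, which is irrelevant for Chinese text. A minimal end-to-end sketch along the same lines, with made-up documents and stopwords:

```python
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "我来到北京清华大学",            # hypothetical documents to vectorize
    "他来到了网易杭研大厦",
    "小明硕士毕业于中国科学院计算所",
]
stopwords = ["了", "的"]  # hypothetical stopword list; must match jieba's output tokens

# jieba.cut is a callable that takes a string and yields tokens,
# so it satisfies TfidfVectorizer's tokenizer contract.
tfidf_vectorizer = TfidfVectorizer(tokenizer=jieba.cut, lowercase=False, stop_words=stopwords)
X = tfidf_vectorizer.fit_transform(docs)

# X is a sparse document-term matrix that can be fed to a clusterer or classifier,
# e.g. sklearn.cluster.KMeans or sklearn.linear_model.LogisticRegression.
print(X.shape)
```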