use jieba in sklearn.feature_extraction.text.TfidfVectorizer

lsmepo6l · posted 2022-10-26

Is it possible to use jieba's tokenizer in TfidfVectorizer? [http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html]

sklearn.feature_extraction.text.TfidfVectorizer accepts a callable as its tokenizer parameter, and I am wondering which function in jieba can be passed here.

I would like to cluster or classify some Chinese documents.


bkhjykvo · answer 1

jieba.cut() or jieba.cut_for_search()



lp0sw83n · answer 2

tfidf_vectorizer = TfidfVectorizer(tokenizer=jieba.cut, lowercase=False, stop_words=stopwords)

(Here stopwords is a list of Chinese stop words you supply yourself; lowercase=False skips sklearn's case folding, which is pointless for Chinese text.)
