Is it possible to use jieba's tokenizer in TfidfVectorizer [http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html]?
sklearn.feature_extraction.text.TfidfVectorizer accepts a callable for its tokenizer parameter, and I am wondering which function in jieba can be passed there.
I would like to cluster or classify some Chinese documents.
2 Answers
bkhjykvo 1#
Either jieba.cut() or jieba.cut_for_search() can be passed as the tokenizer; both take a string and return an iterable of tokens, which is what TfidfVectorizer expects from a callable tokenizer.
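To see how the two differ, here is a minimal sketch (the sample sentence is made up for illustration): jieba.cut performs standard segmentation, while jieba.cut_for_search additionally splits long words into shorter ones for search-style recall.

```python
import jieba

sentence = "小明硕士毕业于中国科学院计算所"  # hypothetical sample sentence

# Standard (accurate) mode: coarser tokens
print(list(jieba.cut(sentence)))

# Search-engine mode: long words are further split into shorter tokens
print(list(jieba.cut_for_search(sentence)))
```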
lp0sw83n 2#
tfidf_vectorizer = TfidfVectorizer(tokenizer=jieba.cut, lowercase=False, stop_words=stopwords)
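In the line above, stopwords is assumed to be a list of Chinese stopword strings defined elsewhere (the entries must match the tokens jieba produces), and lowercase=False simply skips the lowercasing step, which is irrelevant for Chinese text. A minimal end-to-end sketch along the same lines, with made-up documents and stopwords:

```python
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "我来到北京清华大学",            # hypothetical documents to vectorize
    "他来到了网易杭研大厦",
    "小明硕士毕业于中国科学院计算所",
]
stopwords = ["了", "的"]  # hypothetical stopword list; must match jieba's output tokens

# jieba.cut is a callable that takes a string and yields tokens,
# so it satisfies TfidfVectorizer's tokenizer contract.
tfidf_vectorizer = TfidfVectorizer(tokenizer=jieba.cut, lowercase=False, stop_words=stopwords)
X = tfidf_vectorizer.fit_transform(docs)

# X is a sparse document-term matrix that can be fed to a clusterer or classifier,
# e.g. sklearn.cluster.KMeans or sklearn.linear_model.LogisticRegression.
print(X.shape)
```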