jieba word segmentation encoding problem

irlmq6kh  posted on 2022-11-19  in Other

  File "D:/PythonCodes/fenciExp/jiebaExp/main.py", line 31, in start
    for x in jieba.cut(i):
  File "D:\安装\Anaconda2\lib\site-packages\jieba\__init__.py", line 301, in cut
    for word in cut_block(blk):
  File "D:\安装\Anaconda2\lib\site-packages\jieba\__init__.py", line 233, in cut_DAG
    DAG = self.get_DAG(sentence)
  File "D:\安装\Anaconda2\lib\site-packages\jieba\__init__.py", line 179, in get_DAG
    self.check_initialized()
  File "D:\安装\Anaconda2\lib\site-packages\jieba\__init__.py", line 168, in check_initialized
    self.initialize()
  File "D:\安装\Anaconda2\lib\site-packages\jieba\__init__.py", line 143, in initialize
    self.FREQ, self.total = self.gen_pfdict(self.get_dict_file())
  File "D:\安装\Anaconda2\lib\site-packages\jieba\__init__.py", line 352, in get_dict_file
    return get_module_res(DEFAULT_DICT_NAME)
  File "D:\安装\Anaconda2\lib\site-packages\jieba\_compat.py", line 8, in <lambda>
    os.path.join(*res))
  File "D:\安装\Anaconda2\lib\site-packages\setuptools-23.0.0-py2.7.egg\pkg_resources\__init__.py", line 1178, in resource_stream
  File "D:\安装\Anaconda2\lib\site-packages\setuptools-23.0.0-py2.7.egg\pkg_resources\__init__.py", line 1577, in get_resource_stream
  File "D:\安装\Anaconda2\lib\site-packages\setuptools-23.0.0-py2.7.egg\pkg_resources\__init__.py", line 1530, in _fn
  File "D:\安装\Anaconda2\lib\ntpath.py", line 85, in join
    result_path = result_path + p_path
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 1: ordinal not in range(128)

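It's just an ordinary test:
```
# -*- coding: utf-8 -*-
import jieba

if __name__ == "__main__":
    # Python 2: a plain '...' literal is a byte string (str), not unicode
    str = '我爱北京天安门'
    for i in jieba.cut(str):
        print i
```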

wsewodh2  #1

Change it to:
str = u'我爱北京天安门'

In Python 2, strings that hold UTF-8 text should be prefixed with u.
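To show what the u prefix actually changes, here is a small sketch (my own illustration, not part of the original answer). It assumes the source file is saved as UTF-8 and uses an explicit `.encode('utf-8')` to stand in for the bytes that a plain `'...'` literal holds on Python 2, so it runs the same way on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-

# Unicode string: one element per character.
text = u'我爱北京天安门'

# Byte string: the UTF-8 bytes a plain '...' literal would hold on Python 2
# when the source file is saved as UTF-8 (assumption for this sketch).
raw = text.encode('utf-8')

print(len(text))  # 7 characters
print(len(raw))   # 21 bytes, 3 per character in UTF-8

# Decoding with the right codec gets the unicode string back.
assert raw.decode('utf-8') == text
```

The two values look identical when printed to a UTF-8 terminal, but only the unicode one is safe to mix with other unicode strings without an implicit decode.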

hrysbysz  #2

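That's not the problem here.
When the dictionary is loaded, the string passed in is unicode, while the absolute path supplied by the library is a str. That path contains Chinese characters, so the final str + unicode concatenation triggers the decode error. The convenient fix is to put your Anaconda in a directory whose path contains no Chinese characters. (This kind of Python 2 problem is hard to solve in code.)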
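The str + unicode failure can be reproduced without jieba. The sketch below is only an illustration under assumptions: `py2_style_concat` is a hypothetical helper that spells out the implicit ASCII decode Python 2 performs before concatenating, and the install path is a made-up GBK-encoded example of a directory name containing Chinese characters. It runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-

def py2_style_concat(byte_path, text_part, implicit_codec="ascii"):
    """Mimic Python 2's `str + unicode`: decode the byte string, then join.

    Hypothetical helper for illustration; Python 2 does this decode
    implicitly, using the ASCII codec by default.
    """
    return byte_path.decode(implicit_codec) + text_part

# Hypothetical install location (drive letter already split off),
# encoded as GBK the way a Chinese-locale Windows would hand it over.
install_dir = u"\\安装\\Anaconda2\\lib\\site-packages\\".encode("gbk")

try:
    py2_style_concat(install_dir, u"jieba")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The first non-ASCII byte sits right after the leading backslash.
print(error)

# With the correct codec the same concatenation succeeds.
print(py2_style_concat(install_dir, u"jieba", "gbk").endswith(u"jieba"))
```

The path bytes themselves are fine; the failure comes from decoding them with the wrong (ASCII) codec, which is why moving the installation to an ASCII-only path makes the error go away without touching any code.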
