Problem with jieba.load_userdict()

Posted by yzxexxkh on 2023-02-04 in Other

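I built my own dictionaries following the required format. For business reasons I created two of them, a.txt and b.txt, each holding the data from one of two data tables. When I tested loading both files as shown below, they didn't seem to take effect at the same time: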
import jieba
jieba.load_userdict('a.txt')  # first user dictionary
jieba.load_userdict('b.txt')  # second user dictionary

Can jieba load two user dictionaries at the same time?
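jieba's README documents load_userdict(file_name) as accepting either a path or a file-like object, and as far as I know each call adds its entries to the dictionary already in memory, so calling it once per file should be enough. If you would rather maintain a single combined file anyway, here is a minimal stdlib sketch. merge_userdict_lines is a hypothetical helper; it assumes words contain no spaces, and the rule that a later entry for the same word replaces an earlier one is my own choice, not documented jieba behavior.

```python
def merge_userdict_lines(*sources):
    """Merge user-dictionary lines ("word [freq] [tag]") from several sources.

    Each source is any iterable of text lines; an open text file works.
    Blank lines are skipped. If a word appears more than once, the last
    entry wins but keeps the position where the word first appeared
    (dicts preserve insertion order in Python 3.7+).
    """
    merged = {}
    for source in sources:
        for raw in source:
            line = raw.strip()
            if not line:
                continue
            word = line.split()[0]  # assumes the word itself has no spaces
            merged[word] = line
    return list(merged.values())
```

For example, open a.txt and b.txt with encoding='utf-8', pass both file objects to merge_userdict_lines, write the returned lines to one UTF-8 file, and give that path to jieba.load_userdict.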

yfwxisqw (reply #1)

Today I ran into another problem. My Flask web app uses jieba's custom-dictionary loading, and I start it with the following command:
gunicorn -w 4 -k gevent -b 0.0.0.0:9999 --reload run:app
I found that jieba keeps printing the messages below over and over. I think gunicorn starting multiple threads is causing this. How should I fix it?

loading model from cache /tmp/jieba.cache
loading model cost 2.44023799896 seconds.
Trie has been built succesfully.
[2016-09-29 17:05:37 +0000] [32528] [INFO] Booting worker with pid: 32528
Building Trie..., from /root/py27/lib/python2.7/site-packages/jieba/dict.txt
loading model from cache /tmp/jieba.cache
loading model cost 2.28571200371 seconds.
Trie has been built succesfully.
[2016-09-29 17:06:06 +0000] [32556] [INFO] Booting worker with pid: 32556
Building Trie..., from /root/py27/lib/python2.7/site-packages/jieba/dict.txt
loading model from cache /tmp/jieba.cache
loading model cost 2.27150511742 seconds.
Trie has been built succesfully.
[2016-09-29 17:06:10 +0000] [32560] [INFO] Booting worker with pid: 32560
Building Trie..., from /root/py27/lib/python2.7/site-packages/jieba/dict.txt
loading model from cache /tmp/jieba.cache

w6lpcovy (reply #2)

gunicorn forks multiple processes, but jieba loads its dictionary lazily. Call jieba.initialize() right after import jieba, and the dictionary won't be loaded multiple times.
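Calling jieba.initialize() at import time only saves work across workers if the application module is imported before gunicorn forks. By default gunicorn imports the app separately inside each worker; the preload_app setting (--preload on the command line) imports it once in the master instead, so the forked workers inherit the already-built dictionary. Below is a minimal gunicorn.conf.py sketch mirroring the command in the question with gevent as the worker class, assuming run.py does import jieba and jieba.initialize() at module level. reload is left out because gunicorn documents it as a development setting, and worker_class = "gevent" assumes the gevent package is installed.

```python
# gunicorn.conf.py: roughly equivalent to
#   gunicorn -w 4 -k gevent -b 0.0.0.0:9999 --preload run:app
bind = "0.0.0.0:9999"    # -b: listen address
workers = 4              # -w: number of forked worker processes
worker_class = "gevent"  # -k: gevent workers (needs the gevent package)
preload_app = True       # --preload: import run:app in the master, before forking
```

Start it with gunicorn -c gunicorn.conf.py run:app. gunicorn's documentation notes the trade-off of preloading: application code can no longer be reloaded just by restarting the workers.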

u91tlkcl (reply #3)

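This is also a problem with jieba's user-dictionary loading. I added a word to my dictionary with explicit parameters, for example 萌萌哒 50 a, but posseg tagging returns 萌萌哒 x. Is this a version problem or some other configuration issue?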
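One way to narrow this down is to check that each dictionary line actually parses as word, frequency, tag. The sketch below uses a regex modeled on the format in jieba's README (word, optional integer frequency, optional part-of-speech tag, separated by spaces); check_userdict_line is a hypothetical helper, and the problems it flags (a UTF-8 BOM, which some jieba versions may already strip, full-width spaces, tabs, a non-lowercase tag) are common suspects I am assuming, not a confirmed diagnosis of this case.

```python
import re

# word, then an optional " <digits>" frequency, then an optional " <lowercase>" tag
USERDICT_LINE = re.compile(r"^(.+?)( [0-9]+)?( [a-z]+)?$")


def check_userdict_line(line):
    """Parse one user-dictionary line; return (word, freq, tag, warnings)."""
    warnings = []
    if line.startswith("\ufeff"):
        warnings.append("starts with a UTF-8 BOM")
        line = line[1:]
    line = line.rstrip("\r\n")
    if "\u3000" in line:
        warnings.append("contains a full-width space (U+3000)")
    if "\t" in line:
        warnings.append("contains a tab; fields should be separated by spaces")
    match = USERDICT_LINE.match(line)
    if match is None:  # only an empty line fails to match
        return None, None, None, warnings + ["empty line"]
    word, freq, tag = match.groups()
    freq = int(freq) if freq else None
    tag = tag.strip() if tag else None
    if tag is None and re.search(r" [A-Za-z]+$", line):
        # e.g. "萌萌哒 50 A": the whole line ends up being read as the word
        warnings.append("trailing field is not all lowercase, so it was read as part of the word")
    return word, freq, tag, warnings
```

Running it over every line of the dictionary file (opened with encoding='utf-8') and printing the lines that come back with warnings or with a None tag would show whether a line such as 萌萌哒 50 a is being read the way you expect.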

yuvru6vn (reply #4)

@fxsjy
The relevant parts of my code are:
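import os

import jieba
jieba.initialize()  # build the prefix dictionary now instead of on first use

if os.path.exists('cbi360.txt'):  # relative path: checked against the current working directory
    jieba.load_userdict('cbi360.txt')  # add the custom words (and their tags)

import jieba.posseg as peg  # part-of-speech tagging interface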

Here cbi360.txt is my own dictionary, and I also use jieba.posseg. What exactly should the order of these steps be?
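Separately from the order question: os.path.exists('cbi360.txt') is resolved against the process's current working directory, which under gunicorn or another launcher may not be the directory that holds your code. If that check is False, the dictionary is silently skipped and your custom words and tags are never loaded. A minimal sketch that resolves the file against a base directory you pass in (typically os.path.dirname(os.path.abspath(__file__)) in your module) and fails loudly when the file is missing; resolve_userdict_path is a hypothetical helper name.

```python
import os


def resolve_userdict_path(filename, base_dir):
    """Return an absolute path to a user dictionary, or raise if it is missing.

    Relative filenames are joined to base_dir rather than the current working
    directory, so the result does not depend on where the process was started.
    """
    path = filename if os.path.isabs(filename) else os.path.join(base_dir, filename)
    path = os.path.abspath(path)
    if not os.path.isfile(path):
        # Fail loudly instead of silently skipping the dictionary.
        raise FileNotFoundError(f"user dictionary not found: {path}")
    return path
```

In the module from the question this would become jieba.load_userdict(resolve_userdict_path('cbi360.txt', os.path.dirname(os.path.abspath(__file__)))), replacing the os.path.exists check.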
