(FinGPT) developer@ai:/PROJECTS/FinGPT/fingpt/FinGPT_sentiment/instruct-FinGPT$ python ./inference/batchbot_torch.py --path /opt/data/data/THUDM/chatglm2-6b --max_new_tokens 16
Traceback (most recent call last):
  File "/home/developer/PROJECTS/FinGPT/fingpt/FinGPT_sentiment/instruct-FinGPT/./inference/batchbot_torch.py", line 147, in <module>
    main(args)
  File "/home/developer/PROJECTS/FinGPT/fingpt/FinGPT_sentiment/instruct-FinGPT/./inference/batchbot_torch.py", line 93, in main
    generator = get_generator(args.path)
  File "/home/developer/PROJECTS/FinGPT/fingpt/FinGPT_sentiment/instruct-FinGPT/./inference/batchbot_torch.py", line 53, in get_generator
    tokenizer = AutoTokenizer.from_pretrained(path, fast_tokenizer=True)
  File "/home/developer/mambaforge/envs/FinGPT/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 748, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class ChatGLMTokenizer does not exist or is not currently imported.
(FinGPT) developer@ai:/PROJECTS/FinGPT/fingpt/FinGPT_sentiment/instruct-FinGPT$
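For context, this ValueError usually means transformers was not allowed to import the custom tokenizer code that ships inside the chatglm2-6b repo. A minimal sketch of the usual workaround, assuming the same local model path as in the command above:

# Minimal sketch: ChatGLMTokenizer is defined by custom code inside the
# model repo (tokenization_chatglm.py), so AutoTokenizer can only import
# it when remote code is explicitly trusted.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "/opt/data/data/THUDM/chatglm2-6b",  # local path from the command above
    trust_remote_code=True,  # allow loading the repo's tokenization_chatglm.py
)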
3 answers
ecfdbz9o1#
After several attempts, it now looks like this:
v8wbuo2f2#
Running your Jupyter notebook, I get:
File ~/.cache/huggingface/modules/transformers_modules/THUDM/chatglm2-6b/8fd7fba285f7171d3ae7ea3b35c53b6340501ed1/tokenization_chatglm.py:69, in ChatGLMTokenizer.__init__(self, vocab_file, padding_side, clean_up_tokenization_spaces, **kwargs)
     68 def __init__(self, vocab_file, padding_side="left", clean_up_tokenization_spaces=False, **kwargs):
---> 69     super().__init__(padding_side=padding_side, clean_up_tokenization_spaces=clean_up_tokenization_spaces, **kwargs)
     70     self.name = "GLMTokenizer"
     72     self.vocab_file = vocab_file

File ~/mambaforge/envs/FinGPT/lib/python3.10/site-packages/transformers/tokenization_utils.py:366, in PreTrainedTokenizer.__init__(self, **kwargs)
    362 self._added_tokens_encoder: Dict[str, int] = {k.content: v for v, k in self._added_tokens_decoder.items()}
    364 # 4. If some of the special tokens are not part of the vocab, we add them, at the end.
    365 # the order of addition is the same as self.SPECIAL_TOKENS_ATTRIBUTES following `tokenizers`
--> 366 self._add_tokens(self.all_special_tokens_extended, special_tokens=True)
    368 self._decode_use_source_tokenizer = False

File ~/mambaforge/envs/FinGPT/lib/python3.10/site-packages/transformers/tokenization_utils.py:454, in PreTrainedTokenizer._add_tokens(self, new_tokens, special_tokens)
    452 if new_tokens is None:
    453     return added_tokens
--> 454 current_vocab = self.get_vocab().copy()
    455 new_idx = len(current_vocab)  # only call this once, len gives the last index + 1
    456 for token in new_tokens:

File ~/.cache/huggingface/modules/transformers_modules/THUDM/chatglm2-6b/8fd7fba285f7171d3ae7ea3b35c53b6340501ed1/tokenization_chatglm.py:112, in ChatGLMTokenizer.get_vocab(self)
    110 def get_vocab(self):
    111     """ Returns vocab as a dict """
--> 112     vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
    113     vocab.update(self.added_tokens_encoder)
    114     return vocab

File ~/.cache/huggingface/modules/transformers_modules/THUDM/chatglm2-6b/8fd7fba285f7171d3ae7ea3b35c53b6340501ed1/tokenization_chatglm.py:108, in ChatGLMTokenizer.vocab_size(self)
    106 @property
    107 def vocab_size(self):
--> 108     return self.tokenizer.n_words
AttributeError: 'ChatGLMTokenizer' object has no attribute 'tokenizer'
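This AttributeError points at an ordering problem: newer transformers releases (roughly 4.34 onward) call self.get_vocab() (and hence self.vocab_size) while adding special tokens inside PreTrainedTokenizer.__init__, but the repo's ChatGLMTokenizer only creates self.tokenizer after its super().__init__() call returns. A hypothetical sketch of the reordering fix in tokenization_chatglm.py follows; it is an illustration of the idea, not the exact upstream patch:

from transformers import PreTrainedTokenizer
# SPTokenizer is the SentencePiece wrapper defined earlier in the same
# tokenization_chatglm.py file.

class ChatGLMTokenizer(PreTrainedTokenizer):
    def __init__(self, vocab_file, padding_side="left",
                 clean_up_tokenization_spaces=False, **kwargs):
        self.name = "GLMTokenizer"
        self.vocab_file = vocab_file
        self.tokenizer = SPTokenizer(vocab_file)  # create BEFORE super().__init__()
        # Now get_vocab()/vocab_size can be called safely during base-class init.
        super().__init__(padding_side=padding_side,
                         clean_up_tokenization_spaces=clean_up_tokenization_spaces,
                         **kwargs)

Pinning transformers to a pre-refactor release, as the next answer does, avoids editing the cached file at all.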
ibps3vxo3#
Pinning transformers to 4.30.2 (before the tokenizer __init__ refactor) sidesteps the AttributeError above without touching the model repo:

!pip install protobuf transformers==4.30.2 cpm_kernels "torch>=2.0" gradio mdtex2html sentencepiece accelerate
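A quick sanity check after reinstalling, before rerunning the script (an assumed verification step, not part of the original answer):

# Confirm the pinned transformers version is the one actually imported.
import transformers
print(transformers.__version__)  # expect 4.30.2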