pytorch ValueError: text input must of type `str` (single example)

nbysray5 posted on 2023-03-08 in Other

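I am trying to run an evaluation of MCLIP and ItalianCLIP on a zero-shot learning task, and I found this Colab notebook. When I run the prediction cell below, I get the following error: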

---------------------------------------------------------------------------

/opt/conda/lib/python3.7/site-packages/sentence_transformers/SentenceTransformer.py in encode(self, sentences, batch_size, show_progress_bar, output_value, convert_to_numpy, convert_to_tensor, device, normalize_embeddings)
    159         for start_index in trange(0, len(sentences), batch_size, desc="Batches", disable=not show_progress_bar):
    160             sentences_batch = sentences_sorted[start_index:start_index+batch_size]
--> 161             features = self.tokenize(sentences_batch)
    162             features = batch_to_device(features, device)
    163 

/opt/conda/lib/python3.7/site-packages/sentence_transformers/SentenceTransformer.py in tokenize(self, texts)
    317         Tokenizes the texts
    318         """
--> 319         return self._first_module().tokenize(texts)
    320 
    321     def get_sentence_features(self, *features):

/opt/conda/lib/python3.7/site-packages/sentence_transformers/models/CLIPModel.py in tokenize(self, texts)
     69             images = None
     70 
---> 71         inputs = self.processor(text=texts_values, images=images, return_tensors="pt", padding=True)
     72         inputs['image_text_info'] = image_text_info
     73         return inputs

/opt/conda/lib/python3.7/site-packages/transformers/models/clip/processing_clip.py in __call__(self, text, images, return_tensors, **kwargs)
     97 
     98         if text is not None:
---> 99             encoding = self.tokenizer(text, return_tensors=return_tensors, **kwargs)
    100 
    101         if images is not None:

/opt/conda/lib/python3.7/site-packages/transformers/tokenization_utils_base.py in __call__(self, text, text_pair, text_target, text_pair_target, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
   2521             if not self._in_target_context_manager:
   2522                 self._switch_to_input_mode()
-> 2523             encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
   2524         if text_target is not None:
   2525             self._switch_to_target_mode()

/opt/conda/lib/python3.7/site-packages/transformers/tokenization_utils_base.py in _call_one(self, text, text_pair, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
   2580         if not _is_valid_text_input(text):
   2581             raise ValueError(
-> 2582                 "text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) "
   2583                 "or `List[List[str]]` (batch of pretokenized examples)."
   2584             )

ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).

How can I fix this?

yuvru6vn #1

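I ran into the same problem while using finbert, but once I read the error carefully I understood what was wrong. The error itself states what the tokenizer expects: a `str`, a `List[str]`, or a `List[List[str]]`. It fails because it receives something else. So if you are applying the function to an Excel column, add `.dropna()` at the end when you read the Excel sheet, which removes all empty cells. Also make sure no row in the column holds only an integer instead of a string, because that raises the same error again; convert your numbers to strings before continuing. It took me more than a week to realize how small a mistake this was. Feel like an idiot 😅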
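
As a rough sketch of the cleanup described above: the snippet below first lists the entries that are not strings, then drops missing values and converts everything else to `str`. The helper names `find_invalid_texts` and `clean_texts` and the sample data are made up for illustration, and this assumes the failure really does come from empty cells or numeric values in the input. With pandas, the same effect should be achievable with `df["text"].dropna().astype(str).tolist()`, where the column name `text` is hypothetical.

```python
import math


def find_invalid_texts(values):
    """Return (index, value) pairs for entries that are not plain strings."""
    return [(i, v) for i, v in enumerate(values) if not isinstance(v, str)]


def clean_texts(values):
    """Drop missing entries and convert every remaining entry to str."""
    cleaned = []
    for v in values:
        # None and float NaN are how empty spreadsheet cells usually show up.
        if v is None or (isinstance(v, float) and math.isnan(v)):
            continue
        # Numbers (e.g. a cell containing only 2023) become "2023".
        cleaned.append(str(v))
    return cleaned


# Hypothetical column contents as they might come out of an Excel sheet.
raw = ["good results", float("nan"), 2023, None, "weak outlook"]

# Inspect which rows would make the tokenizer reject the input.
for index, value in find_invalid_texts(raw):
    print(f"row {index}: {value!r} ({type(value).__name__})")

texts = clean_texts(raw)
print(texts)
```

The resulting `texts` list contains only `str` values, so it can be passed to the tokenizer or to `model.encode(...)` as a `List[str]`. If rows must stay aligned with labels or images, filter those with the same mask instead of dropping entries from the text list alone.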
