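I am trying to run an evaluation of MCLIP and ItalianCLIP on a zero-shot learning task, and I found this Colab notebook. When I run the prediction cell, I get the following error: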
---------------------------------------------------------------------------
/opt/conda/lib/python3.7/site-packages/sentence_transformers/SentenceTransformer.py in encode(self, sentences, batch_size, show_progress_bar, output_value, convert_to_numpy, convert_to_tensor, device, normalize_embeddings)
159 for start_index in trange(0, len(sentences), batch_size, desc="Batches", disable=not show_progress_bar):
160 sentences_batch = sentences_sorted[start_index:start_index+batch_size]
--> 161 features = self.tokenize(sentences_batch)
162 features = batch_to_device(features, device)
163
/opt/conda/lib/python3.7/site-packages/sentence_transformers/SentenceTransformer.py in tokenize(self, texts)
317 Tokenizes the texts
318 """
--> 319 return self._first_module().tokenize(texts)
320
321 def get_sentence_features(self, *features):
/opt/conda/lib/python3.7/site-packages/sentence_transformers/models/CLIPModel.py in tokenize(self, texts)
69 images = None
70
---> 71 inputs = self.processor(text=texts_values, images=images, return_tensors="pt", padding=True)
72 inputs['image_text_info'] = image_text_info
73 return inputs
/opt/conda/lib/python3.7/site-packages/transformers/models/clip/processing_clip.py in __call__(self, text, images, return_tensors, **kwargs)
97
98 if text is not None:
---> 99 encoding = self.tokenizer(text, return_tensors=return_tensors, **kwargs)
100
101 if images is not None:
/opt/conda/lib/python3.7/site-packages/transformers/tokenization_utils_base.py in __call__(self, text, text_pair, text_target, text_pair_target, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
2521 if not self._in_target_context_manager:
2522 self._switch_to_input_mode()
-> 2523 encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
2524 if text_target is not None:
2525 self._switch_to_target_mode()
/opt/conda/lib/python3.7/site-packages/transformers/tokenization_utils_base.py in _call_one(self, text, text_pair, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
2580 if not _is_valid_text_input(text):
2581 raise ValueError(
-> 2582 "text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) "
2583 "or `List[List[str]]` (batch of pretokenized examples)."
2584 )
ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).
How can I fix this?
1 Answer
yuvru6vn #1
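I ran into the same problem while using FinBERT, and once I read the error carefully I understood what was wrong. The error message itself says what the tokenizer expects: a `str`, a `List[str]`, or a `List[List[str]]`. It fails because at least one of the inputs it received is not a string. So if you are applying the function to an Excel column, add `.dropna()` when reading the Excel sheet, which removes all the empty cells. Also make sure that no row in the column holds only an integer instead of a string, because that raises the same error again; convert the numbers to strings before continuing. It took me more than a week to realize how small this mistake was. I feel like an idiot 😅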
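The cleanup described above could be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: the helper name `clean_texts` and the sample values are hypothetical, and it uses plain Python so that it runs without a spreadsheet. It drops missing cells and casts everything else to `str` before the list is passed to `encode`.

```python
import math


def clean_texts(values):
    """Return only non-empty strings, which is what the tokenizer accepts.

    `clean_texts` is a hypothetical helper written for this example.
    """
    cleaned = []
    for value in values:
        # Skip missing cells: None, or the float NaN that pandas uses
        # to represent an empty spreadsheet cell.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        # Cast numbers (and anything else) to str, then drop blank strings.
        text = str(value).strip()
        if text:
            cleaned.append(text)
    return cleaned


# Hypothetical column contents, as they might come out of a spreadsheet.
raw = ["un gatto", None, float("nan"), 42, "  ", "un cane"]
texts = clean_texts(raw)
print(texts)

# With pandas, a roughly equivalent one-liner (assuming a DataFrame `df`
# with a column named "text") would be:
#     texts = df["text"].dropna().astype(str).tolist()
```

The resulting `texts` list contains only `str` items, so it can be passed to the model's `encode` call without triggering the `ValueError` shown in the question.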