Python: Converting HuggingFace Tokenizer to TensorFlow Keras Layer

Posted by 1hdlvixo on 2024-01-05 in Python


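I'm having trouble understanding how to perform inference with a pretrained HuggingFace model that is loaded as a TensorFlow Keras model.

Context

In my case, I'm trying to fine-tune a pretrained DistilBert classifier. I have something like the following to preprocess my data and load/train my model: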

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# id2label, label2id and question_train_test_split are defined elsewhere
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2, id2label=id2label, label2id=label2id
)
# build a batched tf.data.Dataset from the Hugging Face Dataset
tf_train = model.prepare_tf_dataset(question_train_test_split['train'], batch_size=16, shuffle=True, tokenizer=tokenizer)
# freeze the DistilBERT encoder (model.layers[0]); set before compile() so the setting is picked up
model.layers[0].trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(2e-5))
print('Model Architecture:')
model.summary()  # summary() prints the table itself and returns None
model.fit(tf_train, epochs=3)

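where question_train_test_split is a HuggingFace Dataset object instance.
This code works exactly as expected: it loads the HuggingFace model as a tf.keras layer and even trains correctly with the .fit method.
However, I run into problems when I want to run predictions. I know I need to tokenize my string input, but I'd like to load the tokenizer as a tf.keras layer as well. I've looked everywhere for a way to do this and haven't found one.
Ideally, I'd like something like this: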

user_input = 'When were the Beatles formed?'
model_input = tokenizer(user_input)  # THIS HF TOKENIZER SHOULD BE A tf.keras LAYER
output = model(model_input)  # keep the model itself bound to `model`

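That way I could save the entire model (the tokenizer, the Transformer layers, and the classifier layer) to a TensorFlow SavedModel. Any pointers on how to convert a HuggingFace tokenizer to a TensorFlow Keras layer would be greatly appreciated.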

python

Source: https://stackoverflow.com/questions/77617031/converting-huggingface-tokenizer-to-tensorflow-keras-layer

1 Answer


klsxnrf11#

According to the Hugging Face documentation, you can use the tokenizer to tokenize the text and have it return TensorFlow tensors. You can do the following:

tokenized_outputs = tokenizer(user_input, return_tensors = 'tf')

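The return_tensors argument indicates that you want TensorFlow tensors returned. You can pass 'pt' instead of 'tf' to get PyTorch tensors.
For more information, see the documentation.
You can also use the TFAutoModel class in transformers to load the model in TensorFlow format.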
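Once the tokenized tensors have been passed to the classifier, the output holds raw logits rather than a label. Below is a minimal sketch of turning those logits into a label and probability, in plain Python so it runs without TensorFlow. The label names and logit values are hypothetical placeholders, since the question does not show the contents of id2label or any model output.

```python
import math

# Hypothetical label mapping; the question passes its own id2label/label2id
# to from_pretrained, but their contents are not shown there.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {label: idx for idx, label in id2label.items()}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Placeholder logits for a single input. In practice they would come from
# the model output, e.g. model(tokenized_outputs).logits[0].numpy().tolist()
example_logits = [-1.2, 2.3]
label, prob = decode_prediction(example_logits, id2label)
print(f"{label} ({prob:.3f})")
```

The same mapping step applies whichever framework produced the logits; only the conversion from tensor to Python list differs.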
Hope this helps.

Answered on 2024-01-05
