Training TFBertForSequenceClassification with custom X and Y data

Posted by jtoj6r0c on 2023-08-05 in Other

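I am working on a text classification problem, for which I am trying to train my model on TFBertForSequenceClassification from the huggingface-transformers library.
I followed the example given on their github page, and I am able to run the sample code on the provided sample data using tensorflow_datasets.load('glue/mrpc'). However, I cannot find an example of how to load my own custom data and pass it to model.fit(train_dataset, epochs=2, steps_per_epoch=115, validation_data=valid_dataset, validation_steps=7).
How can I define my own X, tokenize my X, and prepare train_dataset with my X and Y? Here X is my input text and Y is the classification category of the given X.
Sample training DataFrame: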

text    category_index
0   Assorted Print Joggers - Pack of 2 ,/ Gray Pri...   0
1   "Buckle" ( Matt ) for 35 mm Width Belt  0
2   (Gagam 07) Barcelona Football Jersey Home 17 1...   2
3   (Pack of 3 Pair) Flocklined Reusable Rubber Ha...   1
4   (Summer special Offer)Firststep new born baby ...   0
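As a minimal sketch of what X and Y mean for the sample above, the snippet below separates the rows into a list of input texts and a list of integer labels, and derives the number of classes. The `rows` literal just copies the sample rows (the texts are truncated in the source and are left that way); the variable names are illustrative and not part of any library.

```python
# Sample rows copied from the DataFrame above (texts are truncated in the source).
rows = [
    ("Assorted Print Joggers - Pack of 2 ,/ Gray Pri...", 0),
    ('"Buckle" ( Matt ) for 35 mm Width Belt', 0),
    ("(Gagam 07) Barcelona Football Jersey Home 17 1...", 2),
    ("(Pack of 3 Pair) Flocklined Reusable Rubber Ha...", 1),
    ("(Summer special Offer)Firststep new born baby ...", 0),
]

# X: the input texts, Y: the integer class index of each text.
X = [text for text, _ in rows]
Y = [label for _, label in rows]

# Number of distinct classes; a classification head needs this many outputs.
num_labels = len(set(Y))

print(len(X), len(Y), num_labels)
```

With a pandas DataFrame the same two lists would typically come from the `text` and `category_index` columns.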

Source: https://stackoverflow.com/questions/60463829/training-tfbertforsequenceclassification-with-custom-x-and-y-data

Answer #1, by 91zkwejq

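Fine-tuning approaches

There are multiple approaches to fine-tuning BERT for a target task.
1. Further pre-training the base BERT model
2. Custom classification layer(s) on top of the base BERT model, with the base model trainable
3. Custom classification layer(s) on top of the base BERT model, with the base model non-trainable (frozen)
Note that the BERT base model has been pre-trained for only two tasks, as described in the original paper.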

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

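3.1 Pre-training BERT ... we pre-train BERT using two unsupervised tasks

Task #1: Masked LM
Task #2: Next Sentence Prediction (NSP)

Hence, the base BERT model is like a half-baked product that can be fully baked for the target domain (first approach). We can also use it as part of our own custom model training, with the base either trainable (second approach) or not trainable (third approach).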

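First approach

How to Fine-Tune BERT for Text Classification? demonstrates the first approach of further pre-training, and points out that the learning rate is the key to avoiding catastrophic forgetting, where the pre-trained knowledge is erased while learning new knowledge.
We find that a lower learning rate, such as 2e-5, is necessary to make BERT overcome the catastrophic forgetting problem. With an aggressive learning rate of 4e-4, the training set fails to converge.
This is probably why the BERT paper used 5e-5, 4e-5, 3e-5, and 2e-5 for fine-tuning.
We use a batch size of 32 and fine-tune for 3 epochs over the data for all GLUE tasks. For each task, we selected the best fine-tuning learning rate (among 5e-5, 4e-5, 3e-5, and 2e-5) on the Dev set.
Note that the base model pre-training itself used a higher learning rate.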

bert-base-uncased - pretraining

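The model was trained on 4 cloud TPUs in Pod configuration (16 TPU chips total) for one million steps with a batch size of 256. The sequence length was limited to 128 tokens for 90% of the steps and 512 for the remaining 10%. The optimizer used is Adam with a learning rate of 1e-4, β1 = 0.9 and β2 = 0.999, a weight decay of 0.01, learning rate warmup for 10,000 steps and linear decay of the learning rate after.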
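The warmup-then-decay schedule quoted above can be sketched as a plain function. This is only an illustration under assumptions: the warmup is taken to be linear from 0, and the decay is taken to reach exactly 0 at the final step (the quote says "linear decay" but does not state the end value). The function name `pretraining_lr` is hypothetical.

```python
def pretraining_lr(step, peak_lr=1e-4, warmup_steps=10_000, total_steps=1_000_000):
    """Linear warmup to peak_lr, then linear decay (assumed to end at 0)."""
    if step < warmup_steps:
        # Warmup phase: scale the peak rate by the fraction of warmup completed.
        return peak_lr * (step / warmup_steps)
    # Decay phase: scale the peak rate by the fraction of decay steps remaining.
    remaining = max(total_steps - step, 0)
    return peak_lr * (remaining / (total_steps - warmup_steps))


for step in (0, 5_000, 10_000, 505_000, 1_000_000):
    print(step, pretraining_lr(step))
```

The peak of 1e-4 is well above the 2e-5 to 5e-5 range quoted for fine-tuning, which is the contrast the answer draws.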
The first approach is described as part of the third approach below.
FYI: TFDistilBertModel is the bare base model, named distilbert.

Model: "tf_distil_bert_model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
distilbert (TFDistilBertMain multiple                  66362880  
=================================================================
Total params: 66,362,880
Trainable params: 66,362,880
Non-trainable params: 0

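Second approach

Huggingface takes the second approach, as in Fine-tuning with native PyTorch/TensorFlow, where TFDistilBertForSequenceClassification adds the custom classification layer classifier on top of the trainable base distilbert model. The small learning rate requirement applies here as well, to avoid catastrophic forgetting.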

import tensorflow as tf
from transformers import TFDistilBertForSequenceClassification

model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss) # can also use any keras loss fn
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3, batch_size=16)

Model: "tf_distil_bert_for_sequence_classification_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
distilbert (TFDistilBertMain multiple                  66362880  
_________________________________________________________________
pre_classifier (Dense)       multiple                  590592    
_________________________________________________________________
classifier (Dense)           multiple                  1538      
_________________________________________________________________
dropout_59 (Dropout)         multiple                  0         
=================================================================
Total params: 66,955,010
Trainable params: 66,955,010  <--- All parameters are trainable
Non-trainable params: 0

Implementation of the second approach

import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from transformers import (
    DistilBertTokenizerFast,
    TFDistilBertForSequenceClassification,
)

DATA_COLUMN = 'text'
LABEL_COLUMN = 'category_index'
MAX_SEQUENCE_LENGTH = 512
LEARNING_RATE = 5e-5
BATCH_SIZE = 16
NUM_EPOCHS = 3

# --------------------------------------------------------------------------------
# Tokenizer
# --------------------------------------------------------------------------------
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
def tokenize(sentences, max_length=MAX_SEQUENCE_LENGTH, padding='max_length'):
    """Tokenize using the Huggingface tokenizer
    Args:
        sentences: String or list of string to tokenize
        padding: Padding method ['do_not_pad'|'longest'|'max_length']
    """
    return tokenizer(
        sentences,
        truncation=True,
        padding=padding,
        max_length=max_length,
        return_tensors="tf"
    )

# --------------------------------------------------------------------------------
# Load data
# --------------------------------------------------------------------------------
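raw_train = pd.read_csv("./train.csv")
# Number of distinct classes; required by from_pretrained(num_labels=...) below.
NUM_LABELS = raw_train[LABEL_COLUMN].nunique()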
train_data, validation_data, train_label, validation_label = train_test_split(
    raw_train[DATA_COLUMN].tolist(),
    raw_train[LABEL_COLUMN].tolist(),
    test_size=.2,
    shuffle=True
)

# --------------------------------------------------------------------------------
# Prepare TF dataset
# --------------------------------------------------------------------------------
train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(tokenize(train_data)),  # Convert BatchEncoding instance to dictionary
    train_label
)).shuffle(1000).batch(BATCH_SIZE).prefetch(1)
validation_dataset = tf.data.Dataset.from_tensor_slices((
    dict(tokenize(validation_data)),
    validation_label
)).batch(BATCH_SIZE).prefetch(1)

# --------------------------------------------------------------------------------
# training
# --------------------------------------------------------------------------------
model = TFDistilBertForSequenceClassification.from_pretrained(
    'distilbert-base-uncased',
    num_labels=NUM_LABELS
)
optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)
model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
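model.fit(
    x=train_dataset,    # already batched; yields (features, labels) pairs
    y=None,             # labels come from the dataset
    validation_data=validation_dataset,
    epochs=NUM_EPOCHS,  # batch_size is omitted because the dataset is batched
)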

Third approach

Basics

Note that the images are taken from A Visual Guide to Using BERT for the First Time and modified.

Tokenizer

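The tokenizer generates an instance of BatchEncoding, which can be used like a Python dictionary and passed as input to the BERT model.

BatchEncoding

Holds the output of the encode_plus() and batch_encode() methods (tokens, attention_masks, etc.).
This class is derived from a python dictionary and can be used as a dictionary. In addition, this class exposes utility methods to map from word/character space to token space.
Parameters

data (dict) - Dictionary of lists/arrays/tensors returned by the encode/batch_encode methods ('input_ids', 'attention_mask', etc.).

The data attribute of the class holds the generated tokens, with input_ids and attention_mask elements.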

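input_ids

The input ids are often the only required parameters to be passed to the model as input. They are token indices, numerical representations of tokens building the sequences that will be used as input by the model.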

attention_mask

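Attention mask

This argument indicates to the model which tokens should be attended to, and which should not.
If the attention_mask is 0, the token id is ignored. For instance, if a sequence is padded to adjust the sequence length, the padded positions should be ignored, hence their attention_mask is 0.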
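To make the relationship between padding and attention_mask concrete, here is a minimal pure-Python sketch. It is not the Huggingface tokenizer: `pad_with_mask` is a hypothetical helper, the token ids are illustrative, and a pad id of 0 is assumed.

```python
PAD_TOKEN_ID = 0  # assumed pad id for this illustration


def pad_with_mask(token_ids, max_length, pad_token_id=PAD_TOKEN_ID):
    """Right-pad token ids to max_length and build the matching attention mask."""
    if len(token_ids) > max_length:
        raise ValueError("sequence is longer than max_length")
    padding = max_length - len(token_ids)
    # Real tokens get mask 1 (attend), padded positions get mask 0 (ignore).
    input_ids = token_ids + [pad_token_id] * padding
    attention_mask = [1] * len(token_ids) + [0] * padding
    return input_ids, attention_mask


input_ids, attention_mask = pad_with_mask([101, 7592, 2088, 102], max_length=6)
print(input_ids)
print(attention_mask)
```

The tokenizer call with padding='max_length' used elsewhere in this answer produces both arrays in one step.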

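Special tokens

BertTokenizer adds special tokens, enclosing a sequence with [CLS] and [SEP]. [CLS] stands for classification and [SEP] separates sequences. For question answering or paraphrase tasks, [SEP] separates the two sentences to compare.
BertTokenizer

cls_token (str, optional, defaults to "[CLS]")

The classifier token which is used when doing sequence classification (classification of the whole sequence instead of per-token classification). It is the first token of the sequence when built with special tokens.

sep_token (str, optional, defaults to "[SEP]")

The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for sequence classification or for a text and a question for question answering. It is also used as the last token of a sequence built with special tokens.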
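The placement of [CLS] and [SEP] can be sketched without any library. The helper below is hypothetical and works on already-split word lists rather than real WordPiece tokens; the segment ids follow the BERT convention of 0 for the first sentence and 1 for the second (DistilBERT does not use segment ids).

```python
def add_special_tokens(tokens_a, tokens_b=None):
    """Wrap one or two token lists with [CLS]/[SEP] and return segment ids too."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if tokens_b is not None:
        # The second sentence and its closing [SEP] belong to segment 1.
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


single_tokens, single_segments = add_special_tokens(["hello", "world"])
pair_tokens, pair_segments = add_special_tokens(
    ["how", "are", "you"], ["fine", "thanks"]
)
print(single_tokens, single_segments)
print(pair_tokens, pair_segments)
```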
A Visual Guide to Using BERT for the First Time shows the tokenization.

[Figure: tokenization]

[CLS]

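The embedding vector for [CLS] in the output of the base model's final layer represents the classification learned by the base model. Hence, feed the embedding vector of the [CLS] token into the classification layer added on top of the base model.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

The first token of every sequence is always a special classification token ([CLS]). The final hidden state corresponding to this token is used as the aggregate sequence representation for classification tasks. Sentence pairs are packed together into a single sequence. We differentiate the sentences in two ways. First, we separate them with a special token ([SEP]). Second, we add a learned embedding to every token indicating whether it belongs to sentence A or sentence B.
The model structure is illustrated in the figure below.

[Figure: model structure]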

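Vector size

In the model distilbert-base-uncased, each token is embedded into a vector of size 768. The shape of the output from the base model is (batch_size, max_sequence_length, embedding_vector_size=768). This agrees with the BERT paper's description of the BERT BASE model (as indicated by "base" in distilbert-base-uncased).

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT BASE (L=12, H=768, A=12, Total Parameters=110M) and BERT LARGE (L=24, H=1024, A=16, Total Parameters=340M).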

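Base model - TFDistilBertModel

Hugging Face Transformers: Fine-tuning DistilBERT for Binary Classification Tasks

TFDistilBertModel class to instantiate the base DistilBERT model without any specific head on top (as opposed to other classes such as TFDistilBertForSequenceClassification that do have an added classification head).
We do not want any task-specific head attached because we simply want the pre-trained weights of the base model to provide a general understanding of the English language, and it will be our job to add our own classification head during the fine-tuning process in order to help the model distinguish between toxic comments.
TFDistilBertModel generates an instance of TFBaseModelOutput, whose last_hidden_state parameter is the output of the model's last layer.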

TFBaseModelOutput([(
    'last_hidden_state',
    <tf.Tensor: shape=(batch_size, sequence_length, 768), dtype=float32, numpy=array([[[...]]], dtype=float32)>
)])

TFBaseModelOutput

Parameters

last_hidden_state (tf.Tensor of shape (batch_size, sequence_length, hidden_size)) - Sequence of hidden-states at the output of the last layer of the model.
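Since [CLS] is always the first token, its embedding is position 0 along the sequence axis of last_hidden_state. The sketch below shows that selection on plain nested lists with toy sizes; it stands in for the tensor slice `last_hidden_state[:, 0, :]` and the helper name is hypothetical.

```python
def cls_embeddings(last_hidden_state):
    """Pure-Python equivalent of last_hidden_state[:, 0, :] on nested lists."""
    return [sequence[0] for sequence in last_hidden_state]


# Toy tensor: batch_size=2, sequence_length=3, hidden_size=4 (768 in DistilBERT).
last_hidden_state = [
    [[0.1, 0.2, 0.3, 0.4], [1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]],
    [[0.5, 0.6, 0.7, 0.8], [3.0, 3.0, 3.0, 3.0], [4.0, 4.0, 4.0, 4.0]],
]

cls = cls_embeddings(last_hidden_state)
print(len(cls), len(cls[0]))  # (batch_size, hidden_size)
```

The result has one vector per example, which is the shape a Dense classification layer expects.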

Implementation

Python modules

import datetime
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from transformers import (
    DistilBertTokenizerFast,
    TFDistilBertModel,
)

Configuration

TIMESTAMP = datetime.datetime.now().strftime("%Y%b%d%H%M").upper()

DATA_COLUMN = 'text'
LABEL_COLUMN = 'category_index'

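MAX_SEQUENCE_LENGTH = 512   # Max length allowed for BERT is 512.
# raw_train must exist before NUM_LABELS can be computed, so load it here
# (the same file is read again in the data allocation step).
raw_train = pd.read_csv("./train.csv")
NUM_LABELS = len(raw_train[LABEL_COLUMN].unique())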

MODEL_NAME = 'distilbert-base-uncased'
NUM_BASE_MODEL_OUTPUT = 768

# Flag to freeze base model
FREEZE_BASE = True

# Flag to add custom classification heads
USE_CUSTOM_HEAD = True
if USE_CUSTOM_HEAD == False:
    # Make the base trainable when no classification head exists.
    FREEZE_BASE = False

BATCH_SIZE = 16
NUM_EPOCHS = 3  # used by model.fit; value taken from the second approach
LEARNING_RATE = 1e-2 if FREEZE_BASE else 5e-5
L2 = 0.01

Tokenizer

tokenizer = DistilBertTokenizerFast.from_pretrained(MODEL_NAME)
def tokenize(sentences, max_length=MAX_SEQUENCE_LENGTH, padding='max_length'):
    """Tokenize using the Huggingface tokenizer
    Args:
        sentences: String or list of string to tokenize
        padding: Padding method ['do_not_pad'|'longest'|'max_length']
    """
    return tokenizer(
        sentences,
        truncation=True,
        padding=padding,
        max_length=max_length,
        return_tensors="tf"
    )

Input layer

The base model expects input_ids and attention_mask, each of shape (max_sequence_length,). Generate Keras tensors for them with an Input layer each.

# Inputs for token indices and attention masks
input_ids = tf.keras.layers.Input(shape=(MAX_SEQUENCE_LENGTH,), dtype=tf.int32, name='input_ids')
attention_mask = tf.keras.layers.Input((MAX_SEQUENCE_LENGTH,), dtype=tf.int32, name='attention_mask')

Base model layer

Generate the output from the base model. The base model generates a TFBaseModelOutput. Feed the embedding of [CLS] to the next layer.

base = TFDistilBertModel.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS
)

# Freeze the base model weights.
if FREEZE_BASE:
    for layer in base.layers:
        layer.trainable = False
    base.summary()

# [CLS] embedding is last_hidden_state[:, 0, :]
output = base([input_ids, attention_mask]).last_hidden_state[:, 0, :]

Classification layers

if USE_CUSTOM_HEAD:
    # -------------------------------------------------------------------------------
    # Classification layer 01
    # --------------------------------------------------------------------------------
    output = tf.keras.layers.Dropout(
        rate=0.15,
        name="01_dropout",
    )(output)
    
    output = tf.keras.layers.Dense(
        units=NUM_BASE_MODEL_OUTPUT,
        kernel_initializer='glorot_uniform',
        activation=None,
        name="01_dense_relu_no_regularizer",
    )(output)
    output = tf.keras.layers.BatchNormalization(
        name="01_bn"
    )(output)
    output = tf.keras.layers.Activation(
        "relu",
        name="01_relu"
    )(output)

    # --------------------------------------------------------------------------------
    # Classification layer 02
    # --------------------------------------------------------------------------------
    output = tf.keras.layers.Dense(
        units=NUM_BASE_MODEL_OUTPUT,
        kernel_initializer='glorot_uniform',
        activation=None,
        name="02_dense_relu_no_regularizer",
    )(output)
    output = tf.keras.layers.BatchNormalization(
        name="02_bn"
    )(output)
    output = tf.keras.layers.Activation(
        "relu",
        name="02_relu"
    )(output)

Softmax layer

output = tf.keras.layers.Dense(
    units=NUM_LABELS,
    kernel_initializer='glorot_uniform',
    kernel_regularizer=tf.keras.regularizers.l2(l2=L2),
    activation='softmax',
    name="softmax"
)(output)
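Because this final layer applies softmax, the model outputs probabilities, and the loss in this approach is configured with from_logits=False. The second approach has no softmax in the model and uses from_logits=True instead. The pure-Python sketch below, with hypothetical helper names and toy values, illustrates that the two pairings compute the same sparse categorical cross-entropy.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_ce_from_probs(probs, label):
    """Cross-entropy when the model already outputs probabilities."""
    return -math.log(probs[label])


def sparse_ce_from_logits(logits, label):
    """Cross-entropy computed directly from raw scores via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


logits = [2.0, 0.5, -1.0]
label = 0  # integer class index, as in category_index
loss_a = sparse_ce_from_probs(softmax(logits), label)
loss_b = sparse_ce_from_logits(logits, label)
print(loss_a, loss_b)
```

Mixing the two, for example a softmax output with from_logits=True, would apply softmax twice and distort the loss.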

Final custom model

name = f"{TIMESTAMP}_{MODEL_NAME.upper()}"
model = tf.keras.models.Model(inputs=[input_ids, attention_mask], outputs=output, name=name)
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
    metrics=['accuracy']
)
model.summary()
---
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_ids (InputLayer)          [(None, 256)]        0                                            
__________________________________________________________________________________________________
attention_mask (InputLayer)     [(None, 256)]        0                                            
__________________________________________________________________________________________________
tf_distil_bert_model (TFDistilB TFBaseModelOutput(la 66362880    input_ids[0][0]                  
                                                                 attention_mask[0][0]             
__________________________________________________________________________________________________
tf.__operators__.getitem_1 (Sli (None, 768)          0           tf_distil_bert_model[1][0]       
__________________________________________________________________________________________________
01_dropout (Dropout)            (None, 768)          0           tf.__operators__.getitem_1[0][0] 
__________________________________________________________________________________________________
01_dense_relu_no_regularizer (D (None, 768)          590592      01_dropout[0][0]                 
__________________________________________________________________________________________________
01_bn (BatchNormalization)      (None, 768)          3072        01_dense_relu_no_regularizer[0][0
__________________________________________________________________________________________________
01_relu (Activation)            (None, 768)          0           01_bn[0][0]                      
__________________________________________________________________________________________________
02_dense_relu_no_regularizer (D (None, 768)          590592      01_relu[0][0]                    
__________________________________________________________________________________________________
02_bn (BatchNormalization)      (None, 768)          3072        02_dense_relu_no_regularizer[0][0
__________________________________________________________________________________________________
02_relu (Activation)            (None, 768)          0           02_bn[0][0]                      
__________________________________________________________________________________________________
softmax (Dense)                 (None, 2)            1538        02_relu[0][0]                    
==================================================================================================
Total params: 67,551,746
Trainable params: 1,185,794
Non-trainable params: 66,365,952   <--- Base BERT model is frozen
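The trainable and non-trainable totals in the summary can be checked by hand. The sketch below recomputes them from the layer sizes; it assumes the usual parameter counts for Dense (weights plus biases) and BatchNormalization (two trainable and two non-trainable values per feature), with the base model size copied from the summary.

```python
HIDDEN = 768
NUM_LABELS = 2            # matches the (None, 2) softmax output in the summary
BASE_PARAMS = 66_362_880  # frozen DistilBERT base, copied from the summary

dense = HIDDEN * HIDDEN + HIDDEN                  # each hidden Dense layer
softmax_layer = HIDDEN * NUM_LABELS + NUM_LABELS  # final Dense layer
bn_trainable = 2 * HIDDEN                         # gamma and beta
bn_non_trainable = 2 * HIDDEN                     # moving mean and variance

# Two Dense + BatchNormalization pairs plus the softmax layer are trainable.
trainable = 2 * dense + 2 * bn_trainable + softmax_layer
# The frozen base plus the BatchNormalization statistics are not.
non_trainable = BASE_PARAMS + 2 * bn_non_trainable

print(trainable, non_trainable, trainable + non_trainable)
```

The 3,072 extra non-trainable parameters beyond the base model therefore come from the two BatchNormalization layers, not from the frozen base.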

Data allocation

# --------------------------------------------------------------------------------
# Split data into training and validation
# --------------------------------------------------------------------------------
raw_train = pd.read_csv("./train.csv")
train_data, validation_data, train_label, validation_label = train_test_split(
    raw_train[DATA_COLUMN].tolist(),
    raw_train[LABEL_COLUMN].tolist(),
    test_size=.2,
    shuffle=True
)

# X = dict(tokenize(train_data))
# Y = tf.convert_to_tensor(train_label)
X = tf.data.Dataset.from_tensor_slices((
    dict(tokenize(train_data)),  # Convert BatchEncoding instance to dictionary
    train_label
)).batch(BATCH_SIZE).prefetch(1)

V = tf.data.Dataset.from_tensor_slices((
    dict(tokenize(validation_data)),  # Convert BatchEncoding instance to dictionary
    validation_label
)).batch(BATCH_SIZE).prefetch(1)
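The effect of batching the dataset can be pictured with a small pure-Python sketch. The `batches` helper is hypothetical and only mimics how a finite dataset is cut into groups of BATCH_SIZE; the example count of 50 is made up.

```python
import math


def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


examples = list(range(50))  # stand-in for 50 tokenized examples
BATCH_SIZE = 16
batch_sizes = [len(batch) for batch in batches(examples, BATCH_SIZE)]
steps_per_epoch = math.ceil(len(examples) / BATCH_SIZE)
print(batch_sizes, steps_per_epoch)
```

Because a finite, already-batched dataset fixes the number of steps per epoch this way, steps_per_epoch and validation_steps from the question do not need to be passed to model.fit.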

Train

# --------------------------------------------------------------------------------
# Train the model
# https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit
# Input data x can be a dict mapping input names to the corresponding array/tensors, 
# if the model has named inputs. Beware of the "names". y should be consistent with x 
# (you cannot have Numpy inputs and tensor targets, or inversely). 
# --------------------------------------------------------------------------------
history = model.fit(
    x=X,     # tf.data.Dataset of (features dict, label) batches
    y=None,  # labels are supplied by the dataset
    epochs=NUM_EPOCHS,
    validation_data=V,
)

To implement the first approach, change the configuration as below.

USE_CUSTOM_HEAD = False

FREEZE_BASE then becomes False and LEARNING_RATE becomes 5e-5, which runs further pre-training on the base BERT model.
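How the two flags select an approach can be summarized as a small function. This is a sketch that mirrors the flag logic of the configuration block in this answer; the function name `resolve_config` and the returned dictionary are illustrative only.

```python
def resolve_config(use_custom_head, freeze_base):
    """Mirror the flag logic of the configuration block in this answer."""
    if not use_custom_head:
        # Without a classification head the base must stay trainable.
        freeze_base = False
    # A frozen base tolerates a large rate; a trainable base needs a small one.
    learning_rate = 1e-2 if freeze_base else 5e-5
    return {
        "USE_CUSTOM_HEAD": use_custom_head,
        "FREEZE_BASE": freeze_base,
        "LEARNING_RATE": learning_rate,
    }


first_approach = resolve_config(use_custom_head=False, freeze_base=True)
third_approach = resolve_config(use_custom_head=True, freeze_base=True)
print(first_approach)
print(third_approach)
```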

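Saving the model

For the third approach, saving the model causes issues. The save_pretrained method of the Huggingface model cannot be used, as the model is not a direct subclass of the Huggingface PreTrainedModel.
Keras save_model causes an error with the default save_traces=True, or causes a different error with save_traces=True when the model is loaded with Keras load_model.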

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-71-01d66991d115> in <module>()
----> 1 tf.keras.models.load_model(MODEL_DIRECTORY)
 
11 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/saving/saved_model/load.py in _unable_to_call_layer_due_to_serialization_issue(layer, *unused_args, **unused_kwargs)
    865       'recorded when the object is called, and used when saving. To manually '
    866       'specify the input shape/dtype, decorate the call function with '
--> 867       '`@tf.function(input_signature=...)`.'.format(layer.name, type(layer)))
    868 
    869 
 
ValueError: Cannot call custom layer tf_distil_bert_model of type <class 'tensorflow.python.keras.saving.saved_model.load.TFDistilBertModel'>, because the call function was not serialized to the SavedModel.Please try one of the following methods to fix this issue:
 
(1) Implement `get_config` and `from_config` in the layer/model class, and pass the object to the `custom_objects` argument when loading the model. For more details, see: https://www.tensorflow.org/guide/keras/save_and_serialize
 
(2) Ensure that the subclassed model or layer overwrites `call` and not `__call__`. The input shape and dtype will be automatically recorded when the object is called, and used when saving. To manually specify the input shape/dtype, decorate the call function with `@tf.function(input_signature=...)`.

As far as I tested, only the Keras Model save_weights method worked.

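Experiment

As far as I tested with the Toxic Comment Classification Challenge, the first approach gave better recall (identifying true toxic comments and true non-toxic comments). The code can be accessed as below. Please provide corrections or suggestions if any.

Code for 1st and 3rd approach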

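Configuration

When instantiating a model, you need to define the model initialization parameters that are defined in the Transformers configuration file. The base class is PretrainedConfig.

PretrainedConfig

Base class for all configuration classes. Handles a few parameters common to all models' configurations as well as methods for loading/downloading/saving configurations.
Each sub-class has its own parameters. For instance, Bert pre-trained models have the BertConfig.

BertConfig

This is the configuration class to store the configuration of a BertModel or a TFBertModel. It is used to instantiate a BERT model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the BERT bert-base-uncased architecture.
For instance, the num_labels parameter comes from PretrainedConfig.
num_labels (int, optional) - Number of labels to use in the last layer added to the model, typically for a classification task.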

TFBertForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)

The configuration file for the bert-base-uncased model is published at Huggingface model - bert-base-uncased - config.json.

{
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.6.0.dev0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

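Fine-tuning (transfer learning)

Huggingface provides several examples for fine-tuning on your own custom datasets. For instance, utilize the sequence classification capability of BERT for text classification.

Fine-tuning with custom datasets

This tutorial will take you through several examples of using Transformers models with your own datasets.

Fine-tuning a pretrained model

How to fine-tune a pretrained model from the Transformers library. In TensorFlow, models can be directly trained using Keras and the fit method.
However, the examples in the documentation are only overviews and lack detailed information.
Fine-tuning with native PyTorch/TensorFlow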

import tensorflow as tf
from transformers import TFDistilBertForSequenceClassification

model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss) # can also use any keras loss fn
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3, batch_size=16)

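The github provides complete code.

HuggingFace Text classification examples

This folder contains some scripts showing examples of text classification with the Hugging Face Transformers library.
run_text_classification.py is the example of text classification fine-tuning for TensorFlow.
However, it is neither simple nor straightforward, because it is meant to be generic and all-purpose. Hence there is no good example for people to get started with, which leads to situations where people need to ask questions like this one.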

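Classification layers

You will see that transfer learning (fine-tuning) articles explain adding classification layers on top of the pre-trained base model, and so does this answer.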

output = tf.keras.layers.Dense(num_labels, activation='softmax')(output)

However, the huggingface example in the documentation does not add any classification layers.

import tensorflow as tf
from transformers import TFDistilBertForSequenceClassification

model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss) # can also use any keras loss fn
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3, batch_size=16)

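This is because TFBertForSequenceClassification has already added the layers.

Hugging Face Transformers: Fine-tuning DistilBERT for Binary Classification Tasks

the base DistilBERT model without any specific head on top (as opposed to other classes such as TFDistilBertForSequenceClassification that do have an added classification head).
If you show the Keras model summary, for instance of TFDistilBertForSequenceClassification, it shows the Dense and Dropout layers added on top of the base BERT model.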

Model: "tf_distil_bert_for_sequence_classification_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
distilbert (TFDistilBertMain multiple                  66362880  
_________________________________________________________________
pre_classifier (Dense)       multiple                  590592    
_________________________________________________________________
classifier (Dense)           multiple                  1538      
_________________________________________________________________
dropout_59 (Dropout)         multiple                  0         
=================================================================
Total params: 66,955,010
Trainable params: 66,955,010
Non-trainable params: 0

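Freezing base model parameters

There are a few discussions, e.g. Fine Tune BERT Models, but apparently the Huggingface way is not to freeze the base model parameters, as shown by Non-trainable params: 0 in the Keras model summary above.
To freeze the base distilbert layer: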

for _layer in model.layers:  # a Keras model is not iterable; iterate over its layers
    if _layer.name == 'distilbert':
        print(f"Freezing model layer {_layer.name}")
        _layer.trainable = False
    print(_layer.name)
    print(_layer.trainable)
---
Freezing model layer distilbert
distilbert
False      <----------------
pre_classifier
True
classifier
True
dropout_99
True

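Resources

Kaggle is another resource to look into. Search with the keywords "huggingface" and "BERT" and you will find working code published for the competitions.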
