PyTorch / Hugging Face: do the defaults allow logging MLflow artifacts and naming each run in the MLflow logs?

ghg1uchk posted on 2023-10-20 in Other

I am training a simple binary classification model in PyTorch using a Hugging Face model.
Tags: BERT, PyTorch, Hugging Face.
The code is as follows:

import numpy as np
import transformers as tr  # the code below refers to the library as `tr`
from sklearn import metrics  # precision, recall and F1 used in compute_metrics
from transformers import AutoTokenizer

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    # The predicted class is the index of the largest logit.
    predictions = np.argmax(logits, axis=-1)
    acc = np.sum(predictions == labels) / predictions.shape[0]
    return {"accuracy": acc,
            'precision': metrics.precision_score(labels, predictions),
            'recall': metrics.recall_score(labels, predictions),
            'f1': metrics.f1_score(labels, predictions)}

training_args = tr.TrainingArguments(
    #report_to = 'wandb',
    output_dir='/home/pc/proj/Exp2_conv_stampy_data/results_exp0',          # output directory
    overwrite_output_dir = True,
    num_train_epochs=2,              # total number of training epochs
    per_device_train_batch_size=32,  # batch size per device during training
    per_device_eval_batch_size=32,   # batch size for evaluation
    learning_rate=2e-5,
    warmup_steps=200,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs_exp0',            # directory for storing logs
    logging_steps=137,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    fp16=True,
    run_name="final_model0",
)

# counter = 0
# results_lst = []

from transformers import TrainerCallback
from copy import deepcopy

model = tr.XLMRobertaForSequenceClassification.from_pretrained("/home/pc/multilingual_toxic_xlm_roberta",problem_type="single_label_classification", num_labels=2,ignore_mismatched_sizes=True, id2label={0: 'negative', 1: 'positive'})

train_encodings = tokenizer(train_texts, truncation=True, padding=True, max_length=512, return_tensors="pt")
val_encodings = tokenizer(val_texts, truncation=True, padding=True, max_length=512, return_tensors="pt")

train_data = SEDataset(train_encodings, train_labels)
val_data = SEDataset(val_encodings, val_labels)

model.to(device)

class CustomCallback(TrainerCallback):
    
    def __init__(self, trainer) -> None:
        super().__init__()
        self._trainer = trainer
    
    def on_epoch_end(self, args, state, control, **kwargs):
        if control.should_evaluate:
            control_copy = deepcopy(control)
            self._trainer.evaluate(eval_dataset=self._trainer.train_dataset, metric_key_prefix="train")
            return control_copy

trainer = tr.Trainer(
    model=model,                         # the instantiated Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_data,         # training dataset
    eval_dataset=val_data,          # evaluation dataset
    compute_metrics=compute_metrics    # the callback that computes metrics of interest
)
trainer.add_callback(CustomCallback(trainer)) 
train = trainer.train()


trainer.save_model("/home/pc/proj/Exp2_conv_stampy_data/result_toxic_model_exp0")

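I see that an mlruns directory is created by default.

What is `0`, and what are the 2 folders inside `0`? How can I rename them to something useful and understandable? If I do multiple runs, how can I log the model from each run under the same experiment, e.g. `run1`, `run2`?
I also see that the artifacts folder is empty. How do I log the final model?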

p4rjhz4m #1

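By default, MLflow stores run metadata and artifacts in a local mlruns folder. To inspect the runs, you can run the `mlflow ui` command, which starts the tracking server UI.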
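For reference, here is a small sketch of how that local folder appears to be laid out, assuming the file-based store MLflow uses by default. Each numeric top-level folder is an experiment ID (`0` is the experiment named Default), and each folder inside it is one run, named by its run ID and holding `artifacts`, `metrics`, `params`, `tags` and a `meta.yaml`. The helper `list_runs` is hypothetical and not part of MLflow. It relies on the run name being stored in `tags/mlflow.runName`, which is my understanding of the layout rather than a documented contract.

```python
from pathlib import Path


def list_runs(mlruns_dir="mlruns"):
    """Return (experiment_id, run_id, run_name) for each run folder found."""
    runs = []
    for exp_dir in sorted(Path(mlruns_dir).iterdir()):
        # Assumption: experiment folders have purely numeric names;
        # this skips files and folders such as .trash.
        if not exp_dir.is_dir() or not exp_dir.name.isdigit():
            continue
        for run_dir in sorted(exp_dir.iterdir()):
            if not run_dir.is_dir():
                continue  # skip the experiment's own meta.yaml
            name_file = run_dir / "tags" / "mlflow.runName"
            name = name_file.read_text().strip() if name_file.exists() else None
            runs.append((exp_dir.name, run_dir.name, name))
    return runs
```

Calling `list_runs()` from the directory that contains `mlruns` should show which run ID folder corresponds to which run name, without renaming anything on disk.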
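Runs are grouped under experiments. You can create an experiment, and every run you start afterwards is logged under that experiment. To do so, specify the tracking server URI and the experiment in your code: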

import mlflow

mlflow.set_tracking_uri("Tracking Server URI Here")
mlflow.set_experiment("Experiment Name Here")

If you do not have a remote tracking server, run the MLflow UI locally as described above.
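One possible way to get readable run names such as `run1` and `run2` under a single experiment, and to fill the otherwise empty artifacts folder, is to open the run yourself and log the saved model directory once training finishes. This is a sketch under assumptions, not the answer's own code: `run_name_for`, `log_named_run` and the `train_and_save` callable are hypothetical names, and it assumes the Trainer's MLflow callback logs into an already active run instead of starting a new one.

```python
def run_name_for(index, prefix="run"):
    """Build a readable run name such as 'run1' (hypothetical naming scheme)."""
    return f"{prefix}{index}"


def log_named_run(index, train_and_save, model_dir,
                  experiment="Experiment Name Here"):
    """Run one training job as a named MLflow run and log its saved model."""
    # Imported here so the naming helper can be used without MLflow installed.
    import mlflow

    mlflow.set_experiment(experiment)
    with mlflow.start_run(run_name=run_name_for(index)):
        # Hypothetical callable that calls trainer.train() and then
        # trainer.save_model(model_dir).
        train_and_save()
        # Copy everything in model_dir into the run's artifact store,
        # under a "model" subfolder.
        mlflow.log_artifacts(model_dir, artifact_path="model")
```

With the code from the question, `train_and_save` could wrap `trainer.train()` followed by `trainer.save_model(...)`, and `model_dir` would be the path passed to `save_model`.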
Hugging Face has an MLflow integration that logs runs automatically. You need to set a few environment variables: https://huggingface.co/docs/transformers/v4.33.0/en/main_classes/callback#transformers.integrations.MLflowCallback
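As a rough illustration of that setup, the variables can be set from Python before the `Trainer` is created. The variable names are the ones listed in the linked MLflowCallback documentation; the values are placeholders. As far as I can tell, the callback also uses `run_name` from `TrainingArguments` (here `"final_model0"`) as the MLflow run name, but treat that as an assumption to verify against your installed version.

```python
import os

# Group all runs under one named experiment instead of the default one.
os.environ["MLFLOW_EXPERIMENT_NAME"] = "Experiment Name Here"

# Ask the callback to copy saved checkpoints from output_dir
# into the run's artifact store.
os.environ["HF_MLFLOW_LOG_ARTIFACTS"] = "1"

# Only needed when logging to a tracking server rather than local mlruns:
# os.environ["MLFLOW_TRACKING_URI"] = "Tracking Server URI Here"
```

These lines need to run before `trainer.train()` is called, since the callback reads the environment when it sets itself up.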
