Read the training data into a dataframe..
Reading the training data into a dataframe has been completed..
Setting up the HuggingFace API Token..
Huggingface token is added in the environment..
Load the Ludwig configuration YAML file..
Loading the Ludwig configuration YAML file has been completed..
Loading the Base Model..
Setting generation max_new_tokens to 512 to correspond with the max sequence length assigned to the output feature or the global max sequence length. This will ensure that the correct number of tokens are generated at inference time. To override this behavior, set `generation.max_new_tokens` to a different value in your Ludwig config.
Loading the trained Base Model has been completed..
Starting the Fine Tuning..
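Note on the `generation.max_new_tokens` message a few lines up: Ludwig apparently picked 512 because the config did not set the value itself, and 512 is the `max_sequence_length` of the `Response` output feature echoed later in this log. What follows is a minimal sketch of the override that message describes, assuming the config is held as a plain Python dict (the form Ludwig prints it in). `with_max_new_tokens` is a hypothetical helper, not a Ludwig function, and 256 is an arbitrary illustrative value:

```python
import copy

# Subset of the user config echoed in the LUDWIG CONFIG section of this log.
config = {
    "model_type": "llm",
    "output_features": [
        {
            "name": "Response",
            "type": "text",
            "preprocessing": {"max_sequence_length": 512},
        }
    ],
}


def with_max_new_tokens(cfg, max_new_tokens):
    """Hypothetical helper: return a copy of cfg with generation.max_new_tokens set."""
    updated = copy.deepcopy(cfg)  # leave the caller's config untouched
    updated.setdefault("generation", {})["max_new_tokens"] = max_new_tokens
    return updated


overridden = with_max_new_tokens(config, 256)  # 256 is illustrative only
print(overridden["generation"])  # → {'max_new_tokens': 256}
```

In a YAML config the same setting would be a top-level `generation:` section with a `max_new_tokens:` key.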
╒════════════════════════╕
│ EXPERIMENT DESCRIPTION │
╘════════════════════════╛
╒══════════════════╤═══════════════════════════════════════════════════════════════════════════════════════════════════╕
│ Experiment name  │ api_experiment                                                                                    │
├──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Model name       │ run                                                                                               │
├──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Output directory │ /home/ubuntu/results/api_experiment_run_19                                                        │
├──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ ludwig_version   │ '0.9.3'                                                                                           │
├──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ command          │ '/home/ubuntu/train_llama-2_7b_Log_Analytics_8bit_merged_v8/codebase/train_llama_using_ludwig.py' │
├──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ random_seed      │ 42                                                                                                │
├──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ data_format      │ "<class 'pandas.core.frame.DataFrame'>"                                                           │
├──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ torch_version    │ '2.1.0+cu121'                                                                                     │
├──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ compute          │ {   'arch_list': [   'sm_50',                                                                     │
│                  │                      'sm_60',                                                                     │
│                  │                      'sm_70',                                                                     │
│                  │                      'sm_75',                                                                     │
│                  │                      'sm_80',                                                                     │
│                  │                      'sm_86',                                                                     │
│                  │                      'sm_90'],                                                                    │
│                  │     'devices': {   0: {   'device_capability': (8, 6),                                            │
│                  │                           'device_properties': "_CudaDeviceProperties(name='NVIDIA "              │
│                  │                                                "A10G', major=8, minor=6, "                        │
│                  │                                                'total_memory=22723MB, '                           │
│                  │                                                'multi_processor_count=80)',                       │
│                  │                           'gpu_type': 'NVIDIA A10G'}},                                            │
│                  │     'gencode_flags': '-gencode compute=compute_50,code=sm_50 -gencode '                           │
│                  │                      'compute=compute_60,code=sm_60 -gencode '                                    │
│                  │                      'compute=compute_70,code=sm_70 -gencode '                                    │
│                  │                      'compute=compute_75,code=sm_75 -gencode '                                    │
│                  │                      'compute=compute_80,code=sm_80 -gencode '                                    │
│                  │                      'compute=compute_86,code=sm_86 -gencode '                                    │
│                  │                      'compute=compute_90,code=sm_90',                                             │
│                  │     'gpus_per_node': 1,                                                                           │
│                  │     'num_nodes': 1}                                                                               │
╘══════════════════╧═══════════════════════════════════════════════════════════════════════════════════════════════════╛
╒═══════════════╕
│ LUDWIG CONFIG │
╘═══════════════╛
User-specified config (with upgrades):
{   'adapter': {   'alpha': 16,
                   'bias_type': 'none',
                   'dropout': 0.05,
                   'postprocessor': {   'merge_adapter_into_base_model': True,
                                        'progressbar': True},
                   'pretrained_adapter_weights': None,
                   'r': 8,
                   'target_modules': None,
                   'type': 'lora'},
    'backend': {'type': 'local'},
    'base_model': '/home/ubuntu/results/api_experiment_run_15/model/model_weights',
    'input_features': [   {   'name': 'prompt',
                              'preprocessing': {'max_sequence_length': 1024},
                              'type': 'text'}],
    'ludwig_version': '0.9.3',
    'model_type': 'llm',
    'output_features': [   {   'name': 'Response',
                               'preprocessing': {'max_sequence_length': 512},
                               'type': 'text'}],
    'preprocessing': {'sample_ratio': 1.0},
    'prompt': {   'template': '### Instruction:\n'
                              '{Instruction}\n'
                              '\n'
                              '### Context:\n'
                              '{Context}\n'
                              '\n'
                              '### Response:\n'},
    'quantization': {'bits': 8},
    'trainer': {   'batch_size': 1,
                   'enable_gradient_checkpointing': True,
                   'epochs': 3,
                   'gradient_accumulation_steps': 1,
                   'learning_rate': 0.0001,
                   'learning_rate_scheduler': {'warmup_fraction': 0.01},
                   'max_batch_size': 1,
                   'type': 'finetune'}}
Full config saved to:
/home/ubuntu/results/api_experiment_run_19/api_experiment/model/model_hyperparameters.json
╒═══════════════╕
│ PREPROCESSING │
╘═══════════════╛
No cached dataset found at /home/ubuntu/eeff4f02cfeb11ee808f12abaaebd043.training.hdf5. Preprocessing the dataset.
Using full dataframe
Building dataset (it may take a while)
Loaded HuggingFace implementation of /home/ubuntu/results/api_experiment_run_15/model/model_weights tokenizer
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Max length of feature 'None': 143 (without start and stop symbols)
Max sequence length is 143 for feature 'None'
Loaded HuggingFace implementation of /home/ubuntu/results/api_experiment_run_15/model/model_weights tokenizer
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Max length of feature 'Response': 144 (without start and stop symbols)
Max sequence length is 144 for feature 'Response'
Loaded HuggingFace implementation of /home/ubuntu/results/api_experiment_run_15/model/model_weights tokenizer
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Loaded HuggingFace implementation of /home/ubuntu/results/api_experiment_run_15/model/model_weights tokenizer
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Building dataset: DONE
Writing preprocessed training set cache to /home/ubuntu/eeff4f02cfeb11ee808f12abaaebd043.training.hdf5
Writing preprocessed validation set cache to /home/ubuntu/eeff4f02cfeb11ee808f12abaaebd043.validation.hdf5
Writing preprocessed test set cache to /home/ubuntu/eeff4f02cfeb11ee808f12abaaebd043.test.hdf5
Writing train set metadata to /home/ubuntu/eeff4f02cfeb11ee808f12abaaebd043.meta.json
Dataset Statistics
╒════════════╤═══════════════╤════════════════════╕
│ Dataset    │   Size (Rows) │ Size (In Memory)   │
╞════════════╪═══════════════╪════════════════════╡
│ Training   │            31 │ 7.39 Kb            │
├────────────┼───────────────┼────────────────────┤
│ Validation │             4 │ 1.06 Kb            │
├────────────┼───────────────┼────────────────────┤
│ Test       │             9 │ 2.23 Kb            │
╘════════════╧═══════════════╧════════════════════╛
╒═══════╕
│ MODEL │
╘═══════╛
Warnings and other logs:
Loading large language model...
We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|█████ | 1/2 [01:15<01:15, 75.45s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [01:42<00:00, 46.69s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [01:42<00:00, 51.01s/it]
Done.
Loaded HuggingFace implementation of /home/ubuntu/results/api_experiment_run_15/model/model_weights tokenizer
Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
==================================================
Trainable Parameter Summary For Fine-Tuning
Fine-tuning with adapter: lora
trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.06220594176090199
==================================================
Gradient checkpointing enabled for training.
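The trainable-parameter figure in the summary above can be reproduced by hand. The sketch below rests on three assumptions that the log does not state: that the base model has the Llama-2 7B shape (32 decoder layers, hidden size 4096, as the `llama-2_7b` in the command path suggests), that `target_modules: None` makes the adapter library fall back to the query and value projections (`q_proj`, `v_proj`), and that both projections are square. Only `r = 8` and the two totals come from the log.

```python
# Values taken from the log.
r = 8                          # adapter.r in the LUDWIG CONFIG section
logged_trainable = 4_194_304   # "trainable params" in the summary
logged_all = 6_742_609_920     # "all params" in the summary

# Assumptions (not in the log): Llama-2 7B shape and default LoRA targets.
num_layers = 32
hidden_size = 4096
targets_per_layer = 2          # q_proj and v_proj

# Each adapted projection gains two low-rank matrices: A (r x in) and B (out x r).
params_per_target = r * hidden_size + hidden_size * r
trainable = num_layers * targets_per_layer * params_per_target
trainable_pct = 100 * trainable / logged_all

print(trainable)                # → 4194304
print(round(trainable_pct, 6))  # → 0.062206
```

The match is consistent with a single rank-8 adapter on two projections per layer being trained. It does not show whether the loaded weights already carry an earlier adapter, which is what the `peft_config` warning just above the summary is about.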
╒══════════╕
│ TRAINING │
╘══════════╛
Creating fresh model training run.
Training for 93 step(s), approximately 3 epoch(s).
Early stopping policy: 5 round(s) of evaluation, or 155 step(s), approximately 5 epoch(s).
Starting with step 0, epoch: 0
Training: 0%| | 0/93 [00:00<?, ?it/s]
/opt/conda/envs/ludwig_train_env/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Unable to complete the finetuning due to error Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
Training: 0%| | 0/93 [00:00<?, ?it/s]
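The step counts in the TRAINING section follow from numbers printed earlier in the same log: 31 training rows (Dataset Statistics), plus `batch_size: 1` and `epochs: 3` (LUDWIG CONFIG). A minimal sketch of that arithmetic, assuming one evaluation round per epoch, which is what the "5 round(s) of evaluation, or 155 step(s), approximately 5 epoch(s)" line implies:

```python
import math

# Values taken from the log.
train_rows = 31        # Dataset Statistics, Training row
batch_size = 1         # trainer.batch_size
epochs = 3             # trainer.epochs
early_stop_rounds = 5  # "Early stopping policy: 5 round(s) of evaluation"

steps_per_epoch = math.ceil(train_rows / batch_size)
total_steps = steps_per_epoch * epochs
# Assumption: one evaluation round per epoch.
early_stop_steps = steps_per_epoch * early_stop_rounds

print(steps_per_epoch, total_steps, early_stop_steps)  # → 31 93 155
```

The progress bar never moves past 0/93, so the device-mismatch error is raised during the very first training step.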
PyTorch version 2.2.0 available.
███████████████████████
█ █ █ █  ▜█ █ █ █ █   █
█ █ █ █ █ █ █ █ █ █ ███
█ █   █ █ █ █ █ █ █ ▌ █
█ █████ █ █ █ █ █ █ █ █
█     █  ▟█     █ █   █
███████████████████████
ludwig v0.9.3 - Train
Traceback (most recent call last):
File "/home/azureuser/ludwig/venv/bin/ludwig", line 8, in <module>
sys.exit(main())
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/cli.py", line 197, in main
CLI()
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/cli.py", line 72, in __init__
getattr(self, args.command)()
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/cli.py", line 77, in train
train.cli(sys.argv[2:])
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/train.py", line 395, in cli
train_cli(**vars(args))
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/train.py", line 176, in train_cli
model = LudwigModel(
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/api.py", line 317, in __init__
self.config_obj = ModelConfig.from_dict(self._user_config)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/schema/model_types/base.py", line 141, in from_dict
config_obj: ModelConfig = schema.load(config)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/marshmallow_dataclass/__init__.py", line 730, in load
return clazz(**all_loaded)
File "<string>", line 18, in __init__
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/schema/model_types/base.py", line 73, in __post_init__
set_llm_parameters(self)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/schema/model_types/utils.py", line 314, in set_llm_parameters
_set_generation_max_new_tokens(config)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/schema/model_types/utils.py", line 401, in _set_generation_max_new_tokens
max_possible_sequence_length = _get_maximum_possible_sequence_length(config, _DEFAULT_MAX_SEQUENCE_LENGTH)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/schema/model_types/utils.py", line 377, in _get_maximum_possible_sequence_length
model_config = AutoConfig.from_pretrained(config.base_model)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1100, in from_pretrained
config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 634, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
resolved_config_file = cached_file(
File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 356, in cached_file
raise EnvironmentError(
OSError: /home/azureuser/ludwig/results/experiment_run_50/model/model_weights does not appear to have a file named config.json. Checkout 'https://huggingface.co//home/azureuser/ludwig/results/experiment_run_50/model/model_weights/None' for available files.
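The traceback above ends in `cached_file` from `transformers`, which reports that the directory passed as the base model has no `config.json`. A minimal pre-flight check for that condition, assuming the base model is a local directory. `missing_model_files` is a hypothetical helper, and the temporary directory only stands in for the real `model_weights` path:

```python
import tempfile
from pathlib import Path


def missing_model_files(model_dir, required=("config.json",)):
    """Hypothetical helper: list required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    before = missing_model_files(tmp)             # empty directory
    (Path(tmp) / "config.json").write_text("{}")  # placeholder file
    after = missing_model_files(tmp)              # config.json now present

print(before)  # → ['config.json']
print(after)   # → []
```

An empty result only shows that the file exists. It says nothing about whether the weights stored next to it can be loaded.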
4 Answers
efzxgjgh1#
It looks like the model I generated on my side is itself the problem.
kmbjn2e32#
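Hello, we are trying to run incremental training but hit the following error:
Full log file ->
Could you help us resolve this?
Hello @everyone,
I am running into the same problem.
Has anyone solved it?
Thanks!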
xxb16uws3#
Hello, can someone help me resolve this error?
biswetbf4#
I am trying the same thing, but I get an error even earlier:
How did you get the model to load in the first place?