unilm: [Kosmos-2] Unable to launch the demo

Posted by zz2j4svz 6 months ago in Other


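First of all, thank you for sharing this great code.
After setting everything up, I got the following error when I tried to launch the demo. Please help me resolve it.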

(kosmos-2) wendell@:~/unilm/kosmos-2$ bash run_gradio.sh

run_gradio.sh: line 2: $'\r': command not found
run_gradio.sh: line 4: $'\r': command not found
run_gradio.sh: line 6: $'\r': command not found
/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launch.py:181: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects `--local-rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
  warn(f"Failed to load image Python extension: {e}")
Please install pip install -r visual_requirement.txt for VL dataset
usage: gradio_app.py [-h] [--no-progress-bar] [--log-interval LOG_INTERVAL] [--log-format {json,none,simple,tqdm}] [--log-file LOG_FILE] [--tensorboard-logdir TENSORBOARD_LOGDIR] [--wandb-project WANDB_PROJECT]
                     [--azureml-logging] [--seed SEED] [--cpu] [--tpu] [--bf16] [--memory-efficient-bf16] [--fp16] [--memory-efficient-fp16] [--fp16-no-flatten-grads] [--fp16-init-scale FP16_INIT_SCALE]
                     [--fp16-scale-window FP16_SCALE_WINDOW] [--fp16-scale-tolerance FP16_SCALE_TOLERANCE] [--on-cpu-convert-precision] [--min-loss-scale MIN_LOSS_SCALE] [--threshold-loss-scale THRESHOLD_LOSS_SCALE]
                     [--amp] [--amp-batch-retries AMP_BATCH_RETRIES] [--amp-init-scale AMP_INIT_SCALE] [--amp-scale-window AMP_SCALE_WINDOW] [--user-dir USER_DIR] [--empty-cache-freq EMPTY_CACHE_FREQ]
                     [--all-gather-list-size ALL_GATHER_LIST_SIZE] [--model-parallel-size MODEL_PARALLEL_SIZE] [--quantization-config-path QUANTIZATION_CONFIG_PATH] [--profile] [--reset-logging] [--suppress-crashes]
                     [--use-plasma-view] [--plasma-path PLASMA_PATH] [--deepspeed] [--zero ZERO] [--exit-interval EXIT_INTERVAL]
                     [--criterion {adaptive_loss,composite_loss,cross_entropy,ctc,fastspeech2,hubert,label_smoothed_cross_entropy,latency_augmented_label_smoothed_cross_entropy,label_smoothed_cross_entropy_with_alignment,legacy_masked_lm_loss,masked_lm,model,nat_loss,sentence_prediction,sentence_ranking,tacotron2,speech_to_unit,speech_to_spectrogram,wav2vec,vocab_parallel_cross_entropy,unigpt}]
                     [--tokenizer {moses,nltk,space}] [--bpe {byte_bpe,bytes,characters,fastbpe,gpt2,bert,hf_byte_bpe,sentencepiece,subword_nmt}]
                     [--optimizer {adadelta,adafactor,adagrad,adam,adamax,composite,cpu_adam,lamb,nag,sgd}]
                     [--lr-scheduler {cosine,fixed,inverse_sqrt,manual,pass_through,polynomial_decay,reduce_lr_on_plateau,step,tri_stage,triangular}] [--scoring {sacrebleu,bleu,chrf,meteor,wer}] [--task TASK]
                     [--num-workers NUM_WORKERS] [--skip-invalid-size-inputs-valid-test] [--max-tokens MAX_TOKENS] [--batch-size BATCH_SIZE] [--required-batch-size-multiple REQUIRED_BATCH_SIZE_MULTIPLE]
                     [--required-seq-len-multiple REQUIRED_SEQ_LEN_MULTIPLE] [--dataset-impl {raw,lazy,cached,mmap,fasta,huffman}] [--data-buffer-size DATA_BUFFER_SIZE] [--train-subset TRAIN_SUBSET]
                     [--valid-subset VALID_SUBSET] [--combine-valid-subsets] [--ignore-unused-valid-subsets] [--validate-interval VALIDATE_INTERVAL] [--validate-interval-updates VALIDATE_INTERVAL_UPDATES]
                     [--validate-after-updates VALIDATE_AFTER_UPDATES] [--fixed-validation-seed FIXED_VALIDATION_SEED] [--disable-validation] [--max-tokens-valid MAX_TOKENS_VALID]
                     [--batch-size-valid BATCH_SIZE_VALID] [--max-valid-steps MAX_VALID_STEPS] [--curriculum CURRICULUM] [--gen-subset GEN_SUBSET] [--num-shards NUM_SHARDS] [--shard-id SHARD_ID]
                     [--grouped-shuffling] [--update-epoch-batch-itr UPDATE_EPOCH_BATCH_ITR] [--update-ordered-indices-seed] [--distributed-world-size DISTRIBUTED_WORLD_SIZE]
                     [--distributed-num-procs DISTRIBUTED_NUM_PROCS] [--distributed-rank DISTRIBUTED_RANK] [--distributed-backend DISTRIBUTED_BACKEND] [--distributed-init-method DISTRIBUTED_INIT_METHOD]
                     [--distributed-port DISTRIBUTED_PORT] [--device-id DEVICE_ID] [--distributed-no-spawn] [--ddp-backend {c10d,fully_sharded,legacy_ddp,no_c10d,pytorch_ddp,slowmo}] [--ddp-comm-hook {none,fp16}]
                     [--bucket-cap-mb BUCKET_CAP_MB] [--fix-batches-to-gpus] [--find-unused-parameters] [--gradient-as-bucket-view] [--fast-stat-sync] [--heartbeat-timeout HEARTBEAT_TIMEOUT] [--broadcast-buffers]
                     [--slowmo-momentum SLOWMO_MOMENTUM] [--slowmo-base-algorithm SLOWMO_BASE_ALGORITHM] [--localsgd-frequency LOCALSGD_FREQUENCY] [--nprocs-per-node NPROCS_PER_NODE] [--pipeline-model-parallel]
                     [--pipeline-balance PIPELINE_BALANCE] [--pipeline-devices PIPELINE_DEVICES] [--pipeline-chunks PIPELINE_CHUNKS] [--pipeline-encoder-balance PIPELINE_ENCODER_BALANCE]
                     [--pipeline-encoder-devices PIPELINE_ENCODER_DEVICES] [--pipeline-decoder-balance PIPELINE_DECODER_BALANCE] [--pipeline-decoder-devices PIPELINE_DECODER_DEVICES]
                     [--pipeline-checkpoint {always,never,except_last}] [--zero-sharding {none,os}] [--no-reshard-after-forward] [--fp32-reduce-scatter] [--cpu-offload] [--use-sharded-state]
                     [--not-fsdp-flatten-parameters] [--path PATH] [--post-process [POST_PROCESS]] [--quiet] [--model-overrides MODEL_OVERRIDES] [--results-path RESULTS_PATH] [--beam BEAM] [--nbest NBEST]
                     [--max-len-a MAX_LEN_A] [--max-len-b MAX_LEN_B] [--min-len MIN_LEN] [--match-source-len] [--unnormalized] [--no-early-stop] [--no-beamable-mm] [--lenpen LENPEN] [--unkpen UNKPEN]
                     [--replace-unk [REPLACE_UNK]] [--sacrebleu] [--score-reference] [--prefix-size PREFIX_SIZE] [--no-repeat-ngram-size NO_REPEAT_NGRAM_SIZE] [--sampling] [--sampling-topk SAMPLING_TOPK]
                     [--sampling-topp SAMPLING_TOPP] [--constraints [{ordered,unordered}]] [--temperature TEMPERATURE] [--diverse-beam-groups DIVERSE_BEAM_GROUPS] [--diverse-beam-strength DIVERSE_BEAM_STRENGTH]
                     [--diversity-rate DIVERSITY_RATE] [--print-alignment [{hard,soft}]] [--print-step] [--lm-path LM_PATH] [--lm-weight LM_WEIGHT] [--iter-decode-eos-penalty ITER_DECODE_EOS_PENALTY]
                     [--iter-decode-max-iter ITER_DECODE_MAX_ITER] [--iter-decode-force-max-iter] [--iter-decode-with-beam ITER_DECODE_WITH_BEAM] [--iter-decode-with-external-reranker] [--retain-iter-history]
                     [--retain-dropout] [--retain-dropout-modules RETAIN_DROPOUT_MODULES] [--decoding-format {unigram,ensemble,vote,dp,bs}] [--no-seed-provided] [--save-dir SAVE_DIR] [--restore-file RESTORE_FILE]
                     [--continue-once CONTINUE_ONCE] [--finetune-from-model FINETUNE_FROM_MODEL] [--reset-dataloader] [--reset-lr-scheduler] [--reset-meters] [--reset-optimizer]
                     [--optimizer-overrides OPTIMIZER_OVERRIDES] [--save-interval SAVE_INTERVAL] [--save-interval-updates SAVE_INTERVAL_UPDATES] [--keep-interval-updates KEEP_INTERVAL_UPDATES]
                     [--keep-interval-updates-pattern KEEP_INTERVAL_UPDATES_PATTERN] [--keep-last-epochs KEEP_LAST_EPOCHS] [--keep-best-checkpoints KEEP_BEST_CHECKPOINTS] [--no-save] [--no-epoch-checkpoints]
                     [--no-last-checkpoints] [--no-save-optimizer-state] [--best-checkpoint-metric BEST_CHECKPOINT_METRIC] [--maximize-best-checkpoint-metric] [--patience PATIENCE]
                     [--checkpoint-suffix CHECKPOINT_SUFFIX] [--checkpoint-shard-count CHECKPOINT_SHARD_COUNT] [--load-checkpoint-on-all-dp-ranks] [--write-checkpoints-asynchronously] [--buffer-size BUFFER_SIZE]
                     [--input INPUT] [--source-lang SOURCE_LANG] [--target-lang TARGET_LANG] [--load-alignments] [--left-pad-source] [--left-pad-target] [--max-source-positions MAX_SOURCE_POSITIONS]
                     [--max-target-positions MAX_TARGET_POSITIONS] [--upsample-primary UPSAMPLE_PRIMARY] [--truncate-source] [--num-batch-buckets NUM_BATCH_BUCKETS] [--eval-bleu] [--eval-bleu-args EVAL_BLEU_ARGS]
                     [--eval-bleu-detok EVAL_BLEU_DETOK] [--eval-bleu-detok-args EVAL_BLEU_DETOK_ARGS] [--eval-tokenized-bleu] [--eval-bleu-remove-bpe [EVAL_BLEU_REMOVE_BPE]] [--eval-bleu-print-samples]
                     [--force-anneal FORCE_ANNEAL] [--lr-shrink LR_SHRINK] [--warmup-updates WARMUP_UPDATES] [--pad PAD] [--eos EOS] [--unk UNK]
                     data
gradio_app.py: error: unrecognized arguments: --local-rank=0 
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 648457) of binary: /home/wendell/anaconda3/envs/kosmos-2/bin/python
Traceback (most recent call last):
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launch.py", line 196, in <module>
    main()
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launch.py", line 192, in main
    launch(args)
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launch.py", line 177, in launch
    run(args)
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
demo/gradio_app.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-10-14_19:11:23
  host      : DESKTOP-3Q0HFJ3.
  rank      : 0 (local_rank: 0)
  exitcode  : 2 (pid: 648457)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
run_gradio.sh: line 8: --task: command not found
run_gradio.sh: line 9: --path: command not found
run_gradio.sh: line 11: --model-overrides: command not found
run_gradio.sh: line 12: --dict-path: command not found
run_gradio.sh: line 13: --required-batch-size-multiple: command not found
run_gradio.sh: line 14: --remove-bpe=sentencepiece: command not found
run_gradio.sh: line 15: --max-len-b: command not found
run_gradio.sh: line 16: --add-bos-token: command not found
run_gradio.sh: line 17: --beam: command not found
run_gradio.sh: line 18: --buffer-size: command not found
run_gradio.sh: line 19: --image-feature-length: command not found
run_gradio.sh: line 20: --locate-special-token: command not found
run_gradio.sh: line 21: --batch-size: command not found
run_gradio.sh: line 22: --nbest: command not found
run_gradio.sh: line 23: --no-repeat-ngram-size: command not found
run_gradio.sh: line 24: --location-bin-size: command not found

run_gradio.sh

#!/bin/bash

model_path=./path/kosmos2.pt

master_port=$((RANDOM%1000+20000))

CUDA_LAUNCH_BLOCKING=1 CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port=$master_port --nproc_per_node=1 demo/gradio_app.py None \
    --task generation_obj \
    --path $model_path \
    --model-overrides "{'visual_pretrained': '',
            'dict_path':'data/dict.txt'}" \
    --dict-path 'data/dict.txt' \
    --required-batch-size-multiple 1 \
    --remove-bpe=sentencepiece \
    --max-len-b 500 \
    --add-bos-token \
    --beam 1 \
    --buffer-size 1 \
    --image-feature-length 64 \
    --locate-special-token 1 \
    --batch-size 1 \
    --nbest 1 \
    --no-repeat-ngram-size 3 \
    --location-bin-size 32

Package versions

------------------------- -------------------------
aiofiles                  23.2.1
aiohttp                   3.8.6
aiosignal                 1.3.1
altair                    5.1.2
annotated-types           0.6.0
antlr4-python3-runtime    4.8
anyio                     3.7.1
apex                      0.1
async-timeout             4.0.3
attrs                     23.1.0
bitarray                  2.8.2
blis                      0.7.11
braceexpand               0.1.7
catalogue                 2.0.10
certifi                   2023.7.22
cffi                      1.16.0
charset-normalizer        3.3.0
click                     8.1.7
colorama                  0.4.6
confection                0.1.3
contourpy                 1.1.1
cycler                    0.12.1
cymem                     2.0.8
Cython                    3.0.3
deepspeed                 0.4.4+165739a5
exceptiongroup            1.1.3
fairscale                 0.4.0
fairseq                   1.0.0a0+b237f42
fastapi                   0.103.2
ffmpy                     0.3.1
filelock                  3.12.4
fonttools                 4.43.1
frozenlist                1.4.0
fsspec                    2023.9.2
ftfy                      6.1.1
gmpy2                     2.1.2
gradio                    3.37.0
gradio_client             0.6.0
h11                       0.14.0
httpcore                  0.17.3
httpx                     0.24.1
huggingface-hub           0.18.0
hydra-core                1.0.7
idna                      3.4
importlib-resources       6.1.0
infinibatch               0.1.0
Jinja2                    3.1.2
jsonschema                4.19.1
jsonschema-specifications 2023.7.1
kiwisolver                1.4.5
langcodes                 3.3.0
linkify-it-py             2.0.2
lxml                      4.9.3
markdown-it-py            2.2.0
MarkupSafe                2.1.1
matplotlib                3.8.0
mdit-py-plugins           0.3.3
mdurl                     0.1.2
mpmath                    1.3.0
multidict                 6.0.4
murmurhash                1.0.10
networkx                  3.1
ninja                     1.11.1.1
numpy                     1.23.0
nvidia-cublas-cu11        11.10.3.66
nvidia-cuda-nvrtc-cu11    11.7.99
nvidia-cuda-runtime-cu11  11.7.99
nvidia-cudnn-cu11         8.5.0.96
omegaconf                 2.0.6
open-clip-torch           1.3.0
opencv-python-headless    4.8.0.74
orjson                    3.9.9
packaging                 23.2
pandas                    2.1.1
pathy                     0.10.2
Pillow                    10.0.1
pip                       23.2.1
portalocker               2.8.2
preshed                   3.0.9
protobuf                  3.20.3
psutil                    5.9.5
pycparser                 2.21
pydantic                  1.10.11
pydantic_core             2.10.1
pydub                     0.25.1
pyparsing                 3.1.1
python-dateutil           2.8.2
python-multipart          0.0.6
pytz                      2023.3.post1
PyYAML                    6.0.1
referencing               0.30.2
regex                     2023.10.3
requests                  2.31.0
rpds-py                   0.10.6
sacrebleu                 2.3.1
scipy                     1.8.0
semantic-version          2.10.0
sentencepiece             0.1.99
setuptools                68.0.0
six                       1.16.0
smart-open                6.4.0
sniffio                   1.3.0
spacy                     3.6.0
spacy-legacy              3.0.12
spacy-loggers             1.0.5
srsly                     2.4.8
starlette                 0.27.0
sympy                     1.11.1
tabulate                  0.9.0
tensorboardX              1.8
thinc                     8.1.10
tiktoken                  0.5.1
timm                      0.4.12
toolz                     0.12.0
torch                     1.13.0
torchscale                0.1.1
torchvision               0.14.0
tqdm                      4.66.1
triton                    2.0.0
typer                     0.9.0
typing_extensions         4.7.1
tzdata                    2023.3
uc-micro-py               1.0.2
urllib3                   2.0.6
uvicorn                   0.23.2
wasabi                    1.1.2
wcwidth                   0.2.8
webdataset                0.2.57
websockets                11.0.3
wheel                     0.41.2
xformers                  0.0.23.dev652+git.705810f
yarl                      1.9.2
zipp                      3.17.0

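I ran into many difficulties setting up the environment, and even after making sure everything was configured correctly, I still get an error when I run run_gradio.sh.
I would appreciate any help. Thank you!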
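
A note on one line in the log above: `gradio_app.py: error: unrecognized arguments: --local-rank=0`. The FutureWarning printed just before it says the launcher expects the script to handle a `--local-rank` argument and suggests reading `LOCAL_RANK` from the environment instead. Python's argparse treats `--local-rank` and `--local_rank` as two different options, so a parser that registers only one spelling rejects the other. The sketch below is not the code in gradio_app.py; it is a minimal, hypothetical parser (the function names `build_parser` and `parse_local_rank` are made up for illustration) showing one way to accept both spellings and fall back to the environment variable.

```python
import argparse
import os


def build_parser():
    """Hypothetical parser that accepts both spellings of the local-rank flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--local_rank",
        "--local-rank",  # hyphenated spelling, as seen in the error message
        dest="local_rank",
        type=int,
        # Fall back to the LOCAL_RANK environment variable, then to 0.
        default=int(os.environ.get("LOCAL_RANK", 0)),
    )
    return parser


def parse_local_rank(argv):
    """Return the local rank from argv, ignoring arguments this parser does not know."""
    args, _unknown = build_parser().parse_known_args(argv)
    return args.local_rank
```

With this sketch, `parse_local_rank(["--local-rank=0"])` and `parse_local_rank(["--local_rank=0"])` both return 0. Whether a change like this or switching the launch command to `torchrun` is the better fit depends on the installed PyTorch version.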


Source: https://github.com/microsoft/unilm/issues/1333

7 answers

o0lyfsai1#

#####################
#
# Use this with or without the .gitattributes snippet with this Gist
# create a fixle.sh file, paste this in and run it.
# Why do you want this ? Because Git will see diffs between files shared between Linux and Windows due to differences in line ending handling ( Windows uses CRLF and Unix LF) 
# This Gist normalizes handling by forcing everything to use Unix style.
#####################

# Fix Line Endings - Force All Line Endings to LF and Not Windows Default CR or CRLF
# Taken largely from: https://help.github.com/articles/dealing-with-line-endings/
# With the exception that we are forcing LF instead of converting to windows style.

#Set LF as your line ending default.
git config --global core.eol lf

#Set autocrlf to false to stop converting between windows style (CRLF) and Unix style (LF)
git config --global core.autocrlf false

#Save your current files in Git, so that none of your work is lost.
git add . -u
git commit -m "Saving files before refreshing line endings"


#Remove the index and force Git to rescan the working directory.
rm .git/index

#Rewrite the Git index to pick up all the new line endings.
git reset

#Show the rewritten, normalized files.

git status

#Add all your changed files back, and prepare them for a commit. This is your chance to inspect which files, if any, were unchanged.

git add -u
# It is perfectly safe to see a lot of messages here that read
# "warning: CRLF will be replaced by LF in file."

#Rewrite the .gitattributes file.
git add .gitattributes

#Commit the changes to your repository.

git commit -m "Normalize all the line endings"
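
If only a single script is affected, as with the `$'\r': command not found` messages from run_gradio.sh, a narrower alternative to renormalizing the whole repository is to rewrite that one file with LF endings. With CRLF endings, each trailing backslash is followed by a carriage return rather than the newline, so bash no longer treats it as a line continuation. That would explain why `--task`, `--path` and the other options were executed as separate commands. The following is a minimal Python sketch under that assumption; `crlf_to_lf` is a hypothetical helper name, not part of the repository.

```python
from pathlib import Path


def crlf_to_lf(path):
    """Rewrite `path` in place with LF line endings.

    Returns the number of CRLF sequences that were replaced (0 if the file
    already used LF only, in which case the file is left untouched).
    """
    target = Path(path)
    data = target.read_bytes()
    count = data.count(b"\r\n")
    if count:
        target.write_bytes(data.replace(b"\r\n", b"\n"))
    return count
```

Calling `crlf_to_lf("run_gradio.sh")` from the kosmos-2 directory would convert just that script. Working on bytes avoids any guess about the file's text encoding.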


lf5gs5x22#

First of all, thank you for your help! I tried the method you provided and got the following output.

[master 45d484f] Saving files before refreshing line endings
 1 file changed, 40 insertions(+)
 create mode 100644 fixle.sh
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean
fatal: pathspec '.gitattributes' did not match any files
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean

I then ran run_gradio.sh and got the following error.

FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
2023-10-16 15:06:08 | WARNING | xformers | WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.0.1 with CUDA 1108 (you have 1.13.0+cu117)
    Python  3.9.18 (you have 3.9.18)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
Please install pip install -r visual_requirement.txt for VL dataset
2023-10-16 15:06:10 | INFO | fairseq.distributed.utils | distributed init (rank 0): env://
2023-10-16 15:06:10 | INFO | torch.distributed.distributed_c10d | Added key: store_based_barrier_key:1 to store for rank: 0
2023-10-16 15:06:10 | INFO | torch.distributed.distributed_c10d | Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
2023-10-16 15:06:10 | INFO | fairseq.distributed.utils | initialized host DESKTOP-3Q0HFJ3 as rank 0
2023-10-16 15:06:11 | INFO | fairseq_cli.interactive | {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 100, 'log_format': None, 'log_file': None, 'tensorboard_logdir': None, 'wandb_project': None, 'azureml_logging': False, 'seed': 1, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': False, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'on_cpu_convert_precision': False, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'amp': False, 'amp_batch_retries': 2, 'amp_init_scale': 128, 'amp_scale_window': None, 'user_dir': None, 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': False, 'suppress_crashes': False, 'use_plasma_view': False, 'plasma_path': '/tmp/plasma', 'deepspeed': False, 'zero': 0, 'exit_interval': 0}, 'common_eval': {'_name': None, 'path': '/path/kosmos2.pt', 'post_process': 'sentencepiece', 'quiet': False, 'model_overrides': "{'visual_pretrained': '',\n            'dict_path':'data/dict.txt'}", 'results_path': None}, 'distributed_training': {'_name': None, 'distributed_world_size': 1, 'distributed_num_procs': 1, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': 'env://', 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': True, 'ddp_backend': 'pytorch_ddp', 'ddp_comm_hook': 'none', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': False, 'gradient_as_bucket_view': False, 'fast_stat_sync': False, 'heartbeat_timeout': -1, 'broadcast_buffers': False, 'slowmo_momentum': None, 'slowmo_base_algorithm': 'localsgd', 'localsgd_frequency': 3, 'nprocs_per_node': 1, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': 
None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 'zero_sharding': 'none', 'fp16': False, 'memory_efficient_fp16': False, 'tpu': False, 'no_reshard_after_forward': False, 'fp32_reduce_scatter': False, 'cpu_offload': False, 'use_sharded_state': False, 'not_fsdp_flatten_parameters': False}, 'dataset': {'_name': None, 'num_workers': 1, 'skip_invalid_size_inputs_valid_test': False, 'max_tokens': None, 'batch_size': 1, 'required_batch_size_multiple': 1, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 'train', 'valid_subset': 'valid', 'combine_valid_subsets': None, 'ignore_unused_valid_subsets': False, 'validate_interval': 1, 'validate_interval_updates': 0, 'validate_after_updates': 0, 'fixed_validation_seed': None, 'disable_validation': False, 'max_tokens_valid': None, 'batch_size_valid': 1, 'max_valid_steps': None, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0, 'grouped_shuffling': False, 'update_epoch_batch_itr': False, 'update_ordered_indices_seed': False}, 'optimization': {'_name': None, 'max_epoch': 0, 'max_update': 0, 'stop_time_hours': 0.0, 'clip_norm': 0.0, 'sentence_avg': False, 'update_freq': [1], 'lr': [0.25], 'stop_min_lr': -1.0, 'use_bmuf': False, 'skip_remainder_batch': False}, 'checkpoint': {'_name': None, 'save_dir': 'checkpoints', 'restore_file': 'checkpoint_last.pt', 'continue_once': None, 'finetune_from_model': None, 'reset_dataloader': False, 'reset_lr_scheduler': False, 'reset_meters': False, 'reset_optimizer': False, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 0, 'keep_interval_updates': -1, 'keep_interval_updates_pattern': -1, 'keep_last_epochs': -1, 'keep_best_checkpoints': -1, 'no_save': False, 'no_epoch_checkpoints': False, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'loss', 'maximize_best_checkpoint_metric': False, 'patience': -1, 
'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 'load_checkpoint_on_all_dp_ranks': False, 'write_checkpoints_asynchronously': False, 'model_parallel_size': 1}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 'distributed_world_size': 1}, 'generation': {'_name': None, 'beam': 1, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 500, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 3, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': None, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 1, 'input': '-'}, 'model': None, 'task': {'_name': 'generation_obj', 'data': 'None', 'sample_break_mode': 'none', 'tokens_per_sample': 1024, 'output_dictionary_size': -1, 'self_target': False, 'future_target': False, 'past_target': False, 'add_bos_token': True, 'max_target_positions': None, 'shorten_method': 'none', 'shorten_data_split_list': '', 'pad_to_fixed_length': False, 'pad_to_fixed_bsz': False, 'seed': 1, 'batch_size': 1, 'batch_size_valid': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'tpu': False, 'use_plasma_view': 
False, 'plasma_path': '/tmp/plasma', 'required_batch_size_multiple': 1, 'dict_path': 'data/dict.txt', 'image_feature_length': 64, 'input_resolution': 224, 'location_bin_size': 32, 'locate_special_token': 1}, 'criterion': {'_name': 'cross_entropy', 'sentence_avg': True}, 'optimizer': None, 'lr_scheduler': {'_name': 'fixed', 'force_anneal': None, 'lr_shrink': 0.1, 'warmup_updates': 0, 'lr': [0.25]}, 'scoring': {'_name': 'bleu', 'pad': 1, 'eos': 2, 'unk': 3}, 'bpe': None, 'tokenizer': None, 'ema': {'_name': None, 'store_ema': False, 'ema_decay': 0.9999, 'ema_start_update': 0, 'ema_seed_model': None, 'ema_update_freq': 1, 'ema_fp32': False}}
2023-10-16 15:06:11 | INFO | fairseq_cli.interactive | Task: {'_name': 'generation_obj', 'data': 'None', 'sample_break_mode': 'none', 'tokens_per_sample': 1024, 'output_dictionary_size': -1, 'self_target': False, 'future_target': False, 'past_target': False, 'add_bos_token': True, 'max_target_positions': None, 'shorten_method': 'none', 'shorten_data_split_list': '', 'pad_to_fixed_length': False, 'pad_to_fixed_bsz': False, 'seed': 1, 'batch_size': 1, 'batch_size_valid': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'tpu': False, 'use_plasma_view': False, 'plasma_path': '/tmp/plasma', 'required_batch_size_multiple': 1, 'dict_path': 'data/dict.txt', 'image_feature_length': 64, 'input_resolution': 224, 'location_bin_size': 32, 'locate_special_token': 1}
2023-10-16 15:06:11 | INFO | unilm.tasks.generation_obj | dictionary from data/dict.txt: 65037 types
2023-10-16 15:06:11 | INFO | fairseq_cli.interactive | loading model(s) from /path/kosmos2.pt
Traceback (most recent call last):
  File "/home/wendell/unilm/kosmos-2/demo/gradio_app.py", line 611, in <module>
    cli_main()
  File "/home/wendell/unilm/kosmos-2/demo/gradio_app.py", line 607, in cli_main
    distributed_utils.call_main(convert_namespace_to_omegaconf(args), main)
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/fairseq/distributed/utils.py", line 359, in call_main
    distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/fairseq/distributed/utils.py", line 333, in distributed_main
    main(cfg, **kwargs)
  File "/home/wendell/unilm/kosmos-2/demo/gradio_app.py", line 265, in main
    models, _model_args = checkpoint_utils.load_model_ensemble(
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/fairseq/checkpoint_utils.py", line 385, in load_model_ensemble
    ensemble, args, _task = load_model_ensemble_and_task(
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/fairseq/checkpoint_utils.py", line 441, in load_model_ensemble_and_task
    raise IOError("Model file not found: {}".format(filename))
OSError: Model file not found: /path/kosmos2.pt
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 678564) of binary: /home/wendell/anaconda3/envs/kosmos-2/bin/python
Traceback (most recent call last):
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launch.py", line 195, in <module>
    main()
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launch.py", line 191, in main
    launch(args)
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launch.py", line 176, in launch
    run(args)
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
demo/gradio_app.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-10-16_15:06:12
  host      : DESKTOP-3Q0HFJ3.
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 678564)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

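I am using WSL (Windows Subsystem for Linux) with Ubuntu 22.04.2. I am not sure whether that matters.
I think the xFormers warning can be ignored, but I am not sure whether the current error comes from a mistake I made while applying the method you provided. I am not very familiar with Git, sorry about that.
Please help me resolve this issue.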
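
One detail in the traceback may be worth checking before changing the environment: the failure is `OSError: Model file not found: /path/kosmos2.pt`, an absolute path under the filesystem root, whereas the script posted earlier sets `model_path=./path/kosmos2.pt`, which is relative to the current directory. The two point to different locations. A small pre-flight check along the following lines can show where a given `model_path` actually resolves. This is only a sketch; `resolve_checkpoint` is a hypothetical helper, not something provided by fairseq or the Kosmos-2 code.

```python
from pathlib import Path


def resolve_checkpoint(model_path):
    """Return the absolute path of the checkpoint, or raise if it is not a file."""
    resolved = Path(model_path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(
            f"Checkpoint not found: {resolved} (given as {model_path!r})"
        )
    return resolved
```

Running `resolve_checkpoint("./path/kosmos2.pt")` from the kosmos-2 directory either returns the full path of the downloaded checkpoint or reports the exact location that was searched.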


xtupzzrd3#

I see. The error may be caused by using WSL (Windows Subsystem for Linux). I am not sure whether Gradio supports running under WSL.


gcuhipw94#

OK, I see. I will try a different environment. Thank you very much for your help!


cetgtptt5#

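"I see. This error may be caused by using WSL. I am not sure whether WSL supports Gradio."
Hello @donglixp, thank you for all your help so far. I have confirmed that WSL supports Gradio.
The current error: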

(kosmos) wendell@DESKTOP-3Q0HFJ3:~/unilm/kosmos-2$ bash run_gradio.sh
/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launch.py:180: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
Traceback (most recent call last):
  File "/home/wendell/unilm/kosmos-2/demo/gradio_app.py", line 12, in <module>
    import unilm
  File "/home/wendell/unilm/kosmos-2/./unilm/__init__.py", line 1, in <module>
    import unilm.models
  File "/home/wendell/unilm/kosmos-2/./unilm/models/__init__.py", line 6, in <module>
    import_models(models_dir, "unilm.models")
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/fairseq/models/__init__.py", line 217, in import_models
    importlib.import_module(namespace + "." + model_name)
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/wendell/unilm/kosmos-2/./unilm/models/gpt_eval.py", line 39, in <module>
    from torchscale.architecture.decoder import Decoder
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torchscale/architecture/decoder.py", line 12, in <module>
    from torchscale.architecture.utils import init_bert_params
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torchscale/architecture/utils.py", line 6, in <module>
    from torchscale.component.multihead_attention import MultiheadAttention
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torchscale/component/multihead_attention.py", line 12, in <module>
    from xformers.ops import memory_efficient_attention, LowerTriangularMask, MemoryEfficientAttentionCutlassOp
ModuleNotFoundError: No module named 'xformers'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 445686) of binary: /home/wendell/anaconda3/envs/kosmos/bin/python
Traceback (most recent call last):
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launch.py", line 195, in <module>
    main()
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launch.py", line 191, in main
    launch(args)
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launch.py", line 176, in launch
    run(args)
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
demo/gradio_app.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-10-23_22:58:01
  host      : DESKTOP-3Q0HFJ3.
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 445686)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

from xformers.ops import memory_efficient_attention, LowerTriangularMask, MemoryEfficientAttentionCutlassOp
ModuleNotFoundError: No module named 'xformers'

I do have xformer installed, but it is currently version 1.0.1.

Please let me know if you can help. Thank you!
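
The `ModuleNotFoundError` above means that the interpreter in the `kosmos` environment cannot import a module named `xformers`, whatever may be installed under a similar name or in another environment. Below is a minimal sketch for checking this, assuming it is run with the same Python that `run_gradio.sh` launches. `check_modules` is a hypothetical helper name and is not part of unilm, torchscale, or xformers.

```python
import importlib.util
import sys
from importlib import metadata


def check_modules(names):
    """Map each top-level module name to its installed version, or None if not importable."""
    report = {}
    for name in names:
        # find_spec locates the module without executing it.
        spec = importlib.util.find_spec(name)
        if spec is None:
            report[name] = None  # this interpreter cannot import it
            continue
        try:
            # Assumes the distribution name equals the module name.
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "unknown"  # importable, but no distribution of that name
    return report


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    modules = ["xformers", "torch", "torchscale", "fairseq"]
    for name, version in check_modules(modules).items():
        status = "MISSING" if version is None else f"found (version: {version})"
        print(f"{name}: {status}")
```

If `xformers` is reported as missing while `pip list` shows it, the two commands are probably running in different environments. Comparing the printed interpreter path with the one in the traceback (`/home/wendell/anaconda3/envs/kosmos/bin/python`) is a quick way to confirm that.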


68bkxrlz6#

I adjusted the xformer version. The current error:

/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launch.py:180: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
Traceback (most recent call last):
  File "/home/wendell/unilm/kosmos-2/demo/gradio_app.py", line 12, in <module>
    import unilm
  File "/home/wendell/unilm/kosmos-2/./unilm/__init__.py", line 3, in <module>
    import unilm.tasks
  File "/home/wendell/unilm/kosmos-2/./unilm/tasks/__init__.py", line 7, in <module>
    import_tasks(tasks_dir, "unilm.tasks")
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/fairseq/tasks/__init__.py", line 117, in import_tasks
    importlib.import_module(namespace + "." + task_name)
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/wendell/unilm/kosmos-2/./unilm/tasks/generation_obj.py", line 33, in <module>
    from unilm.data.utils import SPECIAL_SYMBOLS, add_location_symbols
  File "/home/wendell/unilm/kosmos-2/./unilm/data/utils.py", line 8, in <module>
    from infinibatch import iterators
ImportError: cannot import name 'iterators' from 'infinibatch' (unknown location)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 648380) of binary: /home/wendell/anaconda3/envs/kosmos/bin/python
Traceback (most recent call last):
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launch.py", line 195, in <module>
    main()
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launch.py", line 191, in main
    launch(args)
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launch.py", line 176, in launch
    run(args)
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
demo/gradio_app.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-10-24_08:00:09
  host      : DESKTOP-3Q0HFJ3.
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 648380)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
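
The `(unknown location)` in the `ImportError` above usually means Python resolved `infinibatch` to a module with no file of its own. The typical cause is a namespace package: a directory named `infinibatch` without an `__init__.py`, for example an empty or partially cloned folder that shadows the real package. This is an assumption about this setup, not something the log confirms. Below is a hedged sketch for seeing what the name resolves to. `describe_package` is a hypothetical helper, not part of infinibatch or unilm.

```python
import importlib.util


def describe_package(name):
    """Describe how the import system resolves a top-level name, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"status": "missing", "locations": []}
    locations = list(spec.submodule_search_locations or [])
    # Namespace packages have no backing file: origin is None
    # (older Python versions used the string "namespace").
    if spec.origin in (None, "namespace") and locations:
        return {"status": "namespace", "locations": locations}
    return {"status": "regular", "locations": locations or [spec.origin]}


if __name__ == "__main__":
    print(describe_package("infinibatch"))
```

Run it from the same directory as `run_gradio.sh`, since the traceback shows `./unilm` being imported relative to the working directory. A `namespace` status lists the directories that Python treats as `infinibatch`, which shows where an incomplete copy may be sitting.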


ygya80vv7#

Has anyone solved the last error?
