Model not responding; Ollama hangs after running for a long time

6ss1mwsb · posted 2 months ago in Other

What is the problem?
Good afternoon.
I am using https://ollama.com/library/mixtral:instruct to rewrite a dataset.
Ollama seems to get stuck at random on every task that involves running a model.
The OS is Ubuntu 22.04.
Inference and running models hang:

lggarcia@turing:~$ nvidia-smi
Thu Jun 20 13:04:11 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA H100 80GB HBM3          Off | 00000000:55:00.0 Off |                    0 |
| N/A   52C    P0             150W / 200W |  25168MiB / 81559MiB |     29%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA H100 80GB HBM3          Off | 00000000:68:00.0 Off |                    0 |
| N/A   52C    P0             167W / 200W |  35500MiB / 81559MiB |     57%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA H100 80GB HBM3          Off | 00000000:D2:00.0 Off |                    0 |
| N/A   52C    P0             157W / 200W |  79420MiB / 81559MiB |     25%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA H100 80GB HBM3          Off | 00000000:E4:00.0 Off |                    0 |
| N/A   53C    P0             156W / 200W |  71286MiB / 81559MiB |     31%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    153203      C   python                                      708MiB |
|    0   N/A  N/A    440608      C   ...unners/cuda_v11/ollama_llama_server    24442MiB |
|    1   N/A  N/A     79068      C   python                                    17238MiB |
|    1   N/A  N/A    153203      C   python                                      706MiB |
|    1   N/A  N/A    440608      C   ...unners/cuda_v11/ollama_llama_server    17532MiB |
|    2   N/A  N/A    153203      C   python                                      706MiB |
|    2   N/A  N/A    440608      C   ...unners/cuda_v11/ollama_llama_server    25808MiB |
|    2   N/A  N/A    551205      C   ...astor/.conda/envs/mixenv/bin/python    52882MiB |
|    3   N/A  N/A    153203      C   python                                      706MiB |
|    3   N/A  N/A    440608      C   ...unners/cuda_v11/ollama_llama_server    24442MiB |
|    3   N/A  N/A    468947      C   ...astor/.conda/envs/mixenv/bin/python    46114MiB |
+---------------------------------------------------------------------------------------+
lggarcia@turing:~$ ollama list
NAME                                            ID              SIZE    MODIFIED
command-r:latest                                b8cdfff0263c    20 GB   46 hours ago
hro/laser-dolphin-mixtral-2x7b-dpo:latest       a2f4da69f5ae    7.8 GB  2 days ago
phi3:latest                                     64c1188f2485    2.4 GB  7 days ago
phi3:medium                                     1e67dff39209    7.9 GB  8 days ago
thebloke/laser-dolphin-mixtral-2x7b-dpo:latest  f1dda7448ba2    7.8 GB  9 days ago
llama3:instruct                                 365c0bd3c000    4.7 GB  2 weeks ago
llama3:70b-instruct                             786f3184aec0    39 GB   3 weeks ago
llama3:70b                                      786f3184aec0    39 GB   3 weeks ago
mixtral:instruct                                d39eb76ed9c5    26 GB   3 weeks ago
mixtral:8x7b                                    d39eb76ed9c5    26 GB   3 weeks ago
mixtral:v0.1-instruct                           6a0910fa6dc1    79 GB   3 weeks ago
llama2:latest                                   78e26419b446    3.8 GB  3 weeks ago
lggarcia@turing:~$ ollama run phi3:latest
⠴

The ollama run command no longer works; it just hangs until I kill the process.

lggarcia@turing:~$ ollama --version
ollama version is 0.1.44
lggarcia@turing:~$ ollama ps
NAME                    ID              SIZE    PROCESSOR       UNTIL
mixtral:v0.1-instruct   6a0910fa6dc1    91 GB   100% GPU        Less than a second ago
lggarcia@turing:~$

This is the Linux service configuration:

Environment="OLLAMA_MODELS=/datassd/proyectos/modelos"
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_MAX_LOADED_MODELS=8"
Environment="OLLAMA_NUM_PARALLEL=8"
Environment="OLLAMA_DEBUG=1"
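Since the hang is intermittent, one cheap diagnostic alongside a config like this is to poll the server from outside the stuck client and log when it stops answering. A minimal sketch, assuming the default `OLLAMA_HOST` above and the documented `/api/tags` endpoint (the helper name `is_responsive` is invented here, not from the thread):

```python
# Liveness probe for the Ollama HTTP API (sketch; helper name is invented).
import json
import urllib.request


def is_responsive(base_url="http://localhost:11434", timeout=5.0):
    """Return True if /api/tags answers and parses within `timeout` seconds."""
    try:
        with urllib.request.urlopen(base_url + "/api/tags", timeout=timeout) as resp:
            json.load(resp)  # require a parseable body, not just a TCP accept
        return True
    except Exception:
        return False


if __name__ == "__main__":
    print("server responsive:", is_responsive())
```

Running this from cron or a loop gives a timestamped record of exactly when the server stopped responding, independent of any one client process.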

OS

Linux

GPU

Nvidia

CPU

No response

Ollama version

0.1.44

cotxawn7 #1

Hi @luisgg98, sorry you're running into this. How are you prompting the model, so that I can try to reproduce the issue? Are you sending a large batch of prompts at once? Thanks very much!
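For reference, the "many prompts at once" pattern being asked about can be sketched against Ollama's plain REST API. This is a hypothetical reproduction harness, not code from the thread; `/api/generate` is the documented generate endpoint, while the helper names are invented:

```python
# Hypothetical reproduction harness for sending many prompts concurrently
# (endpoint /api/generate is Ollama's generate API; helper names are invented).
import json
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def build_generate_payload(model, prompt):
    # stream=False requests one complete JSON object instead of chunked output
    return {"model": model, "prompt": prompt, "stream": False}


def generate(base_url, model, prompt, timeout=600):
    req = urllib.request.Request(
        base_url + "/api/generate",
        data=json.dumps(build_generate_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]


if __name__ == "__main__" and os.environ.get("RUN_LIVE_OLLAMA"):
    # OLLAMA_NUM_PARALLEL=8 in the reporter's config, so 8 workers mimic that load
    prompts = [f"Summarize interview number {i}" for i in range(16)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = pool.map(
            lambda p: generate("http://localhost:11434", "mixtral:v0.1-instruct", p),
            prompts,
        )
        for text in results:
            print(text[:80])
```

The live section is gated behind an environment variable (`RUN_LIVE_OLLAMA`, invented here) so the file can be imported without hitting a server.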

cig3rfwq #2

This is the only code snippet I am allowed to share:

# Missing imports added for completeness; load_file/read_file and the *_PATH
# constants are helpers defined elsewhere in the reporter's project.
import time

import pandas as pd
from langchain_community.callbacks import get_openai_callback
from langchain_community.llms import Ollama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate


def recalculate_summary(df):
    template_summarizer = """<s>[INST] Generate a concise summary in Spanish the following interview: {input} [/INST]"""
    prompt_summarizer = PromptTemplate.from_template(template=template_summarizer) #, input_variables=["input"], verbose=True)
    llm = Ollama(base_url='http://localhost:11434', model= 'mixtral:v0.1-instruct')
    output_parser = StrOutputParser()
    chain_summarizer = prompt_summarizer | llm | output_parser
    
    df_calculated = load_file(SUMMARY_OK_PATH)

    for index, info in df.iterrows():
        row = pd.DataFrame()
        input_text = info['text']

        with get_openai_callback() as cb:
            prompt_summarizer.format(input=input_text)
            start_time = time.time()
            # print('-1-')
            summary = chain_summarizer.invoke({"input": input_text})
            # print('-2-')
            summary_time = time.time() - start_time
            row['input_text'] = [input_text]
            row['summary'] = [summary.strip()]
            row['summary_time'] = [summary_time]
            row['summary_total_tokens'] = [cb.total_tokens]
            row['summary_completion_tokens'] = [cb.completion_tokens]

        df_calculated = pd.concat([df_calculated, row], axis=0)
        df_calculated.to_csv(SUMMARY_OK_PATH, index=False)

    return df_calculated  # the caller reassigns from this function's return value

def rewriting_summaries(data_df):
    try:
        print('Rewriting summaries')
        i = 5701
        while i < len(data_df):
            print('Summaries: Reading calculated df with num_tokens by column')
            df_calculado = read_file(DATA_PATH_CALCULADO, file_name_calculado, ",")
            df_calculado_sin_resumen = df_calculado[df_calculado['summary_generated'].isna()]
            if len(df_calculado_sin_resumen) > 0: 
                print('Summary: starting reprocess')
                df_calculado = df_calculado[i:11400]
                df_calculado = recalculate_summary(df_calculado)
                print(df_calculado)
                i = i + len(df_calculado)
                print('Summaries: ' + str(i) + ' rows calculated')
            else:
                print('Summaries: Waiting an hour until more results generated...')
                time.sleep(3600)
        print('Summaries generated')
    except Exception as e: 
        print('Summaries: Something went wrong')
        print(e)

Don't apologize; you are doing amazing work for the open-source community, for free. This kind of thing is normal and understandable.
I also run this code on another server with identical specs but on Ollama version 0.1.39, and I have never hit this problem on that version. Maybe something broke when that version was patched.

9jyewag0 #3

+1
ollama version is 0.1.39
CentOS Linux release 7.3.1611 (Core)
[root@localhost ollama]# ollama ps
NAME                            ID              SIZE    PROCESSOR       UNTIL
qwen32b-translate:latest        65c8909c7eb0    22 GB   100% GPU        53 minutes from now
num_ctx: 10240
num_predict: -1

vdzxcuhz #4

@leo985 Are you saying you are running into the same issue?
