Ollama: how do I prevent the GPU and CPU from being used at the same time?

pwuypxnk posted 6 months ago in Other


When the GPU and CPU are used together, the output becomes garbled, so I want to prevent this.

ollama

Source: https://github.com/ollama/ollama/issues/4971
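For the question above, one knob that may help (an assumption, not a fix confirmed in this thread) is Ollama's `num_gpu` option, which sets how many model layers are offloaded to the GPU: `0` keeps every layer on the CPU, while a value larger than the model's layer count asks for all layers on the GPU. The sketch below builds a non-streaming request body for `/api/generate` on the default local endpoint `http://localhost:11434`; `build_generate_payload`, `send`, and the default of 999 layers are hypothetical choices for illustration, and `num_ctx` defaults to 2048 because that is the value reported below as not triggering the problem.

```python
import json
import urllib.request

# Default local Ollama endpoint (assumes a server started with default settings).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_payload(model: str, prompt: str, cpu_only: bool,
                           num_ctx: int = 2048, all_layers: int = 999) -> dict:
    """Hypothetical helper: build a non-streaming /api/generate request body.

    `num_gpu` is the number of layers offloaded to the GPU:
    0 keeps every layer on the CPU, while `all_layers` (assumed to be larger
    than the model's layer count) asks for every layer on the GPU.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_gpu": 0 if cpu_only else all_layers,
            "num_ctx": num_ctx,
        },
    }


def send(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload and return the generated text (needs a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["response"]


if __name__ == "__main__":
    # Only print the request body; call send(...) to actually query the server.
    print(json.dumps(build_generate_payload("qwen:1.8b", "Hello", cpu_only=True), indent=2))
```

`cpu_only=True` avoids the split entirely at the cost of speed; the all-GPU variant only makes sense if the weights plus the KV cache fit in free VRAM, otherwise loading may fail.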

4 answers


a7qyws3x1#

Hi @xiaohanglei, which model is this? This definitely shouldn't happen; sorry about that.

6 months ago

xienkqul2#

Hi, the test model is based on qwen:1.8b, with some parameter values changed as follows:

FROM qwen:1.8b
TEMPLATE "{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant"
SYSTEM You are a helpful assistant.
PARAMETER top_p 0.7
PARAMETER num_ctx 4096
PARAMETER repeat_last_n -1
PARAMETER repeat_penalty 1.05
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
PARAMETER temperature 0.3
PARAMETER top_k 20
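The PARAMETER lines above correspond to keys of the `options` object accepted by Ollama's REST API, with repeated `stop` lines accumulating into a list of stop sequences. As a rough illustration (a hypothetical helper, not Ollama's own Modelfile parser, which also handles quoting and multi-line values), the lines can be collected into such a dict, for example to try `num_ctx` 2048 per request without rebuilding the model:

```python
def parse_parameters(modelfile: str) -> dict:
    """Collect PARAMETER lines from a Modelfile into an options dict.

    Simplified sketch: ignores FROM/TEMPLATE/SYSTEM and quoting; numeric
    values become int or float where possible, and repeated `stop`
    entries are gathered into a list.
    """
    options: dict = {}
    for raw in modelfile.splitlines():
        parts = raw.strip().split(maxsplit=2)
        if len(parts) != 3 or parts[0].upper() != "PARAMETER":
            continue
        _, key, value = parts
        if key == "stop":
            options.setdefault("stop", []).append(value)
            continue
        for cast in (int, float):
            try:
                options[key] = cast(value)
                break
            except ValueError:
                pass
        else:
            # Neither int nor float: keep the raw string.
            options[key] = value
    return options


if __name__ == "__main__":
    sample = "PARAMETER num_ctx 4096\nPARAMETER stop <|im_start|>\nPARAMETER stop <|im_end|>"
    print(parse_parameters(sample))
```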

The results from this model were as follows:

When the problem occurs, Ollama's output log looks like this:

Attached file: ollama_output.txt
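To check from a log such as ollama_output.txt whether the model was actually split between CPU and GPU, one can look for the layer-offload summary printed by llama.cpp's model loader. The sketch below assumes a line of the form `offloaded N/M layers to GPU` (exact wording may differ between versions); `find_offload`, `is_split`, and the script name in the comment are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed log wording, e.g. "llm_load_tensors: offloaded 20/25 layers to GPU".
OFFLOAD_RE = re.compile(r"offloaded (\d+)/(\d+) layers to GPU")


def find_offload(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (layers on GPU, total layers) from the last offload line, or None."""
    matches = OFFLOAD_RE.findall(log_text)
    if not matches:
        return None
    gpu, total = matches[-1]
    return int(gpu), int(total)


def is_split(log_text: str) -> bool:
    """True when some, but not all, layers were placed on the GPU."""
    result = find_offload(log_text)
    return result is not None and 0 < result[0] < result[1]


if __name__ == "__main__":
    import sys

    # Usage: python check_offload.py ollama_output.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            text = f.read()
        print(find_offload(text), "split" if is_split(text) else "not split")
```

A result such as `(20, 25)` would indicate a CPU/GPU split, while `(0, 25)` would mean CPU only.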

Test conclusion:

Testing shows that the problem is tied to the num_ctx value: with num_ctx set to 2048 the problem does not occur, but with 4096 it very likely reappears.
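A plausible reason num_ctx matters (an inference, not confirmed in the thread) is the KV cache, whose size grows linearly with context length: roughly 2 (keys and values) × layers × context length × KV width × bytes per element. The layer count (24) and KV width (2048) below are assumed values for a Qwen-1.8B-class model with a 16-bit cache; the model's own metadata would give the real figures.

```python
def kv_cache_bytes(n_layers: int, n_ctx: int, kv_width: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache size: one K and one V vector per layer per token."""
    return 2 * n_layers * n_ctx * kv_width * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical architecture values for a Qwen-1.8B-class model.
    N_LAYERS, KV_WIDTH = 24, 2048
    for ctx in (2048, 4096):
        mib = kv_cache_bytes(N_LAYERS, ctx, KV_WIDTH) / 2**20
        print(f"num_ctx={ctx}: ~{mib:.0f} MiB of KV cache")
```

Under these assumptions the cache grows from about 384 MiB to about 768 MiB, and the extra ~384 MiB is a sizeable slice of a 4 GB GTX 1050 Ti that is already over 95% occupied, which would push more layers onto the CPU.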

Test environment:

OS: Windows 10
GPU: NVIDIA GeForce GTX 1050 Ti
CPU: Intel Core i5-12490F
Ollama version: 0.1.41

Test scenario:

I used a test tool to push GPU memory usage above 95%, so that when the model is loaded it gets split between the CPU and GPU. Here is an example image:

Below is the code I used to raise GPU memory usage for testing. For reference only:

import torch
import time

# Check whether a GPU is available
if torch.cuda.is_available():
    # Use the default GPU device
    device = torch.device('cuda')
    print(f'Using GPU: {torch.cuda.get_device_name(device)}')

    # Total VRAM capacity of the GPU, in bytes
    total_memory = torch.cuda.get_device_properties(device).total_memory
    print(f'Total GPU memory: {total_memory / (1024 ** 3):.2f} GB')

    # Number of float32 elements (4 bytes each) needed to occupy about 85% of VRAM
    fraction = 0.85
    target_memory = int(total_memory * fraction)
    num_elements = target_memory // 4

    # Allocate one large tensor on the GPU; keeping the reference holds the memory
    tensor = torch.zeros(num_elements, dtype=torch.float32, device=device)
    print(f'Allocated {target_memory / (1024 ** 3):.2f} GB of GPU memory, '
          f'which is {fraction:.0%} of the total GPU memory.')

    # Two smaller tensors for matrix multiplication
    size = 1024  # adjust as needed
    tensor_a = torch.randn(size, size, device=device)
    tensor_b = torch.randn(size, size, device=device)

    # Keep the program running with heavy computation to raise GPU utilization
    try:
        while True:
            # Many matrix multiplications to raise GPU utilization
            for _ in range(800):  # adjust the loop count to control the compute load
                result = torch.matmul(tensor_a, tensor_b)
                #time.sleep(0.001)
            #time.sleep(0.01)  # longer sleep to simulate gaps between real compute tasks
    except KeyboardInterrupt:
        print('Program terminated by user.')
else:
    print('No GPU available.')
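With the stress script running, one way to confirm the card really is above the 95% threshold before loading the model is to query `nvidia-smi`. The sketch below assumes the `--query-gpu=memory.used,memory.total --format=csv,noheader,nounits` form, which prints one `used, total` line (in MiB) per GPU; `parse_usage` and `gpu_usage` are hypothetical helper names, and the 0.95 threshold is taken from the scenario described above.

```python
import subprocess
from typing import List

QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_usage(csv_text: str) -> List[float]:
    """Turn lines like '3900, 4096' into per-GPU used/total fractions."""
    fractions = []
    for line in csv_text.strip().splitlines():
        used, total = (float(x) for x in line.split(","))
        fractions.append(used / total)
    return fractions


def gpu_usage() -> List[float]:
    """Run nvidia-smi and return the memory-usage fraction of each GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_usage(out)


if __name__ == "__main__":
    try:
        for i, frac in enumerate(gpu_usage()):
            state = "above" if frac > 0.95 else "below"
            print(f"GPU {i}: {frac:.1%} used ({state} the 95% threshold)")
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"nvidia-smi unavailable: {exc}")
```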

6 months ago

m2xkgtsf3#

I suspect this issue is similar to #4977.

6 months ago

wfveoks04#

> Hi @xiaohanglei, which model is this? This shouldn't happen, sorry about that.
@jmorganca, I've provided the test scenario.

6 months ago