ollama Llama3: generated output is still inconsistent despite seed and temperature

dhxwm5r4 · posted 1 month ago

What is the issue?
A follow-up on Llama 3
Although the output should be deterministic and reproducible with a fixed seed, temperature set to 0, and a fixed num_ctx, the output Llama 3 generates differs slightly between the first execution of this code and the second execution (without restarting the kernel). Executions after that match the second one.
Code snippet sampled from LLMs from scratch - Evaluation with Ollama:

import urllib.request
import json

def query_model(prompt, model="llama3", url="http://localhost:11434/api/chat"):
    # Create the data payload as a dictionary
    data = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        "options": {
            "seed": 123,
            "temperature": 0,
            "num_ctx": 2048 # must be set, otherwise slightly random output
        }
    }

    # Convert the dictionary to a JSON formatted string and encode it to bytes
    payload = json.dumps(data).encode("utf-8")

    # Create a request object, setting the method to POST and adding necessary headers
    request = urllib.request.Request(url, data=payload, method="POST")
    request.add_header("Content-Type", "application/json")

    # Send the request and capture the response
    response_data = ""
    with urllib.request.urlopen(request) as response:
        # Read and decode the response
        while True:
            line = response.readline().decode("utf-8")
            if not line:
                break
            response_json = json.loads(line)
            response_data += response_json["message"]["content"]

    return response_data

result = query_model("What do Llamas eat?")
print(result)

Output of the 1st execution (this output may vary):

Llamas are herbivores, which means they primarily feed on plant-based foods. Their diet typically consists of:

1. Grasses: Llamas love to graze on various types of grasses, including tall grasses, short grasses, and even weeds.
2. Hay: High-quality hay, such as alfalfa or timothy hay, is a staple in a llama's diet. They enjoy munching on hay as a snack or as a main meal.
3. Grains: Llamas may be fed grains like oats, barley, or corn as an occasional treat or to supplement their diet.
4. Fruits and vegetables: Fresh fruits and veggies, such as apples, carrots, and sweet potatoes, can be given as treats or added to their meals for variety.
5. Leaves and shrubs: Llamas will also eat leaves from trees and shrubs, like willow or cedar.

In the wild, llamas might eat:

* Various grasses and plants
* Leaves from trees and shrubs
* Fruits and berries
* Bark (in some cases)

Domesticated llamas, on the other hand, typically receive a diet that includes:

* Hay as their main staple
* Grains or pellets as a supplement
* Fresh fruits and veggies as treats

It's essential to provide llamas with a balanced diet that meets their nutritional needs. Consult with a veterinarian or an experienced llama breeder to determine the best feeding plan for your llama.

Output of the 2nd through n-th executions (this output should be reproducible):

Llamas are herbivores, which means they primarily feed on plant-based foods. Their diet typically consists of:

1. Grasses: Llamas love to graze on various types of grasses, including tall grasses, short grasses, and even weeds.
2. Hay: High-quality hay, such as alfalfa or timothy hay, is a staple in a llama's diet. They enjoy munching on hay cubes or loose hay.
3. Grains: Llamas may receive grains like oats, barley, or corn as part of their diet. However, these should be given in moderation to avoid digestive issues.
4. Fruits and vegetables: Fresh fruits and veggies can be a tasty treat for llamas. Some favorites include apples, carrots, sweet potatoes, and leafy greens like kale or spinach.
5. Minerals: Llamas need access to mineral supplements, such as salt licks or loose minerals, to ensure they're getting the necessary nutrients.

In the wild, llamas might also eat:

1. Leaves: They'll munch on leaves from trees and shrubs, like willow or cedar.
2. Bark: In some cases, llamas may eat the bark of certain trees, like aspen or birch.
3. Mosses: Llamas have been known to graze on mosses and other non-woody plant material.

It's essential to provide a balanced diet for your llama, taking into account their age, size, and individual needs. Consult with a veterinarian or experienced llama breeder to determine the best feeding plan for your llama.

Observations:

  • As you can see, the output of the first execution is somewhat random, while the second and all subsequent executions consistently produce deterministic output (a small repro sketch follows this list).
  • I tried different platforms (Windows, an Ubuntu image under Docker), and the generated output appears to differ between these operating systems: the first run is always slightly random, but on any given platform the later outputs are consistent. However, the consistent deterministic output this code produces on Windows, for example, differs from the one produced on Ubuntu.
  • I tried setting the Python hash seed, which did not solve the problem.
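
A minimal repro sketch of the first observation, assuming the query_model() function defined in the snippet above and a local Ollama server with the llama3 model pulled (the prompt and the hashing are only for illustration):

import hashlib

prompt = "What do Llamas eat?"

# Two calls in the same process, i.e. without restarting the kernel in between.
first = query_model(prompt)
second = query_model(prompt)

# Hash the responses so long outputs are easy to compare at a glance.
print("1st call:", hashlib.sha256(first.encode("utf-8")).hexdigest())
print("2nd call:", hashlib.sha256(second.encode("utf-8")).hexdigest())

# Per the observation above, this tends to be False right after the model is
# loaded, while any two later calls compare equal.
print("identical:", first == second)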

OS: Linux, macOS, Windows, Docker, WSL2
GPU: Nvidia
CPU: AMD
Ollama version: 0.1.46

qco9c6ql 1#

Your build does include ead259d, so I'm not sure why.

yuvru6vn 2#

You can try applying this patch:

diff --git a/llm/server.go b/llm/server.go
index 36c0e0b5..b93b5b6c 100644
--- a/llm/server.go
+++ b/llm/server.go
@@ -734,7 +734,7 @@ func (s *llmServer) Completion(ctx context.Context, req CompletionRequest, fn fu
 		"seed":              req.Options.Seed,
 		"stop":              req.Options.Stop,
 		"image_data":        req.Images,
-		"cache_prompt":      true,
+		"cache_prompt":      false,
 	}
 
 	// Make sure the server is ready

The cache_prompt flag was set to true by commit a64570d. From https://github.com/ggerganov/llama.cpp/tree/master/examples/server#api-endpoints , it says:
prompt: Provide the prompt for this completion as a string or as an array of strings or numbers representing tokens. Internally, if cache_prompt is true, the prompt is compared to the previous completion and only the "unseen" suffix is evaluated.
Once this patch is applied, I get exactly the same output when I send the same prompt with the same seed and the same temperature, whether or not the kernel has been restarted. For example:

$ curl http://localhost:11434/api/chat -d '{
  "model": "phi3:medium",
  "messages": [
    {
      "role": "user",
      "content": "Tell me a short story about ghost. Limit it to 20 words."
    }
  ],
  "options": {
    "seed": 666,
    "temperature": 0.666
  },
  "stream": false
}'

First output:

{
  "model": "phi3:medium",
  "created_at": "2024-07-18T00:09:38.40377522Z",
  "message": {
    "role": "assistant",
    "content": " A lonely ghost haunted an old mansion, seeking companionship. One day, a curious child visited; they found friendship within the walls."
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 7185272872,
  "load_duration": 3884720,
  "prompt_eval_count": 21,
  "prompt_eval_duration": 1255198000,
  "eval_count": 32,
  "eval_duration": 5884770000
}

Second output:

{
  "model": "phi3:medium",
  "created_at": "2024-07-18T00:09:49.3024363Z",
  "message": {
    "role": "assistant",
    "content": " A lonely ghost haunted an old mansion, seeking companionship. One day, a curious child visited; they found friendship within the walls."
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 7152606421,
  "load_duration": 4006631,
  "prompt_eval_count": 21,
  "prompt_eval_duration": 1255757000,
  "eval_count": 32,
  "eval_duration": 5892071000
}

I suppose this flag should be made configurable?

gijlo24d 3#

Honestly, I don't know whether this fixes the issue. For your outputs you used a different model and a different prompt, and you did not verify across different operating systems.
The KV cache is actually a useful feature, but it may be initialized differently on different operating systems. Disabling it may therefore work around the problem, but it does not fix the underlying KV-cache initialization issue.
ggerganov/llama.cpp#4902
But you actually gave me an idea: setting num_keep=0 (which does not disable the cache, but at least should not keep prompt tokens in it). A sketch of what that would look like is below.
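
Just to make the idea concrete, this is roughly what the request payload from the question would look like with num_keep added; num_keep is an existing Ollama option, but whether num_keep=0 really avoids the prompt-cache effect is exactly what still needs to be verified:

data = {
    "model": "llama3",
    "messages": [{"role": "user", "content": "What do Llamas eat?"}],
    "options": {
        "seed": 123,
        "temperature": 0,
        "num_ctx": 2048,
        "num_keep": 0,  # idea from above: do not keep prompt tokens between requests
    },
}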
I don't know how to install Ollama with your change, either on Ubuntu or on Windows, so I'll test it once it lands in a new Ollama release. Thanks for your PR!
By the way, I also opened a PR on llama.cpp to make the output 100% deterministic:
ggerganov/llama.cpp#8265
When temperature=0 is used, a small coefficient may be applied to avoid division by zero. Depending on the model, this can slightly change the generated output in some cases. It is therefore better to turn off beam search and multinomial sampling to get deterministic sampling.
Setting a seed only makes sense when you use non-deterministic sampling, such as top-k or top-p sampling, and want reproducibility. The Ollama example code here doesn't entirely make sense, because with temperature=0 the generated output is already deterministic, so you don't need to set a seed. But as soon as you set a temperature greater than 0.0, you do need to set a seed to make the output reproducible.
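
A minimal sketch of that last point, assuming a local Ollama server; it reuses the same /api/chat endpoint as above, with streaming disabled so a single JSON object comes back. With temperature=0 decoding is greedy and the seed is moot, while with temperature > 0 the fixed seed is what makes repeated runs reproducible:

import json
import urllib.request

def chat(prompt, options, model="llama3", url="http://localhost:11434/api/chat"):
    # Same request shape as query_model() above, but with caller-supplied options
    # and "stream": False, so the response is one JSON object instead of a stream.
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "options": options,
        "stream": False,
    }).encode("utf-8")
    request = urllib.request.Request(url, data=payload, method="POST")
    request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]

# Greedy decoding: deterministic on its own, no seed needed.
greedy = chat("What do Llamas eat?", {"temperature": 0})

# Sampled decoding: reproducible only because the seed is fixed.
sampled_a = chat("What do Llamas eat?", {"temperature": 0.7, "seed": 123})
sampled_b = chat("What do Llamas eat?", {"temperature": 0.7, "seed": 123})
print(sampled_a == sampled_b)  # expected True once the prompt cache is warm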
