ollama LLM Compiler model

0ejtzxu1 · asked 2 months ago in Other

How can I use the new models Meta released with Ollama?
https://huggingface.co/collections/facebook/llm-compiler-667c5b05557fe99a9edd25cb
Thanks.

f2uvfpb9 (Answer 1)

Download the model. This is a gated model, so you need to request access, wait to be granted it, and then generate an access token.

huggingface-cli download --token TOKEN --local-dir . facebook/llm-compiler-7b-ftd
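If you don't want to pass the token on every invocation, huggingface-cli can also store it once (same TOKEN placeholder as above):

# log in once so later commands pick up the stored token
huggingface-cli login --token TOKEN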

Convert to GGUF format. I find the dockerized llama.cpp the easiest route; if you don't have Docker, you will need to build llama.cpp yourself.

docker run -it -v .:/models ghcr.io/ggerganov/llama.cpp:full-cuda -c --outtype f16 /models
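The -c entrypoint just runs llama.cpp's HF-to-GGUF conversion script, so with a local llama.cpp checkout you can skip Docker. A rough equivalent, assuming a recent checkout (the script has been renamed over time, e.g. convert-hf-to-gguf.py vs. convert_hf_to_gguf.py):

# convert the downloaded HF model in the current directory to an f16 GGUF
python convert_hf_to_gguf.py --outtype f16 --outfile ggml-model-f16.gguf .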

Quantize as needed:

docker run -it --gpus all -v .:/models ghcr.io/ggerganov/llama.cpp:full-cuda -q /models/ggml-model-f16.gguf Q4_K_S
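Q4_K_S is only one option; llama.cpp offers other quantization types (Q4_K_M, Q5_K_M, Q8_0, and more) that trade file size against fidelity. A quick size comparison after quantizing:

# compare the f16 original against the quantized file
ls -lh ggml-model-*.gguf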

Create a Modelfile:

echo FROM ggml-model-Q4_K_S.gguf > Modelfile
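A one-line Modelfile is enough here because the request below uses raw mode, but you can also bake the [INST] wrapper into the Modelfile so ordinary (non-raw) requests get formatted automatically. A sketch using standard Modelfile directives (the template is an assumption derived from the prompt format used below, not something the model card specifies):

# hypothetical extended Modelfile with a chat template and deterministic sampling
cat > Modelfile <<'EOF'
FROM ggml-model-Q4_K_S.gguf
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
PARAMETER temperature 0
EOF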

Create the model:

ollama create llm-compiler:7b-ftd-Q4_K_S -f Modelfile
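Confirm the model registered and give it a quick interactive smoke test:

# list installed models, then chat with the new one
ollama list
ollama run llm-compiler:7b-ftd-Q4_K_S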

Try it out. Extract the assembly example from the test file, wrap it in a prompt, and send it to the model:

sed -ne '/asm =/,/"""/p' llm_compiler_demo.py | grep -v '"""' > asm
curl -s http://localhost:11434/api/generate -d '{"model":"llm-compiler:7b-ftd-Q4_K_S","prompt": '"$( (echo -e '[INST] Disassemble this code to LLVM-IR:\n\n<code>' ; cat asm ; echo '</code> [/INST]') | jq -sR)"',"format":"","options":{},"stream": false, "raw": true}'
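The API replies with JSON; piping it through jq -r .response prints only the generated text. For example, with the same request body as above saved to req.json (a hypothetical file name):

# -d @req.json reads the JSON body from a file; keep only the model's text
curl -s http://localhost:11434/api/generate -d @req.json | jq -r .response > response.txt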

The response looks like this:

This will produce code:

<code>; ModuleID = '<stdin>'
source_filename = "-"
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-none-linux-gnu"

; Function Attrs: minsize nofree norecurse nounwind optsize memory(inaccessiblemem: readwrite) uwtable
define dso_local i32 @add_two(i32 noundef %0, i32 noundef %1) local_unnamed_addr #0 {
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  store volatile i32 %0, ptr %3, align 4, !tbaa !6
  store volatile i32 %1, ptr %4, align 4, !tbaa !6
  %.0..0..0..0.2 = load volatile i32, ptr %3, align 4, !tbaa !6
  %.0..0..0..0.1 = load volatile i32, ptr %4, align 4, !tbaa !6
  %5 = add nsw i32 %.0..0..0..0.1, %.0..0..0..0.2
  ret i32 %5
}

attributes #0 = { minsize nofree norecurse nounwind optsize memory(inaccessiblemem: readwrite) uwtable "frame-pointer"="non-leaf" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+fp-armv8,+neon,+outline-atomics,+v8a,-fmv" }

!llvm.module.flags = !{!0, !1, !2, !3, !4}
!llvm.ident = !{!5}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 8, !"PIC Level", i32 2}
!2 = !{i32 7, !"PIE Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 2}
!4 = !{i32 7, !"frame-pointer", i32 1}
!5 = !{!"clang version 17.0.6 (git@github.com:fairinternal/CodeGen.git b05db9bbf7a92019267416c1bb9996fe6134e3f1)"}
!6 = !{!7, !7, i64 0}
!7 = !{!"int", !8, i64 0}
!8 = !{!"omnipotent char", !9, i64 0}
!9 = !{!"Simple C/C++ TBAA"}
</code>

Whether the output is still correct after quantization remains to be seen.
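One way to sanity-check it is a round trip: pull the IR out of the response and see whether it still compiles for the stated target. A rough sketch, assuming the response text was saved to response.txt as above and that llc is installed:

# extract the IR between the <code> tags, strip the tags, then try to compile it
sed -n '/<code>/,/<\/code>/p' response.txt | sed 's/<\/*code>//g' > out.ll
llc -mtriple=aarch64-none-linux-gnu out.ll -o out.s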
