CTranslate2 support for core42/jais-13b-chat (Arabic LLM)

g2ieeal7 posted 7 months ago in Other

Hello,
Can anyone help me figure out how to run the core42/jais-13b-chat model with CTranslate2? I ran the conversion script but hit an error. The command I used was:
ct2-transformers-converter --model core42/jais-13b-chat --quantization bfloat16 --output_dir jais-13b-ct2 --trust_remote_code
Error: ValueError: No conversion is registered for the model configuration JAISConfig (supported configurations are: BartConfig, BertConfig, BloomConfig, CodeGenConfig, DistilBertConfig, FalconConfig, GPT2Config, GPTBigCodeConfig, GPTJConfig, GPTNeoXConfig, LlamaConfig, M2M100Config, MBartConfig, MPTConfig, MT5Config, MarianConfig, MixFormerSequentialConfig, OPTConfig, PegasusConfig, RWConfig, T5Config, WhisperConfig, XLMRobertaConfig)
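
The message points at a lookup keyed on the name of the model's config class. The sketch below is a simplified, hypothetical registry written for illustration; `_LOADERS`, `find_loader` and the empty `GPT2Loader` are made-up names and not CTranslate2's actual source. It only shows why an unregistered `JAISConfig` ends in a `ValueError`.

```python
# Simplified, hypothetical registry; NOT CTranslate2's actual implementation.
_LOADERS = {}


def register_loader(config_name):
    """Class decorator mapping a config class name to a loader instance."""

    def decorator(cls):
        _LOADERS[config_name] = cls()
        return cls

    return decorator


@register_loader("GPT2Config")
class GPT2Loader:
    """Placeholder loader standing in for a supported architecture."""


def find_loader(config_name):
    """Return the loader registered for config_name or raise ValueError."""
    loader = _LOADERS.get(config_name)
    if loader is None:
        raise ValueError(
            "No conversion is registered for the model configuration %s "
            "(supported configurations are: %s)"
            % (config_name, ", ".join(sorted(_LOADERS)))
        )
    return loader
```

Under this reading, supporting a new architecture means registering a loader class under the config name the error reports.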

sbdsn5lh  #1

It would be a great help if you could look into this issue or at least give some guidance.

gudnpqoy  #2

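As the message says, it is not in the list of supported models. You could check how close it is to one of the supported models and then adapt that model's loader.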
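
One way to judge how close two architectures are is to compare their parameter names and shapes. The helper below is a minimal sketch; `diff_param_shapes` and the toy shape tables are hypothetical, and real input would come from something like `{name: tuple(p.shape) for name, p in model.named_parameters()}` in PyTorch.

```python
def diff_param_shapes(reference, candidate):
    """Compare two {parameter_name: shape} mappings.

    Returns (only_in_reference, only_in_candidate, shape_mismatches).
    """
    ref_names, cand_names = set(reference), set(candidate)
    only_ref = sorted(ref_names - cand_names)
    only_cand = sorted(cand_names - ref_names)
    mismatched = {
        name: (tuple(reference[name]), tuple(candidate[name]))
        for name in sorted(ref_names & cand_names)
        if tuple(reference[name]) != tuple(candidate[name])
    }
    return only_ref, only_cand, mismatched


# Toy shapes for a single block (illustrative values only).
gpt2_like = {
    "h.0.mlp.c_fc.weight": (8, 32),
    "h.0.mlp.c_proj.weight": (32, 8),
}
jais_like = {
    "h.0.mlp.c_fc.weight": (8, 32),
    "h.0.mlp.c_fc2.weight": (8, 32),
    "h.0.mlp.c_proj.weight": (32, 8),
}

only_ref, only_cand, mismatched = diff_param_shapes(gpt2_like, jais_like)
```

Parameters that appear in only one model, such as an extra MLP projection, are the places where a borrowed loader needs changes.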

sh7euo9m  #3

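Thanks for the quick reply!

I did write a loader adapter for Jais (code below) and the conversion succeeded, but inference with ct2 fails with a cublas_not_supported error (which basically suggests an invalid matrix multiplication between layers, so something is wrong with the conversion). Here is the adapter class: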

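# Imports needed when this loader is defined outside
# ctranslate2/converters/transformers.py
import gc

from ctranslate2.converters import utils
from ctranslate2.converters.transformers import (
    _SUPPORTED_ROPE_SCALING,
    ModelLoader,
    register_loader,
)
from ctranslate2.specs import common_spec, transformer_spec


@register_loader("JAISConfig")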
class JaisLoader(ModelLoader):
    @property
    def architecture_name(self):
        return "AutoModelForCausalLM"

    def get_model_spec(self, model):
        num_layers = model.config.num_hidden_layers

        num_heads = model.config.num_attention_heads
        num_heads_kv = getattr(model.config, "num_key_value_heads", num_heads)
        if num_heads_kv == num_heads:
            num_heads_kv = None

        rope_scaling = getattr(model.config, "rope_scaling", None)
        if rope_scaling:
            rotary_scaling_type = _SUPPORTED_ROPE_SCALING.get(rope_scaling["type"])
            rotary_scaling_factor = rope_scaling["factor"]

            if rotary_scaling_type is None:
                raise NotImplementedError(
                    "RoPE scaling type '%s' is not yet implemented. "
                    "The following RoPE scaling types are currently supported: %s"
                    % (rope_scaling["type"], ", ".join(_SUPPORTED_ROPE_SCALING.keys()))
                )
        else:
            rotary_scaling_type = None
            rotary_scaling_factor = 1

        spec = transformer_spec.TransformerDecoderModelSpec.from_config(
            num_layers,
            num_heads,
            activation=common_spec.Activation.SWISH,
            pre_norm=True,
            # ffn_glu=True,
            # rms_norm=True,
            alibi=True,
            alibi_use_positive_positions=True,
            scale_alibi=True,
            # rotary_dim=0,
            # rotary_interleave=False,
            # rotary_scaling_type=rotary_scaling_type,
            # rotary_scaling_factor=rotary_scaling_factor,
            # rotary_base=getattr(model.config, "rope_theta", 10000),
            num_heads_kv=num_heads_kv,
        )

        self.set_decoder(spec.decoder, model.transformer)
        self.set_linear(spec.decoder.projection, model.lm_head)
        return spec

    def get_vocabulary(self, model, tokenizer):
        tokens = super().get_vocabulary(model, tokenizer)

        extra_ids = model.config.vocab_size - len(tokens)
        for i in range(extra_ids):
            tokens.append("<extra_id_%d>" % i)

        return tokens

    def set_vocabulary(self, spec, tokens):
        spec.register_vocabulary(tokens)

    def set_config(self, config, model, tokenizer):
        config.bos_token = tokenizer.bos_token
        config.eos_token = tokenizer.eos_token
        config.unk_token = tokenizer.unk_token
        config.layer_norm_epsilon = model.config.layer_norm_epsilon

    def set_layer_norm(self, spec, layer_norm):
        spec.gamma = layer_norm.weight
        spec.beta = layer_norm.bias

    def set_position_encodings(self, spec, module):
        spec.encodings = module.slopes
        offset = getattr(module, "offset", 0)
        if offset > 0:
            spec.encodings = spec.encodings[offset:]

    def set_decoder(self, spec, module):
        spec.scale_embeddings = False
        self.set_embeddings(spec.embeddings, module.wte)

        for layer_spec, layer in zip(spec.layer, module.h):
            self.set_layer_norm(layer_spec.self_attention.layer_norm, layer.ln_1)
            self.set_linear(layer_spec.self_attention.linear[0], layer.attn.c_attn)
            self.set_linear(layer_spec.self_attention.linear[1], layer.attn.c_proj)
            
            self.set_layer_norm(layer_spec.ffn.layer_norm, layer.ln_2)

            split_layers = [common_spec.LinearSpec() for _ in range(2)]
            self.set_linear(split_layers[0], layer.mlp.c_fc)
            self.set_linear(split_layers[1], layer.mlp.c_fc2)
            utils.fuse_linear(layer_spec.ffn.linear_0, split_layers)
            self.set_linear(layer_spec.ffn.linear_1, layer.mlp.c_proj)
            delattr(layer, "attn")
            delattr(layer, "mlp")
            gc.collect()

        self.set_layer_norm(spec.layer_norm, module.ln_f)

Here is the actual model architecture for reference:

JAISModel(
  (wte): Embedding(84992, 5120)
  (drop): Dropout(p=0.0, inplace=False)
  (h): ModuleList(
    (0-39): 40 x JAISBlock(
      (ln_1): LayerNorm((5120,), eps=1e-05, elementwise_affine=True)
      (attn): JAISAttention(
        (c_attn): Conv1D()
        (c_proj): Conv1D()
        (attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_dropout): Dropout(p=0.0, inplace=False)
      )
      (ln_2): LayerNorm((5120,), eps=1e-05, elementwise_affine=True)
      (mlp): JAISMLP(
        (c_fc): Conv1D()
        (c_fc2): Conv1D()
        (c_proj): Conv1D()
        (act): SwiGLUActivation()
        (dropout): Dropout(p=0.0, inplace=False)
      )
    )
  )
  (ln_f): LayerNorm((5120,), eps=1e-05, elementwise_affine=True)
  (relative_pe): AlibiPositionEmbeddingLayer()
)

Please take a moment (if you can) to review this code and let me know whether you see any flaws. Thanks in advance!
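
One thing worth checking in the adapter above is the MLP. The architecture lists `c_fc`, `c_fc2` and a `SwiGLUActivation`, while the adapter fuses both projections into `linear_0` and leaves `ffn_glu=True` commented out. The toy sketch below uses plain Python lists and made-up sizes; which branch receives the activation is an assumption that should be checked against the model's remote code. It shows that concatenating the two weights doubles the output width, so the result no longer fits the output projection unless a gating step halves it again.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError(
            "shape mismatch: %dx%d @ %dx%d" % (len(a), len(a[0]), len(b), len(b[0]))
        )
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def silu(v):
    """SiLU / swish activation."""
    return v / (1.0 + math.exp(-v))


hidden, inner = 2, 3  # toy sizes; the real model has hidden size 5120
x = [[1.0, -1.0]]  # one token, shape (1, hidden)
w_fc = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # (hidden, inner)
w_fc2 = [[0.6, 0.5, 0.4], [0.3, 0.2, 0.1]]  # (hidden, inner)
w_proj = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # (inner, hidden)

# Gated form: two separate projections combined elementwise.
# Assumption: the activation is applied to the c_fc2 branch.
a = matmul(x, w_fc)[0]
b = matmul(x, w_fc2)[0]
gated = [[u * silu(v) for u, v in zip(a, b)]]  # (1, inner)
out = matmul(gated, w_proj)  # (1, hidden)

# Fused form: concatenating the weights gives width 2 * inner.
fused_w = [r1 + r2 for r1, r2 in zip(w_fc, w_fc2)]  # (hidden, 2 * inner)
fused = matmul(x, fused_w)  # (1, 2 * inner)

# Without gating, the fused output cannot be fed to the output projection.
try:
    matmul(fused, w_proj)
    fused_fits = True
except ValueError:
    fused_fits = False
```

For comparison, CTranslate2's Llama loader keeps the gate and up projections as separate linear specs with `ffn_glu=True` instead of fusing them, which may be a closer template for this MLP.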

owfi6suc  #4

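As you can see in the second output, you have convolutional layers (Conv1D), so you cannot simply copy and paste the loader from another regular GPT model. Take a look at the Whisper loader, but it is not as straightforward as it looks.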
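
A related detail to verify: in Hugging Face's GPT-2 code, `Conv1D` behaves like a linear layer whose weight is stored as (in_features, out_features), the transpose of the usual linear layout. Whether JAIS's `Conv1D` follows the same convention is an assumption here, since the printed architecture does not show it. If it does, the weights would need transposing before being assigned to a linear spec. The toy sketch below, in plain Python with made-up values, shows the equivalence.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("shape mismatch")
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def conv1d_gpt2_style(x, weight, bias):
    """y = x @ W + b, with W stored as (in_features, out_features)."""
    return [[v + c for v, c in zip(row, bias)] for row in matmul(x, weight)]


def linear(x, weight, bias):
    """y = x @ W^T + b, with W stored as (out_features, in_features)."""
    return [
        [v + c for v, c in zip(row, bias)] for row in matmul(x, transpose(weight))
    ]


x = [[1.0, 2.0]]  # (1, in=2)
conv_w = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]  # (in=2, out=3)
bias = [0.5, 0.5, 0.5]

# Transposing the Conv1D weight yields the layout a linear layer expects.
linear_w = transpose(conv_w)  # (out=3, in=2)
same = conv1d_gpt2_style(x, conv_w, bias) == linear(x, linear_w, bias)

# Using the untransposed weight as a linear weight fails on shape alone.
try:
    linear(x, conv_w, bias)
    untransposed_fits = True
except ValueError:
    untransposed_fits = False
```

Under that assumption, overriding `set_linear` in the loader to transpose `Conv1D` weights would be the place to handle this.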
