Describe the bug

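I am training an image classifier with Ludwig's TorchVision models.
The original models have a softmax operator in the last layer, but it was removed because it does not belong in the encoder. However, the softmax layer was never added back in the decoder. Is this intentional?
I need the softmax values of the output. I see three ways forward: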

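Add a softmax layer to the decoder
Add a softmax layer when exporting the model to Torchscript, ONNX, or CoreML
Leave the model as it is and compute the softmax in the application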
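For the first two options, one possible approach is to wrap the module that returns logits in a thin module that applies softmax, and export the wrapper. The sketch below uses plain PyTorch only; `WithSoftmax` and the `nn.Linear(1280, 4)` stand-in for the trained head are hypothetical names for illustration and are not part of Ludwig's API or its export path.

```python
import torch
from torch import nn


class WithSoftmax(nn.Module):
    """Wraps a module that returns logits so that it returns probabilities."""

    def __init__(self, logits_module: nn.Module):
        super().__init__()
        self.logits_module = logits_module

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.logits_module(x)
        # Normalize over the class dimension so each row sums to 1.
        return torch.softmax(logits, dim=-1)


# Hypothetical stand-in for the trained classifier head (1280 features, 4 classes).
head = nn.Linear(1280, 4)
wrapped = WithSoftmax(head).eval()

# Trace the wrapper so the softmax becomes part of the exported graph.
example = torch.randn(2, 1280)
traced = torch.jit.trace(wrapped, example)
probs = traced(example)
```

Because the softmax lives in the wrapper rather than in the trained module, the training loss still receives raw logits.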

Here is a debug print of the model architecture. I removed most of it for brevity.

ECD(
  (input_features): LudwigFeatureDict(
    (module_dict): ModuleDict(
      (image_path__ludwig): ImageInputFeature(
        (encoder_obj): TVEfficientNetEncoder(
          (model): EfficientNet(
            (features): Sequential(
              (0): Conv2dNormActivation(
                (0): Conv2d(3, 24, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
                (1): BatchNorm2d(24, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
                (2): SiLU(inplace=True)
              )
              // --- removed for conciseness ---
              (7): Conv2dNormActivation(
                (0): Conv2d(256, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
                (1): BatchNorm2d(1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
                (2): SiLU(inplace=True)
              )
            )
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (classifier): Sequential(
              (0): Dropout(p=0.2, inplace=True)
              (1): Identity()
            )
          )
        )
      )
    )
  )
  (output_features): LudwigFeatureDict(
    (module_dict): ModuleDict(
      (label__ludwig): CategoryOutputFeature(
        (fc_stack): FCStack(
          (stack): ModuleList()
        )
        (reduce_sequence_input): SequenceReducer(
          (_reduce_obj): ReduceSum()
        )
        (decoder_obj): Classifier(
          (dense): Dense(
            (dense): Linear(in_features=1280, out_features=4, bias=True)
          )
        )
        (train_loss_function): SoftmaxCrossEntropyLoss(
          (loss_fn): CrossEntropyLoss()
        )
      )
    )
  )
  (combiner): ConcatCombiner(
    (fc_stack): FCStack(
      (stack): ModuleList()
    )
  )
)

To Reproduce

Python file:

import logging
from ludwig.api import LudwigModel
CONFIG = "/auto-ml/ludwig.yaml"
def train_classifier_ludwig(df, save_dir, model_name):
    model = LudwigModel(CONFIG, logging_level=logging.INFO)
    model.train(
        dataset=df,
        output_directory=save_dir,
        experiment_name="ludwig",
        model_name=model_name,
        skip_save_processed_input=True,
    )

YAML file:

trainer:
  epochs: 100
  early_stop: 10
  use_mixed_precision: false
input_features:
  - name: image_path
    type: image
    preprocessing:
      num_processes: 4
    encoder:
      type: efficientnet
      use_pretrained: True
      trainable: True
      model_cache_dir: null
      model_variant: v2_m
    fc_layers:
      - output_size: 128
        dropout: 0.4
output_features:
  - name: label
    type: category

Expected behavior

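When running inference on an image classifier, the output probabilities should sum to 1.
Example values obtained from an image classifier with 4 classes: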

[-1.0383801 -1.1289184  3.9636617 -0.988309 ]

However, they should be:

[0.00659277 0.0060221  0.98045385 0.00693128]
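The third option (computing softmax in the application) could look like the following minimal sketch, which uses only the Python standard library. The `softmax` helper is written here for illustration; the input is the raw output vector quoted above.

```python
import math


def softmax(logits):
    """Convert raw scores (logits) to probabilities that sum to 1."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Raw model output from the 4-class example above.
logits = [-1.0383801, -1.1289184, 3.9636617, -0.988309]
probs = softmax(logits)
print(probs)
print(sum(probs))
```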

Environment

OS: Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.2
Python 3.10.9
Ludwig version: latest from master, sha=890f261fa947ed9485065844fe1bd5a35460f6f4

Additional context

I am not sure whether this is related, but there is a SoftmaxCrossEntropyLoss module that has no softmax operator inside it. Is this intentional? Am I missing something?

3 Answers


i2loujxw1#

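Hey @saad-palapa. After the encoder there is a combiner, and then the decoder. The decoder is responsible for adding a final projection layer (if it is a category decoder), or for doing whatever else is needed to produce predictions.
Also tagging @jimthompson5802.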


bsxbgnwa2#

I don't see a softmax in the debug output. Could you check the YAML file for an error?

7cjasjjr3#

Softmax is actually applied in the prediction module, _CategoryPredict:
https://github.com/ludwig-ai/ludwig/blob/master/ludwig/features/category_feature.py#L100-L135
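You are right that this module is not itself part of the ECD model.
The reason is that softmax is not needed at training time, because the loss function applies it, while at prediction time this module is used to decide whether to apply calibration.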
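To illustrate why training needs no explicit softmax layer: PyTorch's CrossEntropyLoss is documented as taking raw logits and combining log-softmax with negative log-likelihood. The sketch below reproduces that computation in plain Python for a single sample; the helper names are illustrative, and the target class index 2 is an assumption chosen for the example.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax: logits minus log-sum-exp."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def cross_entropy_from_logits(logits, target):
    """Cross-entropy for one sample, computed directly from raw logits."""
    # The softmax is folded into the loss, so the model itself outputs logits.
    return -log_softmax(logits)[target]


logits = [-1.0383801, -1.1289184, 3.9636617, -0.988309]
loss = cross_entropy_from_logits(logits, target=2)  # target index is assumed
print(loss)
```

Adding a softmax layer before such a loss would normalize the scores twice, which is one reason to keep the model output as logits during training.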

ludwig Softmax missing from Torchvision models
