ludwig NLU error

vaqhlq81 posted 6 months ago in Other

Describe the bug

Found in the tutorial: https://ludwig.ai/latest/examples/nlu/
Using the same NLU dataset as the tutorial:
ludwig experiment --dataset nlu.csv --config config.yaml
nlu.csv
config.yaml

input_features:
    -
        name: utterance
        type: text
        encoder: 
            type: rnn
            cell_type: lstm
            bidirectional: true
            num_layers: 2
            reduce_output: null
        preprocessing:
            tokenizer: space

output_features:
    -
        name: intent
        type: category
        reduce_input: sum
        decoder:
            num_fc_layers: 1
            output_size: 64
    -
        name: slots
        type: sequence
        decoder: 
            type: tagger

Error:

.
.
.
File "/home/martin/.virtualenvs/ludwig/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/martin/.virtualenvs/ludwig/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x512 and 256x256)
Training:   0%|
mefy6pfw #1

We are looking into it.

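kmynzznz #2

Hi @martinb-bb,

Thanks for raising this issue!
The shape mismatch suggests that some modules are not being initialized correctly. I don't think this is a complicated fix, and we should replace this example with a larger one.
Next quarter, we plan to back all examples and tutorials with actual tests that run as part of our CI, which should help keep the examples intact.
I plan to take a closer look at this next week.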
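
One possible reading of the numbers in the error message, offered as an assumption rather than a confirmed diagnosis: with `bidirectional: true`, the forward and backward LSTM outputs are concatenated, so an assumed per-direction state size of 256 yields 512 features, while the layer that failed was built for 256 inputs. The sketch below only replays that shape arithmetic in plain Python. `matmul_shape` is a hypothetical helper, not part of Ludwig or PyTorch.

```python
from typing import Tuple

Shape = Tuple[int, int]


def matmul_shape(mat1: Shape, mat2: Shape) -> Shape:
    """Return the shape of mat1 @ mat2, or raise if the inner dimensions differ."""
    if mat1[1] != mat2[0]:
        raise ValueError(
            "mat1 and mat2 shapes cannot be multiplied "
            f"({mat1[0]}x{mat1[1]} and {mat2[0]}x{mat2[1]})"
        )
    return (mat1[0], mat2[1])


batch_size = 2
state_size = 256    # assumed per-direction LSTM state size
num_directions = 2  # bidirectional: true

# Concatenating both directions doubles the feature width.
encoder_width = state_size * num_directions

layer_built_for_256 = (state_size, state_size)     # matches the 256x256 in the error
layer_built_for_512 = (encoder_width, state_size)  # what the encoder output would need

try:
    matmul_shape((batch_size, encoder_width), layer_built_for_256)
except ValueError as err:
    mismatch_message = str(err)

fixed_shape = matmul_shape((batch_size, encoder_width), layer_built_for_512)
```

Under these assumptions, the first call reproduces the message from the traceback, and the second shows that a layer sized for the concatenated width would accept the same input.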

pokxtpni #3

Thanks a lot, @justinxzhao! I'll keep an eye out for updates.

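gdrx4gfi #4

Sorry for the inconvenience, @martinb-bb.

I was able to reproduce this issue on Ludwig v0.6.4, but it appears to be fixed in v0.7, where training runs normally.

Would you be able to install Ludwig from scratch for your experiment?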

inb24sb2 #5

@justinxzhao Works like a charm! Thanks for the help. 😄
One last question:
How could I get this NLU model to train and run inference using the pure Python API rather than the CLI? To integrate this library into our terminal, I need to be able to do everything in pure Python.
(I would look through the docs, but there is nothing on that NLU page about that. Thanks!)

kgqe7b3p #6

Hi @martinb-bb, here is an example Python script you can use:

import logging

import yaml

from ludwig.api import LudwigModel

# Same configuration as config.yaml above; the nesting must be preserved
# for the YAML to parse into the expected structure.
config = yaml.safe_load("""
input_features:
    -
        name: utterance
        type: text
        encoder:
            type: rnn
            cell_type: lstm
            bidirectional: true
            num_layers: 2
            reduce_output: null
        preprocessing:
            tokenizer: space

output_features:
    -
        name: intent
        type: category
        reduce_input: sum
        decoder:
            num_fc_layers: 1
            output_size: 64
    -
        name: slots
        type: sequence
        decoder:
            type: tagger
""")

# Define the Ludwig model object that drives model training
model = LudwigModel(config=config, logging_level=logging.INFO)

# Initiate model training
(
    train_stats,  # dictionary containing training statistics
    preprocessed_data,  # tuple of Ludwig Dataset objects of pre-processed training data
    output_directory,  # location of training results stored on disk
) = model.train(dataset="nlu.csv", experiment_name="simple_experiment", model_name="simple_model")
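a6b3iqyw #7

@justinxzhao Great! Thank you 🙏😊
I have one last question:
Once the model is trained, how would you use it for inference?
Since you give it a string as input and expect intent + slots back, is there anything special about it?
That should answer all of my questions!
Thanks in advance 😊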

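sg24os4d #8

Hi @martinb-bb,
Depending on your needs, there are a few deployment options you can try:

  • ludwig serve
  • Export to TorchScript

In both cases, the user should be prepared to provide a string as input and receive intent + slots back.
Depending on your quality requirements, it may be worth benchmarking two single-task models against one multi-task model.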
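
For the pure-Python route asked about earlier, a minimal sketch using `LudwigModel.load` and `LudwigModel.predict` might look like the following. It is not taken from the thread: the helper names are hypothetical, the model directory is assumed to be the `model` folder inside the training output directory, and the names of the prediction columns should be checked against the returned DataFrame rather than assumed.

```python
from typing import Dict, List


def build_inference_columns(utterances: List[str]) -> Dict[str, List[str]]:
    """Column-oriented input keyed by the input feature name from the config."""
    return {"utterance": list(utterances)}


def predict_intents_and_slots(model_dir: str, utterances: List[str]):
    """Load a trained Ludwig model and run batch prediction on raw strings."""
    # Imported lazily so the helper above is usable without Ludwig installed.
    import pandas as pd
    from ludwig.api import LudwigModel

    model = LudwigModel.load(model_dir)
    frame = pd.DataFrame(build_inference_columns(utterances))
    # predict() returns the predictions and the directory where results are written.
    predictions, _ = model.predict(dataset=frame)
    return predictions


# Hypothetical usage; the path is an assumption about where training saved the model:
# predictions = predict_intents_and_slots(
#     "results/simple_experiment_simple_model/model",
#     ["forecast the future closing value of msft in 10 days"],
# )
# print(predictions.columns)
```

The input column must carry the same name as the input feature in the config (`utterance`), since the model looks features up by name.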

9w11ddsr #9

@justinxzhao Thanks for your input. All is working now!
I have one last clarification to make about my model output.
Input:

raw_data = {
    'utterance': ['forecast the future closing value of msft in 10 days'],
}
preprocessed_data = preprocessor(raw_data)
predictions = predictor(preprocessed_data)
postprocessed_data = postprocessor(predictions)

Output:

{'intent::predictions': ['forecast'], 'intent::probabilities': tensor([[7.5126e-10, 1.5106e-37, 1.0000e+00, 3.2936e-35]], device='cuda:0',
       grad_fn=<SoftmaxBackward0>), 'slots::predictions': [['<SOS>', 'command', 'O', 'forecast_day-b', 'forecast_day-b', 'forecast_day-e', 'forecast_target', 'O', 'O', 'forecast_dataset', '<EOS>', '<EOS>']], 'slots::probabilities': tensor([[0.9961, 0.9844, 0.9780, 0.6928, 0.5926, 0.8219, 0.9529, 0.9871, 0.9707,
         0.9674, 0.9905, 0.9996]],
 device='cuda:0', grad_fn=<MaxBackward0>), 'slots::probability': tensor([-1.2624], device='cuda:0', grad_fn=<SumBackward1>)}

What confuses me is that the slots output has more values than the input. The intent works great, but my slots do not match the input 1:1: the input has 10 words, while the output is an array of 12 items ending in a double <EOS>.

Utterance: forecast the future closing value of msft in 10 days
Slots: ['<SOS>', 'command', 'O', 'forecast_day-b', 'forecast_day-b', 'forecast_day-e', 'forecast_target', 'O', 'O', 'forecast_dataset', '<EOS>', '<EOS>']

Setting aside the slot classifications themselves, is there a reason why this happens? Perhaps I am overlooking something. (I have attached the dataset in case you want to check it out.)
Very much appreciate your support!
[nlu.csv](https://github.com/ludwig-ai/ludwig/files/10279643/nlu.csv)
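
The thread does not include an answer to this question. One possible reading, stated here as an assumption, is that the sequence is wrapped in `<SOS>` and `<EOS>` markers and the tagger emits one label per position, markers included, so 10 words give 12 positions. Under that assumption, a post-processing step could skip the leading marker and pair each word with the label at the next position. `align_slots` and `SPECIAL_TOKENS` are hypothetical names, not part of Ludwig.

```python
from typing import List, Optional, Tuple

# Marker symbols that are not real slot labels (assumed set).
SPECIAL_TOKENS = {"<SOS>", "<EOS>", "<PAD>", "<UNK>"}


def align_slots(utterance: str, slot_predictions: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each space-separated word with its predicted slot label."""
    tokens = utterance.split()  # mirrors `tokenizer: space` from the config
    # Assumption: position 0 holds the start marker, so word i maps to position i + 1.
    if slot_predictions[:1] == ["<SOS>"]:
        body = slot_predictions[1:]
    else:
        body = slot_predictions
    aligned = []
    for i, token in enumerate(tokens):
        tag = body[i] if i < len(body) else None
        # A marker predicted at a word position is reported as "no label".
        aligned.append((token, None if tag in SPECIAL_TOKENS else tag))
    return aligned


utterance = "forecast the future closing value of msft in 10 days"
slots = ['<SOS>', 'command', 'O', 'forecast_day-b', 'forecast_day-b', 'forecast_day-e',
         'forecast_target', 'O', 'O', 'forecast_dataset', '<EOS>', '<EOS>']
pairs = align_slots(utterance, slots)
```

With the example from the post, this yields ten pairs, one per word. The last word, `days`, ends up with no label because the model predicted `<EOS>` at that position.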
