Is your feature request related to a problem? Please describe.
My dataset is something like the following - just for example:
ID | SrcTag | TgtTag | SrcNoTag | TgtNoTag
okpCtx:tu=54:s=0 | <g1>(e) Institutional support</g1> | (e) 机构支持 | (e) Institutional support | (e) 机构支持
okpCtx:tu=55:s=0 | <g1>(f) Agreeing on the language </g1><x2/><g2>of the mediation</g2> | (f) 就调解语言达成一致 | (f) Agreeing on the language of the mediation | (f) 就调解语言达成一致
okpCtx:tu=56:s=0 | <g1>1. Commencement of the mediation</g1> | 1. 调解的启动 | 1. Commencement of the mediation | 1. 调解的启动
okpCtx:tu=57:s=0 | <g1>1. Commencement of the mediation<x2/>5</g1> | <g1>1. 调解的启动<x2/>5</g1> | 1. Commencement of the mediation5 | 1. 调解的启动5
okpCtx:tu=58:s=0 | <g1>1. Commencement of the mediation<x2/>6</g1> | <g1>1. 调解的启动<x2/>6</g1> | 1. Commencement of the mediation6 | 1. 调解的启动6
okpCtx:tu=59:s=0 | <g1>2. Selection and appointment of a mediator</g1> | 2. 调解员的选择和指定 | 2. Selection and appointment of a mediator | 2. 调解员的选择和指定
The SrcTag-TgtTag pairs and SrcNoTag-TgtNoTag pairs are parallel sentences which are translated from English into Chinese. The former have HTML-like tags, while the latter don't have such tags.
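To make the relationship between the tagged and tag-free columns concrete, here is a minimal sketch of how a NoTag value can be derived from its tagged counterpart. The regular expression is my assumption, based only on the tag shapes visible in the sample rows (`<g1>`, `</g1>`, `<x2/>`); real data may contain other tag forms.

```python
import re

# Assumed tag shapes, taken from the sample rows only:
# opening <g1>, closing </g1>, and self-closing <x2/>.
TAG_RE = re.compile(r"</?[a-z]+\d+/?>")


def strip_tags(text: str) -> str:
    """Remove inline tags and leave all other characters untouched."""
    return TAG_RE.sub("", text)


src_tag = "<g1>1. Commencement of the mediation<x2/>5</g1>"
tgt_tag = "<g1>1. 调解的启动<x2/>5</g1>"

print(strip_tags(src_tag))  # 1. Commencement of the mediation5
print(strip_tags(tgt_tag))  # 1. 调解的启动5
```

Both printed values match the SrcNoTag and TgtNoTag columns of record tu=57.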
Describe the use case
I have been trying to train on the input features of SrcNoTag, TgtNoTag and SrcTag for the output feature of TgtTag, with a config file like the following:
input_features:
    -
        name: SrcTag
        type: sequence
        encoder:
            type: rnn
            cell_type: lstm
            reduce_output: null
        preprocessing:
            tokenizer: english_tokenize
    -
        name: SrcNoTag
        type: sequence
        encoder:
            type: rnn
            cell_type: lstm
            reduce_output: null
        preprocessing:
            tokenizer: english_tokenize
    -
        name: TgtNoTag
        type: sequence
        encoder:
            type: rnn
            cell_type: lstm
            reduce_output: null
        preprocessing:
            tokenizer: chinese_tokenize
output_features:
    -
        name: TgtTag
        type: sequence
        decoder:
            type: generator
            cell_type: lstm
            attention: bahdanau
        reduce_input: null
        #loss:
            #type: softmax_cross_entropy
        preprocessing:
            tokenizer: chinese_tokenize
training:
    batch_size: 8
Describe the solution you'd like
The above trained successfully, but it does not give any meaningful result.
Describe alternatives you've considered
I have considered using a "combiner", but I do not know how to set one up.
Additional context
As you may guess, the model is supposed to reconstruct the tags from the source English sentences in the tag-free target Chinese sentences.
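One way to judge whether a prediction is "meaningful" for this task is to check two things: that stripping the tags from the prediction gives back the tag-free target, and that the prediction carries the same tags as the source. The sketch below is only an illustration under my own assumptions: the tag pattern is inferred from the sample rows, and `is_consistent` is a hypothetical helper, not part of any library.

```python
import re
from collections import Counter

# Assumed tag shapes, taken from the sample rows only.
TAG_RE = re.compile(r"</?[a-z]+\d+/?>")


def is_consistent(src_tag: str, tgt_no_tag: str, predicted: str) -> bool:
    """Hypothetical check: same target text, same multiset of tags as the source."""
    same_text = TAG_RE.sub("", predicted) == tgt_no_tag
    same_tags = Counter(TAG_RE.findall(predicted)) == Counter(TAG_RE.findall(src_tag))
    return same_text and same_tags


src = "<g1>1. Commencement of the mediation<x2/>5</g1>"
tgt_plain = "1. 调解的启动5"

print(is_consistent(src, tgt_plain, "<g1>1. 调解的启动<x2/>5</g1>"))  # True
print(is_consistent(src, tgt_plain, "1. 调解的启动5"))  # False
```

This does not check whether the tags sit at the right positions in the Chinese text; it only filters out outputs that drop tags or alter the translation.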
I tried the very first record on You Chat (https://you.com/search?q=best+laptops&fromSearchBar=true&tbm=youchat) and got a very promising result:
(Though it wrongly repeated the result as its subject.)
2 answers
mefy6pfw #1
This is an interesting chat:
eimct9ow #2
Let's take a look at this!