CoreNLP: How big is your truecasing model?

tv6aics1 posted 2 months ago in Other

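Hey!

I trained a truecasing model on a dataset of 1 million German sentences. The resulting model is quite large (80 MB), and I run into memory problems when I include it in my annotation pipeline. With the English truecasing model I have no such problems.

How big is your truecasing model (edu/stanford/nlp/models/truecase/truecasing.fast.caseless.qn.ser.gz)?
How does the annotator affect the memory consumption of the pipeline?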

kr98yfug

kr98yfug1#

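OK, I unpacked your jar file and found:
truecasing.fast.caseless.qn.ser.gz - 15.8 MB
But how is that possible? You trained on 4.5 million sentences, while I trained on only 1 million.
My configuration is exactly the same as yours, except for:

useQN=false (I have true)
l1reg=1.0 (I don't have this line)

I changed these because I read somewhere that I can only use the QN minimizer; when I use that configuration, it throws an error.
Could the size difference be because German has more distinct words (not sure whether that is really the case)?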
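
The size check above can also be done without unpacking the jar. The following is a minimal sketch in Python (chosen only for brevity; it is not part of CoreNLP), and the function name `model_sizes` is hypothetical. It relies on the fact that a jar is a zip archive and reports the stored size of every entry ending in `.ser.gz`.

```python
import zipfile


def model_sizes(jar_path, suffix=".ser.gz"):
    """Return {entry name: size in bytes} for jar entries ending in `suffix`.

    `file_size` is the size of the entry once extracted, i.e. the size of the
    .ser.gz file itself, not of the model after it is loaded into memory.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return {
            info.filename: info.file_size
            for info in jar.infolist()
            if info.filename.endswith(suffix)
        }


# Example usage (the jar path is a placeholder):
# for name, size in sorted(model_sizes("path/to/models.jar").items()):
#     print(f"{size / 1e6:8.1f} MB  {name}")
```

Note that the on-disk size is only a rough proxy for memory use, since the serialized model is compressed.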

fwzugrvs

fwzugrvs2#

Here is my whole training configuration:

serializeTo=truecasing.fast.caseless.qn.ser.gz
trainFileList=data.train
testFile=data.test

map=word=0,answer=1

wordFunction = edu.stanford.nlp.process.LowercaseFunction

useClassFeature=true
useWord=true
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useLongSequences=true
useSequences=true
usePrevSequences=true
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
useOccurrencePatterns=true
useLastRealWord=true
useNextRealWord=true
useDisjunctive=true
disjunctionWidth=5
wordShape=chris2useLC
usePosition=true
useBeginSent=true
useTitle=true

useObservedSequencesOnly=true
saveFeatureIndexToDisk=true
normalize=true

useQN=true
QNSize=25

maxLeft=1

readerAndWriter=edu.stanford.nlp.sequences.TrueCasingForNISTDocumentReaderAndWriter
featureFactory=edu.stanford.nlp.ie.NERFeatureFactory

featureDiffThresh=0.02
crcmnpdw

crcmnpdw3#

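What happens if you add l1reg back? That should force many weights to 0 and thereby reduce the size of the final model. Also, I recently retrained the model on 1.5 million sentences and got a noticeably bigger model, 48M.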
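
To illustrate why an L1 penalty can shrink the model, here is a minimal sketch in Python. It is not CoreNLP's optimizer: it shows soft-thresholding, the shrinkage step used by proximal L1 methods, which is a different technique from OWL-QN but produces the same kind of sparsity. The weights and threshold are made-up values.

```python
def soft_threshold(weights, threshold):
    """Shrink each weight toward zero by `threshold`.

    Weights whose magnitude is at most `threshold` become exactly 0.0,
    so the corresponding features no longer need to be stored.
    """
    shrunk = []
    for w in weights:
        if w > threshold:
            shrunk.append(w - threshold)
        elif w < -threshold:
            shrunk.append(w + threshold)
        else:
            shrunk.append(0.0)
    return shrunk


def count_nonzero(weights):
    """Number of weights that would still have to be kept in the model."""
    return sum(1 for w in weights if w != 0.0)


# Hypothetical feature weights: a few large ones and several near zero.
weights = [0.8, -0.05, 0.01, -1.2, 0.03, 0.0, 0.5]
sparse = soft_threshold(weights, threshold=0.1)
print(count_nonzero(weights), "nonzero before,", count_nonzero(sparse), "after")
```

An L2 penalty, by contrast, makes weights small but rarely exactly zero, so it does not by itself reduce the number of features that must be serialized.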


ohfgkhjo

ohfgkhjo4#

When I add l1reg, I get the following error:

Exception in thread "main" edu.stanford.nlp.util.ReflectionLoading$ReflectionLoadingException: Error creating edu.stanford.nlp.optimization.OWLQNMinimizer
	at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(ReflectionLoading.java:38)
	at edu.stanford.nlp.ie.crf.CRFClassifier.getMinimizer(CRFClassifier.java:2003)
	at edu.stanford.nlp.ie.crf.CRFClassifier.trainWeights(CRFClassifier.java:1902)
	at edu.stanford.nlp.ie.crf.CRFClassifier.train(CRFClassifier.java:1742)
	at edu.stanford.nlp.ie.AbstractSequenceClassifier.train(AbstractSequenceClassifier.java:785)
	at edu.stanford.nlp.ie.AbstractSequenceClassifier.train(AbstractSequenceClassifier.java:756)
	at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:3011)
Caused by: edu.stanford.nlp.util.MetaClass$ClassCreationException: java.lang.ClassNotFoundException: edu.stanford.nlp.optimization.OWLQNMinimizer
	at edu.stanford.nlp.util.MetaClass.createFactory(MetaClass.java:364)
	at edu.stanford.nlp.util.MetaClass.createInstance(MetaClass.java:381)
	at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(ReflectionLoading.java:36)
	... 6 more
Caused by: java.lang.ClassNotFoundException: edu.stanford.nlp.optimization.OWLQNMinimizer
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:582)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Class.java:315)
	at edu.stanford.nlp.util.MetaClass$ClassFactory.construct(MetaClass.java:135)
	at edu.stanford.nlp.util.MetaClass$ClassFactory.<init>(MetaClass.java:202)
	at edu.stanford.nlp.util.MetaClass$ClassFactory.<init>(MetaClass.java:69)
	at edu.stanford.nl

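From what I have read, this tries to use the OWLQNMinimizer, which is not publicly available in CoreNLP, so the class is not found.
"Yes, we don't have a license to release that optimizer. [...] Add the flag useQN=true"
"It turns out you also need to turn off l1reg (remove the l1reg=... flag) to use the QN implementation."
"As far as I know, turning off regularization may make the classifier worse, unfortunately."
(from this thread)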

0lvr5msh

0lvr5msh5#

Ah, that's a good point. I'll check with our PI to see if things have changed in terms of what we can publicly release.…

db2dz4w8

db2dz4w86#

Great, thank you very much!
Did you use the plain QNMinimizer for your recent training run?

i1icjdpr

i1icjdpr7#

Sorry again for the late reply. The parameters that should work are: useQN=true, useOWLQN=true, priorLambda=(some hyperparameter).
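
Applied to the training configuration posted earlier in the thread, the suggestion above would look roughly like the fragment below. This is a sketch only: the priorLambda value is a placeholder rather than a recommended setting, and the fragment assumes a CoreNLP build in which the OWL-QN path is actually available.

```properties
# Replaces the optimizer lines in the earlier configuration; all other lines stay as they are.
useQN=true
useOWLQN=true
# Regularization strength. 0.1 is a placeholder to be tuned, not a value given in this thread.
priorLambda=0.1
QNSize=25
```

As noted earlier in the thread, l1reg should stay out of the file, since that flag is what triggered the ClassNotFoundException for OWLQNMinimizer.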

kh212irz

kh212irz8#

Thank you! I plan to retrain our truecaser in the next few days and try out these parameters.
