Joda-Time error with Stanford CoreNLP in Pig Latin

kuuvgm7e  posted on 2021-06-24  in Pig

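I am trying to create a Pig UDF that extracts the locations mentioned in a tweet using the Stanford CoreNLP package, accessed through the sista Scala API. It works fine when run locally with "sbt run", but throws a "java.lang.NoSuchMethodError" exception when called from Pig: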

    Loading default properties from tagger edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim
    Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim
    Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ...
    2013-06-14 10:47:54,952 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
    done [7.5 sec].
    Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ...
    2013-06-14 10:48:02,108 [Low Memory Detector] INFO org.apache.pig.impl.util.SpillableMemoryManager - first memory handler call - Collection threshold init = 18546688(18112K) used = 358671232(350264K) committed = 366542848(357952K) max = 699072512(682688K)
    done [5.0 sec].
    Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ...
    2013-06-14 10:48:10,522 [Low Memory Detector] INFO org.apache.pig.impl.util.SpillableMemoryManager - first memory handler call - Usage threshold init = 18546688(18112K) used = 590012928(576184K) committed = 597786624(583776K) max = 699072512(682688K)
    done [5.6 sec].
    2013-06-14 10:48:11,469 [Thread-11] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
    java.lang.NoSuchMethodError: org.joda.time.Duration.compareTo(Lorg/joda/time/ReadableDuration;)I
        at edu.stanford.nlp.time.SUTime$Duration.compareTo(SUTime.java:3406)
        at edu.stanford.nlp.time.SUTime$Duration.max(SUTime.java:3488)
        at edu.stanford.nlp.time.SUTime$Time.difference(SUTime.java:1308)
        at edu.stanford.nlp.time.SUTime$Range.<init>(SUTime.java:3793)
        at edu.stanford.nlp.time.SUTime.<clinit>(SUTime.java:570)

Here is the relevant code:

    // Pig UDF API
    import org.apache.pig.EvalFunc
    import org.apache.pig.data.Tuple
    // Processor and CoreNLPProcessor come from the sista processors library.
    // Cities and Util are helpers from the poster's own project (not shown).

    object CountryTokenizer {
      def tokenize(text: String): String = {
        val locations = TweetEntityExtractor.NERLocationFilter(text)
        println(locations) // debug output
        // Map each location to its country and join the results with spaces.
        locations.flatMap(x => Cities.country(x)).mkString(" ")
      }
    }

    // The UDF that Pig calls: takes the tweet text from the first tuple field.
    class PigCountryTokenizer extends EvalFunc[String] {
      override def exec(tuple: Tuple): java.lang.String = {
        val text: java.lang.String = Util.cast[java.lang.String](tuple.get(0))
        CountryTokenizer.tokenize(text)
      }
    }

    object TweetEntityExtractor {
      val processor: Processor = new CoreNLPProcessor()

      def NERLocationFilter(text: String): List[String] = {
        val doc = processor.mkDocument(text)
        processor.tagPartsOfSpeech(doc)
        processor.lemmatize(doc)
        processor.recognizeNamedEntities(doc)
        doc.sentences.toList.flatMap { sentence =>
          // The entity labels are optional; treat a missing array as empty.
          val entities = sentence.entities.map(_.toList).getOrElse(List.empty[String])
          val words = sentence.words.toList
          // Keep the non-empty words that are tagged as LOCATION.
          (words zip entities).collect {
            case (word, "LOCATION") if word != "" => word
          }
        }
      }
    }

I am building a fat jar with sbt assembly, so the Joda-Time jar file should be accessible. What is going on?


Answer #1, by 6rqinv9w

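Pig ships with its own version of Joda-Time (1.6), which is incompatible with 2.x. When Pig's copy is picked up ahead of the one bundled in your fat jar, CoreNLP ends up calling Duration.compareTo(ReadableDuration), a method that does not exist in 1.6, hence the NoSuchMethodError.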
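
One way to check which copy of Joda-Time is actually being loaded is to ask the JVM where a class came from. The following is only a minimal sketch: ClassOrigin and locationOf are hypothetical names that do not appear in the original post, and the default class name assumes Joda-Time is on the classpath.

```scala
// Hypothetical helper (not part of the original post): reports where a class was loaded from.
object ClassOrigin {

  /** Returns the jar or directory a class was loaded from, if the JVM exposes it. */
  def locationOf(cls: Class[_]): Option[String] =
    for {
      domain <- Option(cls.getProtectionDomain)
      source <- Option(domain.getCodeSource) // null for classes from the bootstrap loader
      url    <- Option(source.getLocation)
    } yield url.toString

  def main(args: Array[String]): Unit = {
    // Assumption: Joda-Time is on the classpath; pass another class name to inspect it instead.
    val className = args.headOption.getOrElse("org.joda.time.Duration")
    val cls = Class.forName(className)
    val origin = locationOf(cls).getOrElse("<unknown, e.g. bootstrap class loader>")
    println(className + " loaded from: " + origin)
  }
}
```

Calling ClassOrigin.locationOf(classOf[org.joda.time.Duration]) from inside the UDF's exec method and printing the result should show whether the class comes from the fat jar or from a jar that belongs to the Pig installation. If it is the latter, that is consistent with the version conflict described above.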
