java - extracting document content with Apache Tika and Cloudera Hadoop

dojqjjoe posted on 2021-05-30 in Hadoop

I am trying to extract content from documents with the Apache Tika 1.6 jar, running it as a MapReduce job on CDH 4.6. I used the code from the following link:
https://groups.google.com/forum/#!topic/chennaihug/waobslv0 ae
However, when I run the code, it fails with the following error:

14/11/12 17:14:55 INFO mapred.JobClient: Task Id : attempt_201411121354_0007_m_000000_1, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.tika.exception.TikaException
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at TikaFileInputFormat.createRecordReader(TikaFileInputFormat.java:15)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:644)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
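The trace shows TikaFileInputFormat.createRecordReader failing inside the task JVM (org.apache.hadoop.mapred.Child), which suggests the Tika jar is not on the map task's classpath. A minimal sketch for checking that, assuming only the JDK: the class ClassVisibilityCheck and its describe method are hypothetical names, not part of Hadoop or Tika. It asks the current thread's context class loader for a class by name and, if the class is found, reports the jar it was loaded from. Calling describe from the mapper's setup(), or running main with the same classpath the task uses, would show whether TikaException is reachable there.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not from the original post): reports whether a
// class can be loaded by the current thread's context class loader and, if so,
// where it was loaded from.
public class ClassVisibilityCheck {

    /** Returns "name -> location", "name -> bootstrap", or "name -> NOT FOUND". */
    public static String describe(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader have no CodeSource.
            return className + " -> " + (src == null ? "bootstrap" : src.getLocation());
        } catch (ClassNotFoundException e) {
            return className + " -> NOT FOUND";
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace if no arguments are given.
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.tika.exception.TikaException"};
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

If the output for TikaException is "NOT FOUND" on the task side, the Tika jar still has to be shipped with the job, for example bundled into the job jar or passed via Hadoop's -libjars option when the driver uses ToolRunner.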

Can anyone suggest how to fix this?
