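My previous question was posted here:
Hadoop: java.lang.Exception: java.lang.RuntimeException: Error in configuring object
Following the suggestion there, I packaged all the jar files into a single jar, and that solved the first problem. Please refer to the previous post for the source code. Now I have a new problem, shown in the output below. Thanks in advance.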
14/04/03 13:47:39 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/04/03 13:47:40 WARN snappy.LoadSnappy: Snappy native library is available
14/04/03 13:47:40 INFO snappy.LoadSnappy: Snappy native library loaded
14/04/03 13:47:40 INFO mapred.FileInputFormat: Total input paths to process : 1
14/04/03 13:47:40 INFO mapred.JobClient: Running job: job_local1748858601_0001
14/04/03 13:47:40 INFO mapred.LocalJobRunner: Waiting for map tasks
14/04/03 13:47:40 INFO mapred.LocalJobRunner: Starting task: attempt_local1748858601_0001_m_000000_0
14/04/03 13:47:40 INFO util.ProcessTree: setsid exited with exit code 0
14/04/03 13:47:40 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@c943d1
14/04/03 13:47:40 INFO mapred.MapTask: Processing split: file:/usr/local/hadoop/project/input1/url.txt:0+68
14/04/03 13:47:40 INFO mapred.MapTask: numReduceTasks: 1
14/04/03 13:47:40 INFO mapred.MapTask: io.sort.mb = 100
14/04/03 13:47:40 INFO mapred.MapTask: data buffer = 79691776/99614720
14/04/03 13:47:40 INFO mapred.MapTask: record buffer = 262144/327680
Prepare to get into webpage
14/04/03 13:47:41 INFO mapred.JobClient: map 0% reduce 0%
14/04/03 13:47:43 INFO mapred.LocalJobRunner: Map task executor complete.
14/04/03 13:47:43 WARN mapred.LocalJobRunner: job_local1748858601_0001
java.lang.Exception: java.lang.NoClassDefFoundError: org/apache/xerces/parsers/AbstractSAXParser
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.NoClassDefFoundError: org/apache/xerces/parsers/AbstractSAXParser
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:643)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
at de.l3s.boilerpipe.sax.BoilerpipeSAXInput.getTextDocument(BoilerpipeSAXInput.java:51)
at de.l3s.boilerpipe.extractors.ExtractorBase.getText(ExtractorBase.java:69)
at de.l3s.boilerpipe.extractors.ExtractorBase.getText(ExtractorBase.java:87)
at webPageToTxt.WebPageToTxt.webPageString(WebPageToTxt.java:82)
at webPageToTxt.WebPageToTxt.multiWebPageString(WebPageToTxt.java:126)
at webPageToTxt.WebPageToTxt.webPageToTxt(WebPageToTxt.java:40)
at webPageToTxt.WebPageToTxtMapper.map(WebPageToTxtMapper.java:27)
at webPageToTxt.WebPageToTxtMapper.map(WebPageToTxtMapper.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)
Caused by: java.lang.ClassNotFoundException: org.apache.xerces.parsers.AbstractSAXParser
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
... 29 more
14/04/03 13:47:44 INFO mapred.JobClient: Job complete: job_local1748858601_0001
14/04/03 13:47:44 INFO mapred.JobClient: Counters: 0
14/04/03 13:47:44 INFO mapred.JobClient: Job Failed: NA
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at webPageToTxt.ConfMain.run(ConfMain.java:33)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at webPageToTxt.ConfMain.main(ConfMain.java:40)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
1 Answer
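Besides the jar that holds your driver and map-reduce code, you need to supply every other jar you depend on, so that those classes are available to the mapper at runtime.
I went through the link you provided. Packaging the other classes into the map-reduce jar can work, but it is not always possible. As the stack trace shows, Xerces is being used here, so you need to include the Xerces implementation jar (xercesImpl.jar).
A better approach is to add these jars to the DistributedCache: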
DistributedCache.addArchiveToClassPath(new Path("HDFS Path"), job);
The jar can live in HDFS; replace "HDFS Path" with its location there. In short, the fix is to add the Xerces jar.
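To confirm which dependencies are actually missing before resubmitting the job, a small check like the one below may help. It is only a sketch: the class name ClasspathCheck and its helper methods are made up for illustration, and the two class names it probes are taken from the stack trace above. Since the log shows the job running under LocalJobRunner, the map task runs inside the client JVM, so running this with the same classpath as the job should reflect what the mapper can see.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Hadoop): reports which classes cannot be loaded.
final class ClasspathCheck {

    /** Returns true if the named class can be located by this class's loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError raised while defining the class
            return false;
        }
    }

    /** Returns the subset of the given class names that could not be loaded. */
    static List<String> missing(List<String> classNames) {
        List<String> absent = new ArrayList<>();
        for (String name : classNames) {
            if (!isOnClasspath(name)) {
                absent.add(name);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // Class names copied from the stack trace in the question
        List<String> required = Arrays.asList(
                "org.apache.xerces.parsers.AbstractSAXParser",
                "de.l3s.boilerpipe.sax.BoilerpipeSAXInput");
        List<String> absent = missing(required);
        if (absent.isEmpty()) {
            System.out.println("All required classes were found");
        } else {
            for (String name : absent) {
                System.err.println("Missing from classpath: " + name);
            }
        }
    }
}
```

If org.apache.xerces.parsers.AbstractSAXParser is reported as missing, the Xerces jar has not made it onto the classpath yet, whichever packaging approach you chose.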