Hadoop word count example - NullPointerException

tzdcorbm posted 2021-05-29 in Hadoop

I am a Hadoop beginner. My setup: RHEL 7, Hadoop 2.7.3.
I am trying to run the WordCount v2.0 example. I simply copied the source code into a new Eclipse project and exported it as wc.jar.
I have already configured Hadoop for pseudo-distributed operation following the instructions in the linked guide. Then I started with the following steps:
Creating the input files in the input directory:

  echo "Hello World, Bye World!" > input/file01
  echo "Hello Hadoop, Goodbye to hadoop." > input/file02

Starting the environment:

  sbin/start-dfs.sh
  bin/hdfs dfs -mkdir /user
  bin/hdfs dfs -mkdir /user/<username>
  bin/hdfs dfs -put input input
  bin/hadoop jar wc.jar WordCount2 input output

This is what I get:

  16/09/02 13:15:01 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
  16/09/02 13:15:01 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
  16/09/02 13:15:01 INFO input.FileInputFormat: Total input paths to process : 2
  16/09/02 13:15:01 INFO mapreduce.JobSubmitter: number of splits:2
  16/09/02 13:15:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local455553963_0001
  16/09/02 13:15:01 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
  16/09/02 13:15:01 INFO mapreduce.Job: Running job: job_local455553963_0001
  16/09/02 13:15:01 INFO mapred.LocalJobRunner: OutputCommitter set in config null
  16/09/02 13:15:01 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
  16/09/02 13:15:01 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
  16/09/02 13:15:02 INFO mapred.LocalJobRunner: Waiting for map tasks
  16/09/02 13:15:02 INFO mapred.LocalJobRunner: Starting task: attempt_local455553963_0001_m_000000_0
  16/09/02 13:15:02 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
  16/09/02 13:15:02 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
  16/09/02 13:15:02 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/aii/input/file02:0+33
  16/09/02 13:15:02 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
  16/09/02 13:15:02 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
  16/09/02 13:15:02 INFO mapred.MapTask: soft limit at 83886080
  16/09/02 13:15:02 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
  16/09/02 13:15:02 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
  16/09/02 13:15:02 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  16/09/02 13:15:02 INFO mapred.MapTask: Starting flush of map output
  16/09/02 13:15:02 INFO mapred.LocalJobRunner: Starting task: attempt_local455553963_0001_m_000001_0
  16/09/02 13:15:02 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
  16/09/02 13:15:02 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
  16/09/02 13:15:02 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/aii/input/file01:0+24
  16/09/02 13:15:02 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
  16/09/02 13:15:02 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
  16/09/02 13:15:02 INFO mapred.MapTask: soft limit at 83886080
  16/09/02 13:15:02 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
  16/09/02 13:15:02 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
  16/09/02 13:15:02 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  16/09/02 13:15:02 INFO mapred.MapTask: Starting flush of map output
  16/09/02 13:15:02 INFO mapred.LocalJobRunner: map task executor complete.
  16/09/02 13:15:02 WARN mapred.LocalJobRunner: job_local455553963_0001
  java.lang.Exception: java.lang.NullPointerException
      at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
  Caused by: java.lang.NullPointerException
      at WordCount2$TokenizerMapper.setup(WordCount2.java:47)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
      at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
  16/09/02 13:15:02 INFO mapreduce.Job: Job job_local455553963_0001 running in uber mode : false
  16/09/02 13:15:02 INFO mapreduce.Job: map 0% reduce 0%
  16/09/02 13:15:02 INFO mapreduce.Job: Job job_local455553963_0001 failed with state FAILED due to: NA
  16/09/02 13:15:02 INFO mapreduce.Job: Counters: 0

No result (output) is produced. Why am I getting that exception?
Thanks
EDIT:
Thanks to the suggested solutions, I realized there is also a second part to try in the wordcount example, creating the patterns file:

  echo "\." > patterns.txt
  echo "\," >> patterns.txt
  echo "\!" >> patterns.txt
  echo "to" >> patterns.txt

Then running:

  bin/hadoop jar wc.jar WordCount2 -Dwordcount.case.sensitive=true input output -skip patterns.txt

Everything worked great!

r8uurelv1#

The problem is probably in this part of the code:

  caseSensitive = conf.getBoolean("wordcount.case.sensitive", true);
  if (conf.getBoolean("wordcount.skip.patterns", true)) {
    URI[] patternsURIs = Job.getInstance(conf).getCacheFiles();
    for (URI patternsURI : patternsURIs) {
      Path patternsPath = new Path(patternsURI.getPath());
      String patternsFileName = patternsPath.getName().toString();
      parseSkipFile(patternsFileName);
    }
  }

Here getCacheFiles() is returning null for some reason. That is why, when you try to iterate over patternsURIs (which is null), you get the exception.
To fix this, check whether patternsURIs is null before starting the loop:

  if (patternsURIs != null) {
    for (URI patternsURI : patternsURIs) {
      Path patternsPath = new Path(patternsURI.getPath());
      String patternsFileName = patternsPath.getName().toString();
      parseSkipFile(patternsFileName);
    }
  }

You should also check why it is null, if you do not expect it to be null.
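
If the null is unexpected, the usual cause is that the driver never registered a cache file. Below is a minimal sketch of how a driver registers one; the class name CacheFileDriverSketch and the patterns.txt path are illustrative placeholders, not part of the original example:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.Job;

  public class CacheFileDriverSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = Job.getInstance(conf, "word count");
      // Register the skip file in the distributed cache; without this call,
      // getCacheFiles() returns null in the mapper's setup().
      job.addCacheFile(new Path("patterns.txt").toUri());
      // Record that a skip file was supplied, matching WordCount2's convention.
      job.getConfiguration().setBoolean("wordcount.skip.patterns", true);
    }
  }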

xa9qqrwz2#

The problem happens in the mapper's setup() method. This wordcount example is a bit more advanced than the usual one: it lets you specify a file of patterns that the mapper will filter out. That file is added to the distributed cache in the main() method so that the mappers can open it on every node.
You can see the file being added to the cache in main():

  for (int i = 0; i < remainingArgs.length; ++i) {
    if ("-skip".equals(remainingArgs[i])) {
      job.addCacheFile(new Path(remainingArgs[++i]).toUri());
      job.getConfiguration().setBoolean("wordcount.skip.patterns", true);
    } else {
      otherArgs.add(remainingArgs[i]);
    }
  }

You did not specify the -skip option, so nothing gets added to the cache. You can see that when a file is added, the code also sets wordcount.skip.patterns to true.
In the mapper's setup() you have the following code:

  @Override
  public void setup(Context context) throws IOException, InterruptedException {
    conf = context.getConfiguration();
    caseSensitive = conf.getBoolean("wordcount.case.sensitive", true);
    if (conf.getBoolean("wordcount.skip.patterns", true)) {
      URI[] patternsURIs = Job.getInstance(conf).getCacheFiles();
      for (URI patternsURI : patternsURIs) {
        Path patternsPath = new Path(patternsURI.getPath());
        String patternsFileName = patternsPath.getName().toString();
        parseSkipFile(patternsFileName);
      }
    }
  }

The problem is that the check conf.getBoolean("wordcount.skip.patterns", true) defaults to true when the property is not set, and in your case it is not set. Therefore patternsURIs (or something near it, I don't have the line numbers) will be null.
So to fix it, you can either change wordcount.skip.patterns to default to false, set it to false in the driver (the main method), or provide a skip file.
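
For example, here is a minimal sketch of the first option, changing only the default value in setup() so that the cache lookup is skipped unless -skip was actually given:

  @Override
  public void setup(Context context) throws IOException, InterruptedException {
    conf = context.getConfiguration();
    caseSensitive = conf.getBoolean("wordcount.case.sensitive", true);
    // Default to false: only touch the cache when a skip file was
    // registered via -skip, which also sets this property to true.
    if (conf.getBoolean("wordcount.skip.patterns", false)) {
      URI[] patternsURIs = Job.getInstance(conf).getCacheFiles();
      for (URI patternsURI : patternsURIs) {
        Path patternsPath = new Path(patternsURI.getPath());
        String patternsFileName = patternsPath.getName().toString();
        parseSkipFile(patternsFileName);
      }
    }
  }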

