Why is my Map class in Hadoop not emitting any records?

3ks5zfa0  posted 2022-11-28 in Hadoop

I am reading the AirLineData.csv file to count the number of flights per year at a particular airport. The map output for this job shows no records, even though the job reads 100,000 map input records.
Here is my mapper class:

```java
public static class MapClass extends Mapper<LongWritable, Text, IntWritable, Text>
{
    public void map(LongWritable key, Text value, Context context)
    {
        try {
            String[] str = value.toString().split(",");
            //String dummy_column = str[0]; // value
            int int_year = Integer.parseInt(str[1]); // key
            context.write(new IntWritable(int_year), new Text(str[0])); // key and value
        }
        catch (Exception e)
        {
            System.out.println(e.getMessage());
        }
    }
}
```

Here is my driver class:

```java
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    //conf.set("name", "value")
    //conf.set("mapreduce.input.fileinputformat.split.minsize", "134217728");
    Job job = Job.getInstance(conf, "Frequency count of flight");
    job.setJarByClass(FlightFrequency.class);
    job.setMapperClass(MapClass.class);
    //job.setCombinerClass(ReduceClass.class);
    job.setReducerClass(ReduceClass.class);
    job.setNumReduceTasks(1);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
```

And here is the job output:

```
[bigcdac43211@ip-10-1-1-204 ~]$ hadoop jar myjar.jar AirData training/AirLineData.csv training/out8
WARNING: Use "yarn jar" to launch YARN applications.
22/11/24 08:16:58 INFO client.RMProxy: Connecting to ResourceManager at ip-10-1-1-204.ap-south-1.compute.internal/10.1.1.204:8032
22/11/24 08:16:58 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
22/11/24 08:16:58 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/bigcdac43211/.staging/job_1663041244711_12176
22/11/24 08:16:59 INFO input.FileInputFormat: Total input files to process : 1
22/11/24 08:16:59 INFO mapreduce.JobSubmitter: number of splits:1
22/11/24 08:16:59 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
22/11/24 08:16:59 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1663041244711_12176
22/11/24 08:16:59 INFO mapreduce.JobSubmitter: Executing with tokens: []
22/11/24 08:16:59 INFO conf.Configuration: resource-types.xml not found
22/11/24 08:16:59 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
22/11/24 08:16:59 INFO impl.YarnClientImpl: Submitted application application_1663041244711_12176
22/11/24 08:16:59 INFO mapreduce.Job: The url to track the job: http://ip-10-1-1-204.ap-south-1.compute.internal:6066/proxy/application_1663041244711_12176/
22/11/24 08:16:59 INFO mapreduce.Job: Running job: job_1663041244711_12176
22/11/24 08:17:06 INFO mapreduce.Job: Job job_1663041244711_12176 running in uber mode : false
22/11/24 08:17:06 INFO mapreduce.Job:  map 0% reduce 0%
22/11/24 08:17:13 INFO mapreduce.Job:  map 100% reduce 0%
22/11/24 08:17:21 INFO mapreduce.Job:  map 100% reduce 100%
22/11/24 08:17:21 INFO mapreduce.Job: Job job_1663041244711_12176 completed successfully
22/11/24 08:17:21 INFO mapreduce.Job: Counters: 54
    File System Counters
        FILE: Number of bytes read=20
        FILE: Number of bytes written=445167
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=10585174
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
        HDFS: Number of bytes read erasure-coded=0
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Rack-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=4998
        Total time spent by all reduces in occupied slots (ms)=3389
        Total time spent by all map tasks (ms)=4998
        Total time spent by all reduce tasks (ms)=3389
        Total vcore-milliseconds taken by all map tasks=4998
        Total vcore-milliseconds taken by all reduce tasks=3389
        Total megabyte-milliseconds taken by all map tasks=5117952
        Total megabyte-milliseconds taken by all reduce tasks=3470336
    Map-Reduce Framework
        Map input records=100000
        Map output records=0
        Map output bytes=0
        Map output materialized bytes=16
        Input split bytes=127
        Combine input records=0
        Combine output records=0
        Reduce input groups=0
        Reduce shuffle bytes=16
        Reduce input records=0
        Reduce output records=0
        Spilled Records=0
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=175
        CPU time spent (ms)=3850
        Physical memory (bytes) snapshot=771170304
        Virtual memory (bytes) snapshot=5179764736
        Total committed heap usage (bytes)=883425280
        Peak Map Physical memory (bytes)=581619712
        Peak Map Virtual memory (bytes)=2589802496
        Peak Reduce Physical memory (bytes)=189550592
        Peak Reduce Virtual memory (bytes)=2589962240
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=10585047
    File Output Format Counters
        Bytes Written=0
```

As you can see, the input is being read, but my map output records count is 0:

```
Map-Reduce Framework
    Map input records=100000
    Map output records=0
```

Here is my sample data (only a few rows shown); the first two columns are ID and Year:

```
ARY04F1,2004,1,12,1,623,630,901,915,UA,462,N805UA,98,105,80,-14,-7,ORD,CLT,599,7,11,0,,0,0,0,0,0,0
ARY06F48889,2006,1,17,2,1453,1500,1557,1608,US,2176,N752UW,64,68,38,-11,-7,DCA,LGA,214,3,23,0,,0,0,0,0,0,0
ARY08F85465,2008,1,4,5,2037,2015,2144,2120,WN,3743,N276WN,127,125,109,24,22,SLC,OAK,588,8,10,0,,0,0,0,12,0,12
```
wqnecbli:

If an exception is thrown, the context.write line is never reached, which would explain zero map output records.
Open "The url to track the job" from the log and look at the mapper's task logs to check whether the exception message is actually being printed there (note: you should log with SLF4J rather than System.out). One likely candidate: your mapper emits IntWritable keys and Text values, but the driver only declares setOutputKeyClass(Text.class) and setOutputValueClass(LongWritable.class); without matching setMapOutputKeyClass/setMapOutputValueClass calls, context.write throws a type-mismatch IOException that your catch block silently swallows.
It is also not clear why you need to parse the year into an integer. The reducer will happily accept a Text key that holds the year.
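To make the last point concrete, here is a minimal sketch of the parsing step in plain Java with no Hadoop dependencies, so it can be run standalone (the class and method names are made up for illustration). Keeping the year as a String means no Integer.parseInt call can throw, and a malformed line is skipped explicitly rather than disappearing into a catch-all:

```java
// Hypothetical helper mirroring the mapper's parsing logic. In the real
// mapper you would call context.write(new Text(year), new Text(id)) with
// the two returned fields -- no integer parsing required.
public class FlightRecordParser {

    /** Returns {id, year} from one CSV line, or null for a malformed line. */
    public static String[] parseIdAndYear(String line) {
        String[] fields = line.split(",");
        if (fields.length < 2 || fields[1].isEmpty()) {
            return null; // log and skip the bad record instead of swallowing an exception
        }
        return new String[] { fields[0], fields[1] };
    }

    public static void main(String[] args) {
        String[] r = parseIdAndYear("ARY04F1,2004,1,12,1,623,630,901,915,UA,462");
        System.out.println(r[0] + " " + r[1]); // prints: ARY04F1 2004
        System.out.println(parseIdAndYear("") == null); // prints: true
    }
}
```

Counting records that return null here (e.g. with a Hadoop counter) would also tell you quickly whether bad input, rather than a configuration problem, is eating your output.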
