I am trying to process the AirLineData.csv file to determine the number of flights in different years at a particular airport. The map output shows no records at all, even though the Map input records counter shows 100,000 records read.
Here is my Mapper class:
public static class MapClass extends Mapper<LongWritable,Text,IntWritable,Text>
{
    public void map(LongWritable key, Text value, Context context)
    {
        try {
            String[] str = value.toString().split(",");
            //String dummy_column = str[0]; //value
            int int_year = Integer.parseInt(str[1]); //key
            context.write(new IntWritable(int_year), new Text(str[0])); //key and value
        }
        catch(Exception e)
        {
            System.out.println(e.getMessage());
        }
    }
}
Here is my driver class's main method:
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    //conf.set("name", "value")
    //conf.set("mapreduce.input.fileinputformat.split.minsize", "134217728");
    Job job = Job.getInstance(conf, "Frequency count of flight");
    job.setJarByClass(FlightFrequency.class);
    job.setMapperClass(MapClass.class);
    //job.setCombinerClass(ReduceClass.class);
    job.setReducerClass(ReduceClass.class);
    job.setNumReduceTasks(1);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Here is the output I got:
[bigcdac43211@ip-10-1-1-204 ~]$ hadoop jar myjar.jar AirData training/AirLineData.csv training/out8
WARNING: Use "yarn jar" to launch YARN applications.
22/11/24 08:16:58 INFO client.RMProxy: Connecting to ResourceManager at ip-10-1-1-204.ap-south-1.compute.internal/10.1.1.204:8032
22/11/24 08:16:58 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
22/11/24 08:16:58 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/bigcdac43211/.staging/job_1663041244711_12176
22/11/24 08:16:59 INFO input.FileInputFormat: Total input files to process : 1
22/11/24 08:16:59 INFO mapreduce.JobSubmitter: number of splits:1
22/11/24 08:16:59 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
22/11/24 08:16:59 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1663041244711_12176
22/11/24 08:16:59 INFO mapreduce.JobSubmitter: Executing with tokens: []
22/11/24 08:16:59 INFO conf.Configuration: resource-types.xml not found
22/11/24 08:16:59 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
22/11/24 08:16:59 INFO impl.YarnClientImpl: Submitted application application_1663041244711_12176
22/11/24 08:16:59 INFO mapreduce.Job: The url to track the job: http://ip-10-1-1-204.ap-south-1.compute.internal:6066/proxy/application_1663041244711_12176/
22/11/24 08:16:59 INFO mapreduce.Job: Running job: job_1663041244711_12176
22/11/24 08:17:06 INFO mapreduce.Job: Job job_1663041244711_12176 running in uber mode : false
22/11/24 08:17:06 INFO mapreduce.Job: map 0% reduce 0%
22/11/24 08:17:13 INFO mapreduce.Job: map 100% reduce 0%
22/11/24 08:17:21 INFO mapreduce.Job: map 100% reduce 100%
22/11/24 08:17:21 INFO mapreduce.Job: Job job_1663041244711_12176 completed successfully
22/11/24 08:17:21 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=20
FILE: Number of bytes written=445167
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=10585174
HDFS: Number of bytes written=0
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=4998
Total time spent by all reduces in occupied slots (ms)=3389
Total time spent by all map tasks (ms)=4998
Total time spent by all reduce tasks (ms)=3389
Total vcore-milliseconds taken by all map tasks=4998
Total vcore-milliseconds taken by all reduce tasks=3389
Total megabyte-milliseconds taken by all map tasks=5117952
Total megabyte-milliseconds taken by all reduce tasks=3470336
Map-Reduce Framework
Map input records=100000
Map output records=0
Map output bytes=0
Map output materialized bytes=16
Input split bytes=127
Combine input records=0
Combine output records=0
Reduce input groups=0
Reduce shuffle bytes=16
Reduce input records=0
Reduce output records=0
Spilled Records=0
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=175
CPU time spent (ms)=3850
Physical memory (bytes) snapshot=771170304
Virtual memory (bytes) snapshot=5179764736
Total committed heap usage (bytes)=883425280
Peak Map Physical memory (bytes)=581619712
Peak Map Virtual memory (bytes)=2589802496
Peak Reduce Physical memory (bytes)=189550592
Peak Reduce Virtual memory (bytes)=2589962240
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=10585047
File Output Format Counters
Bytes Written=0
As you can see, input records are being read, but my Map output records counter is 0:
Map-Reduce Framework
Map input records=100000
Map output records=0
Here is my sample data (only a few rows are shown); the first two columns are ID and Year:
ARY04F1,2004,1,12,1,623,630,901,915,UA,462,N805UA,98,105,80,-14,-7,ORD,CLT,599,7,11,0,,0,0,0,0,0,0
ARY06F48889,2006,1,17,2,1453,1500,1557,1608,US,2176,N752UW,64,68,38,-11,-7,DCA,LGA,214,3,23,0,,0,0,0,0,0,0
ARY08F85465,2008,1,4,5,2037,2015,2144,2120,WN,3743,N276WN,127,125,109,24,22,SLC,OAK,588,8,10,0,,0,0,0,12,0,12
1 Answer
If an exception is thrown, the context.write line is never reached. Open "The url to track the job" and look at the mapper logs to check whether an exception message was actually printed (note: you should log with SLF4J rather than System.out). It is also not clear why you need to parse the year into an integer; the reducer will happily accept a Text key containing the year.
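Below is a minimal sketch of the mapper revised along those lines, assuming you keep the year column as a Text key (so no Integer.parseInt is needed) and log problem records through SLF4J so they show up in the task logs reachable from the tracking URL. The class name YearTextMapper, the logger, and the malformed-record check are illustrative additions, not taken from your code.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative mapper: emits (year, flight ID) as Text/Text pairs.
public class YearTextMapper extends Mapper<LongWritable, Text, Text, Text> {

    private static final Logger LOG = LoggerFactory.getLogger(YearTextMapper.class);

    private final Text outKey = new Text();
    private final Text outValue = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] str = value.toString().split(",");
        if (str.length < 2) {
            // Logged via SLF4J so it is visible in the mapper's task log.
            LOG.warn("Skipping malformed record: {}", value);
            return;
        }
        outKey.set(str[1]);   // year column, kept as plain text
        outValue.set(str[0]); // ID column
        context.write(outKey, outValue);
    }
}

If the mapper's output types no longer match the output classes set in the driver (Text / LongWritable in your main method), you would also need to call job.setMapOutputKeyClass(Text.class) and job.setMapOutputValueClass(Text.class) there.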