I am trying to run the open-source knn-join-mapreduce-hbrj algorithm on Hadoop 2.6.0, set up for single-node, pseudo-distributed operation on a laptop (OS X). Here is the code.
The mapper, reducer, and main driver:
import java.io.IOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class RPhase2 extends Configured implements Tool
{
    public static class MapClass extends MapReduceBase
        implements Mapper<LongWritable, Text, IntWritable, RPhase2Value>
    {
        public void map(LongWritable key, Text value,
                        OutputCollector<IntWritable, RPhase2Value> output,
                        Reporter reporter) throws IOException
        {
            String line = value.toString();
            String[] parts = line.split(" +");
            // key format <rid1>
            IntWritable mapKey = new IntWritable(Integer.valueOf(parts[0]));
            // value format <rid2, dist>
            RPhase2Value np2v = new RPhase2Value(Integer.valueOf(parts[1]), Float.valueOf(parts[2]));
            System.out.println("############### key: " + mapKey.toString() + " np2v: " + np2v.toString());
            output.collect(mapKey, np2v);
        }
    }

    public static class Reduce extends MapReduceBase
        implements Reducer<IntWritable, RPhase2Value, NullWritable, Text>
    {
        int numberOfPartition;
        int knn;

        class Record {...}
        class RecordComparator implements Comparator<Record> {...}

        public void configure(JobConf job)
        {
            numberOfPartition = job.getInt("numberOfPartition", 2);
            knn = job.getInt("knn", 3);
            System.out.println("########## configuring!");
        }

        public void reduce(IntWritable key, Iterator<RPhase2Value> values,
                           OutputCollector<NullWritable, Text> output,
                           Reporter reporter) throws IOException
        {
            // initialize the priority queue
            RecordComparator rc = new RecordComparator();
            PriorityQueue<Record> pq = new PriorityQueue<Record>(knn + 1, rc);
            System.out.println("Phase 2 is at reduce");
            System.out.println("########## key: " + key.toString());
            // For each record we have a reduce task
            // value format <rid1, rid2, dist>
            while (values.hasNext())
            {
                RPhase2Value np2v = values.next();
                int id2 = np2v.getFirst().get();
                float dist = np2v.getSecond().get();
                Record record = new Record(id2, dist);
                pq.add(record);
                if (pq.size() > knn)
                    pq.poll();
            }
            while (pq.size() > 0)
            {
                output.collect(NullWritable.get(), new Text(key.toString() + " " + pq.poll().toString()));
                //break; // only output the first record
            }
        } // reduce
    } // Reducer

    public int run(String[] args) throws Exception
    {
        JobConf conf = new JobConf(getConf(), RPhase2.class);
        conf.setJobName("RPhase2");
        conf.setMapOutputKeyClass(IntWritable.class);
        conf.setMapOutputValueClass(RPhase2Value.class);
        conf.setOutputKeyClass(NullWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(MapClass.class);
        conf.setReducerClass(Reduce.class);

        int numberOfPartition = 0;
        List<String> other_args = new ArrayList<String>();
        for (int i = 0; i < args.length; ++i)
        {
            try {
                if ("-m".equals(args[i])) {
                    //conf.setNumMapTasks(Integer.parseInt(args[++i]));
                    ++i;
                } else if ("-r".equals(args[i])) {
                    conf.setNumReduceTasks(Integer.parseInt(args[++i]));
                } else if ("-p".equals(args[i])) {
                    numberOfPartition = Integer.parseInt(args[++i]);
                    conf.setInt("numberOfPartition", numberOfPartition);
                } else if ("-k".equals(args[i])) {
                    int knn = Integer.parseInt(args[++i]);
                    conf.setInt("knn", knn);
                    System.out.println(knn + "~ hi");
                } else {
                    other_args.add(args[i]);
                }
                conf.setNumReduceTasks(numberOfPartition * numberOfPartition);
                //conf.setNumReduceTasks(1);
            } catch (NumberFormatException except) {
                System.out.println("ERROR: Integer expected instead of " + args[i]);
                return printUsage();
            } catch (ArrayIndexOutOfBoundsException except) {
                System.out.println("ERROR: Required parameter missing from " + args[i-1]);
                return printUsage();
            }
        }

        FileInputFormat.setInputPaths(conf, other_args.get(0));
        FileOutputFormat.setOutputPath(conf, new Path(other_args.get(1)));
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new RPhase2(), args);
        System.exit(res);
    }
} // RPhase2
When I run this program, the mapper succeeds, but the job terminates abruptly and the reducer is never instantiated. Moreover, no errors are printed (not even in the log files). I know this in part because the print statement in the reducer's configure method is never printed. The output:
15/06/15 14:00:37 INFO mapred.LocalJobRunner: map task executor complete.
15/06/15 14:00:38 INFO mapreduce.Job: map 100% reduce 0%
15/06/15 14:00:38 INFO mapreduce.Job: Job job_local833125918_0001 completed successfully
15/06/15 14:00:38 INFO mapreduce.Job: Counters: 20
    File System Counters
        FILE: Number of bytes read=12505456
        FILE: Number of bytes written=14977422
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=11408
        HDFS: Number of bytes written=8724
        HDFS: Number of read operations=216
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=99
    Map-Reduce Framework
        Map input records=60
        Map output records=60
        Input split bytes=963
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=14
        Total committed heap usage (bytes)=1717567488
    File Input Format Counters
        Bytes Read=2153
    File Output Format Counters
        Bytes Written=1645
What I have done so far:
I have been looking into similar problems, and the most common cause I found is failing to configure the map output classes when the mapper's and reducer's output types differ. That is already done in the code above: conf.setMapOutputKeyClass(Class); conf.setMapOutputValueClass(Class);
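For reference, these are the relevant lines from the driver above: the map-output setters declare the intermediate (mapper-to-reducer) types, while the plain output setters declare the final (reducer-to-HDFS) types.

// intermediate key/value types emitted by the mapper
conf.setMapOutputKeyClass(IntWritable.class);
conf.setMapOutputValueClass(RPhase2Value.class);
// final key/value types emitted by the reducer
conf.setOutputKeyClass(NullWritable.class);
conf.setOutputValueClass(Text.class);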
In another post I found the suggestion to change reduce(..., Iterator<...>, ...) to reduce(..., Iterable<...>, ...), but that broke my compile. I could no longer use the .hasNext() and .next() methods, and got the following error:
error: Reduce is not abstract and does not override abstract method reduce(IntWritable, Iterator, OutputCollector, Reporter) in Reducer
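For context: the Iterable signature belongs to the newer org.apache.hadoop.mapreduce API, while the code above implements the old org.apache.hadoop.mapred Reducer interface, whose reduce method must keep Iterator; mixing the two is exactly what produces that compile error. As a sketch only, assuming the whole job (driver included) were ported to the new API, the reducer would look roughly like this:

// Hypothetical port to the new API; it only compiles if the driver is also
// rewritten around org.apache.hadoop.mapreduce.Job instead of JobConf.
public static class Reduce
    extends org.apache.hadoop.mapreduce.Reducer<IntWritable, RPhase2Value, NullWritable, Text>
{
    @Override
    protected void reduce(IntWritable key, Iterable<RPhase2Value> values, Context context)
        throws IOException, InterruptedException
    {
        // an Iterable is consumed with for-each, not hasNext()/next()
        for (RPhase2Value np2v : values) {
            context.write(NullWritable.get(), new Text(key.toString() + " " + np2v.toString()));
        }
    }
}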
I would really appreciate any hints or suggestions about what the problem might be!
Just a note: I have posted about my problem before (Hadoop kNN join algorithm stuck at map 100% reduce 0%), but it did not get enough attention, so I wanted to re-ask the question from a different angle. You can use that link for more details about my log files.
1 Answer
I have figured out the problem, and it was something silly. If you notice, in the code above numberOfPartition is set to 0 before the arguments are read, and the number of reducers is set to numberOfPartition * numberOfPartition inside the argument-parsing loop. Since I, as the user, never passed the number-of-partitions parameter (mainly because I simply copy-pasted the argument line from the README they provided), the job was configured with 0 reduce tasks, which is why the reducer never started.
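For completeness, a minimal sketch of one way to fix it (my own restructuring, not code from the repository): set the reduce-task count once, after the argument loop finishes, and fall back to an assumed default of 2 partitions when -p is not given.

// After the for-loop that parses args, instead of calling
// conf.setNumReduceTasks(...) inside the loop:
if (numberOfPartition <= 0) {
    numberOfPartition = 2; // assumed default; choose what fits your data
    conf.setInt("numberOfPartition", numberOfPartition);
}
conf.setNumReduceTasks(numberOfPartition * numberOfPartition);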