I am trying to cluster data in Mahout, but the job fails with an error. Here is the error:
java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.mahout.clustering.classify.ClusterClassificationMapper.populateClusterModels(ClusterClassificationMapper.java:129)
at org.apache.mahout.clustering.classify.ClusterClassificationMapper.setup(ClusterClassificationMapper.java:74)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
13/03/07 19:29:31 INFO mapred.JobClient: map 0% reduce 0%
13/03/07 19:29:31 INFO mapred.JobClient: Job complete: job_local_0010
13/03/07 19:29:31 INFO mapred.JobClient: Counters: 0
java.lang.InterruptedException: Cluster Classification Driver Job failed processing E:/Thesis/Experiments/Mahout dataset/input
at org.apache.mahout.clustering.classify.ClusterClassificationDriver.classifyClusterMR(ClusterClassificationDriver.java:276)
at org.apache.mahout.clustering.classify.ClusterClassificationDriver.run(ClusterClassificationDriver.java:135)
at org.apache.mahout.clustering.kmeans.KMeansDriver.clusterData(KMeansDriver.java:260)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:152)
at com.ifm.dataclustering.SequencePrep.<init>(SequencePrep.java:95)
at com.ifm.dataclustering.App.main(App.java:8)
Here is my code:
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);

// write the input vectors to a sequence file
// ('vector' is assumed to be a List<NamedVector> built earlier)
Path vector_path = new Path("E:/Thesis/Experiments/Mahout dataset/input/vector_input");
SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, vector_path, Text.class, VectorWritable.class);
VectorWritable vec = new VectorWritable();
for (NamedVector outputVec : vector) {
    vec.set(outputVec);
    writer.append(new Text(outputVec.getName()), vec);
}
writer.close();

// create initial clusters
Path cluster_path = new Path("E:/Thesis/Experiments/Mahout dataset/clusters/part-00000");
SequenceFile.Writer cluster_writer = new SequenceFile.Writer(fs, conf, cluster_path, Text.class, Kluster.class);
// number of clusters k
int k = 4;
for (int i = 0; i < k; i++) {
    NamedVector outputVec = vector.get(i);
    Kluster cluster = new Kluster(outputVec, i, new EuclideanDistanceMeasure());
    // System.out.println(cluster);
    cluster_writer.append(new Text(cluster.getIdentifier()), cluster);
}
cluster_writer.close();

// set cluster output path
Path output = new Path("E:/Thesis/Experiments/Mahout dataset/output");
HadoopUtil.delete(conf, output);
KMeansDriver.run(conf, new Path("E:/Thesis/Experiments/Mahout dataset/input"), new Path("E:/Thesis/Experiments/Mahout dataset/clusters"),
        output, new EuclideanDistanceMeasure(), 0.001, 10,
        true, 0.0, false);

// read the clustered points back and print each assignment
SequenceFile.Reader output_reader = new SequenceFile.Reader(fs, new Path("E:/Thesis/Experiments/Mahout dataset/output/" + Kluster.CLUSTERED_POINTS_DIR + "/part-m-00000"), conf);
IntWritable key = new IntWritable();
WeightedVectorWritable value = new WeightedVectorWritable();
while (output_reader.next(key, value)) {
    System.out.println(value.toString() + " belongs to cluster "
            + key.toString());
}
output_reader.close();
}
1 Answer
vyu0f0g1 #1
The paths to your input/output data appear to be incorrect. MapReduce jobs run on the cluster, so the data is read from HDFS, not from your local hard disk.
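The error message gives you a hint about the incorrect path: the job failed while processing E:/Thesis/Experiments/Mahout dataset/input, which is a path on the local disk.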
Before running the job, make sure you first upload the input data to HDFS. Then, instead of a local path, you should use an HDFS path.
Edit:
Use FileSystem#exists(Path) to check whether the Path is valid.
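The existence check can be turned into a small fail-fast step that runs before KMeansDriver.run is called, so a wrong path is reported by name instead of surfacing later as an exception inside the mapper. The following is only a minimal sketch under stated assumptions: PathPreflight, findMissing and requireAll are hypothetical names, not part of Hadoop or Mahout, and the check itself is passed in as a predicate. To keep the sketch runnable without Hadoop, main uses java.nio.file against the local disk as a stand-in. In a real job the predicate would wrap FileSystem#exists(Path) instead.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical pre-flight helper: reports every path the existence check rejects. */
class PathPreflight {

    /** Returns the paths for which {@code exists} is false, in input order. */
    static List<String> findMissing(List<String> paths, Predicate<String> exists) {
        List<String> missing = new ArrayList<>();
        for (String p : paths) {
            if (!exists.test(p)) {
                missing.add(p);
            }
        }
        return missing;
    }

    /** Throws before the job is submitted if any required path is missing. */
    static void requireAll(List<String> paths, Predicate<String> exists) {
        List<String> missing = findMissing(paths, exists);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Missing input path(s): " + missing);
        }
    }

    public static void main(String[] args) {
        // Stand-in check against the local disk so the sketch runs without Hadoop.
        // With Hadoop on the classpath, the predicate would instead call
        // fs.exists(new Path(p)) and rethrow its IOException as an unchecked exception.
        Predicate<String> existsLocally = p -> Files.exists(Paths.get(p));

        List<String> required = Arrays.asList(
                System.getProperty("java.io.tmpdir"),   // normally present
                "no-such-dir/vector_input");            // hypothetical path, expected to be absent

        System.out.println("Missing: " + findMissing(required, existsLocally));
    }
}
```

Calling requireAll with the input directory and the initial clusters directory right before the job is submitted would show immediately which of the two the file system cannot see.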