Cannot instantiate the type Cluster in Mahout k-means clustering example

dauxcl2d posted on 2021-06-04 in Hadoop

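Hi, I am trying to run the k-means clustering example in Mahout and I am stuck on an error in the sample code. The error is in this line:
    Cluster cluster = new Cluster(vec, i, new EuclideanDistanceMeasure());
It gives the error
    Cannot instantiate the type Cluster
(As far as I know, Cluster is an interface.) I want to run k-means on my sample dataset. Can anyone guide me?
I have included the following JARs in the Eclipse IDE: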
mahout-math-0.7-cdh4.3.0.jar
hadoop-common-2.0.0-cdh4.2.1.jar
hadoop-hdfs-2.0.0-cdh4.2.1.jar
hadoop-mapreduce-client-core-2.0.0-cdh4.2.1.jar
mahout-core-0.7-cdh4.3.0.jar
Please tell me if I am missing any required JAR. I will be running this on Hadoop CDH 4.2.1.
Here is my full code, taken from GitHub:

    package tryout;

    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.mahout.math.RandomAccessSparseVector;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.math.VectorWritable;
    import org.apache.mahout.clustering.Cluster;
    import org.apache.mahout.clustering.classify.WeightedVectorWritable;
    import org.apache.mahout.clustering.kmeans.KMeansDriver;
    import org.apache.mahout.common.distance.EuclideanDistanceMeasure;

    public class SimpleKMeansClustering {

        public static final double[][] points = { {1, 1}, {2, 1}, {1, 2},
                                                  {2, 2}, {3, 3}, {8, 8},
                                                  {9, 8}, {8, 9}, {9, 9}};

        public static void writePointsToFile(List<Vector> points,
                String fileName, FileSystem fs, Configuration conf) throws IOException {
            Path path = new Path(fileName);
            SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path, LongWritable.class, VectorWritable.class);
            long recNum = 0;
            VectorWritable vec = new VectorWritable();
            for (Vector point : points) {
                vec.set(point);
                writer.append(new LongWritable(recNum++), vec);
            }
            writer.close();
        }

        public static List<Vector> getPoints(double[][] raw) {
            List<Vector> points = new ArrayList<Vector>();
            for (int i = 0; i < raw.length; i++) {
                double[] fr = raw[i];
                Vector vec = new RandomAccessSparseVector(fr.length);
                vec.assign(fr);
                points.add(vec);
            }
            return points;
        }

        public static void main(String args[]) throws Exception {
            int k = 2;
            List<Vector> vectors = getPoints(points);

            File testData = new File("testdata");
            if (!testData.exists()) {
                testData.mkdir();
            }
            testData = new File("testdata/points");
            if (!testData.exists()) {
                testData.mkdir();
            }

            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            writePointsToFile(vectors, "testdata/points/file1", fs, conf);

            Path path = new Path("testdata/clusters/part-00000");
            SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path, Text.class, Cluster.class);
            for (int i = 0; i < k; i++) {
                Vector vec = vectors.get(i);
                Cluster cluster = new Cluster(vec, i, new EuclideanDistanceMeasure());
                writer.append(new Text(cluster.getIdentifier()), cluster);
            }
            writer.close();

            KMeansDriver.run(conf, new Path("testdata/points"), new Path("testdata/clusters"),
                    new Path("output"), new EuclideanDistanceMeasure(), 0.001, 10,
                    true, false);

            SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path("output/" + Cluster.CLUSTERED_POINTS_DIR + "/part-m-00000"), conf);
            IntWritable key = new IntWritable();
            WeightedVectorWritable value = new WeightedVectorWritable();
            while (reader.next(key, value)) {
                System.out.println(value.toString() + " belongs to cluster " + key.toString());
            }
            reader.close();
        }
    }
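
For orientation, the computation that the listing above hands to KMeansDriver can be sketched in plain Java, with no Mahout or Hadoop on the classpath. This is only an illustration of k-means (Lloyd's algorithm) on the same nine points, seeded with the first k points as in the listing. The class and method names (KMeansSketch, cluster, nearest) are made up for this sketch, and it stops when the centres stop moving exactly rather than using the 0.001 convergence delta passed to KMeansDriver. It is not Mahout's implementation.

```java
import java.util.Arrays;

// Plain-Java illustration only; this is not Mahout's implementation.
class KMeansSketch {
    // Same sample points as in the listing above.
    static final double[][] POINTS = {
        {1, 1}, {2, 1}, {1, 2}, {2, 2}, {3, 3}, {8, 8}, {9, 8}, {8, 9}, {9, 9}
    };

    // Squared Euclidean distance; sufficient for choosing the nearest centre.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the centre closest to p (ties go to the lower index).
    static int nearest(double[] p, double[][] centres) {
        int best = 0;
        for (int c = 1; c < centres.length; c++) {
            if (squaredDistance(p, centres[c]) < squaredDistance(p, centres[best])) {
                best = c;
            }
        }
        return best;
    }

    // Seeds the centres with the first k points, then alternates between
    // assigning points to their nearest centre and recomputing each centre
    // as the mean of its points. Returns the final centres.
    static double[][] cluster(double[][] points, int k, int maxIterations) {
        int dims = points[0].length;
        double[][] centres = new double[k][];
        for (int c = 0; c < k; c++) {
            centres[c] = points[c].clone();
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (double[] p : points) {
                int c = nearest(p, centres);
                counts[c]++;
                for (int d = 0; d < dims; d++) {
                    sums[c][d] += p[d];
                }
            }
            boolean moved = false;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // leave the centre of an empty cluster where it is
                }
                for (int d = 0; d < dims; d++) {
                    double mean = sums[c][d] / counts[c];
                    if (mean != centres[c][d]) {
                        moved = true;
                    }
                    centres[c][d] = mean;
                }
            }
            if (!moved) {
                break; // assignments are stable, so further passes change nothing
            }
        }
        return centres;
    }

    public static void main(String[] args) {
        double[][] centres = cluster(POINTS, 2, 10);
        for (double[] p : POINTS) {
            System.out.println(Arrays.toString(p) + " belongs to cluster " + nearest(p, centres));
        }
    }
}
```

On these nine points the sketch settles on centres (1.8, 1.8) and (8.5, 8.5), with the five small points in one cluster and the four large points in the other.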

Also, if I have my own dataset, please guide me on how to do the same with it.
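
One possible starting point for a custom dataset, sketched under the assumption that the data is a text file with one point per line and comma-separated numeric fields (the thread does not specify a format). The helper below only turns such lines into a double[][], which has the same shape as the points array that getPoints in the listing accepts. PointLoader and parseRows are hypothetical names, and the file path is whatever is passed as the first argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes one point per line, fields separated by commas.
class PointLoader {
    // Parses lines such as "1.0,2.0" into rows of doubles; blank lines are skipped.
    static double[][] parseRows(List<String> lines) {
        List<double[]> rows = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] fields = trimmed.split(",");
            double[] row = new double[fields.length];
            for (int i = 0; i < fields.length; i++) {
                row[i] = Double.parseDouble(fields[i].trim());
            }
            rows.add(row);
        }
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the data file (an assumption of this sketch).
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        double[][] raw = parseRows(lines);
        System.out.println("Loaded " + raw.length + " points");
    }
}
```

The resulting array could then be passed to getPoints in place of the hard-coded points. All rows need the same number of fields for the vectors to be comparable.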

yx2lnoni #1

I have also been trying to get this example from the book Mahout in Action to work, and I finally succeeded. Here is what I did:

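    // Requires: import org.apache.mahout.clustering.kmeans.Kluster;
    SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path, Text.class, Kluster.class);
    for (int i = 0; i < k; i++) {
        Vector vec = vectors.get(i);
        // Kluster is a concrete class, so it can be instantiated where Cluster cannot.
        Kluster cluster = new Kluster(vec, i, new EuclideanDistanceMeasure());
        // getIdentifier() is an instance method, so call it on the cluster object.
        writer.append(new Text(cluster.getIdentifier()), cluster);
    }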
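
The reason the swap helps is ordinary Java: an interface cannot be created with new, but a class that implements it can. The sketch below reproduces that situation with stand-in types that merely borrow the names from this thread. They are not Mahout's classes, and the identifier format is invented for the example.

```java
// Hypothetical stand-ins that borrow the names from the post; not Mahout's classes.
interface Cluster {
    String getIdentifier();
}

// A concrete implementation, playing the role Kluster plays in the snippet above.
class Kluster implements Cluster {
    private final int id;

    Kluster(int id) {
        this.id = id;
    }

    @Override
    public String getIdentifier() {
        return "cluster-" + id; // made-up format, for illustration only
    }
}

class InterfaceDemo {
    static Cluster makeCluster(int id) {
        // Cluster c = new Cluster(id); // does not compile: "Cannot instantiate the type Cluster"
        return new Kluster(id);         // compiles: Kluster is a concrete class
    }

    public static void main(String[] args) {
        // getIdentifier() is an instance method, so it is called on the object.
        System.out.println(makeCluster(0).getIdentifier());
    }
}
```

The variable can still be declared with the interface type; only the expression after new has to name a concrete class.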

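I can't believe the code in the book is wrong. I also managed to get it working without Maven. I describe that more fully in my post "Using Mahout in Eclipse without Maven", but basically I did it through Eclipse user libraries.
Update: OK, the book isn't wrong, it's just old. This page has links to updated code for the book: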
http://alexott.blogspot.co.uk/2012/07/getting-started-with-examples-from.html
