Mahout canopy clustering and k-means clustering: Java heap space (out of memory)

hgqdbh6s posted on 2021-06-04 in Hadoop

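I am running Mahout 0.7 on a 30-node cluster (each node has 8 cores and 16 GB of RAM), trying to cluster 250,000 sparse vectors of cardinality 300,000.
If I tune the canopy parameters (T1, T2) so that only a small number of canopy centers is found, the job works fine.
Once the number of canopy centers exceeds a certain amount, the job keeps failing at 67% of the reduce phase with the message "Error: Java heap space".
K-means clustering runs into the same heap space problem as k increases.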
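The dependence on T1 and T2 can be illustrated with a simplified, in-memory sketch of the canopy rule: a point within T1 of a canopy's seed is added to that canopy's running sum, and a point that is not within T2 of any seed starts a new canopy. This is a hypothetical illustration, not Mahout's code: the class and method names are made up, and dense `double[]` arrays stand in for Mahout's sparse vectors. What it is meant to show is that each canopy carries its own full-length vector of running sums, so memory grows with the number of canopies, and a smaller T2 produces more canopies.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, in-memory canopy assignment (hypothetical sketch, not Mahout's implementation). */
public class CanopySketch {

    /** One canopy: a fixed seed point plus running sums of the points it has observed. */
    static final class Canopy {
        final double[] seed; // distances are measured against this point
        final double[] sum;  // one full-length vector per canopy
        int count;

        Canopy(double[] seed) {
            this.seed = seed.clone();
            this.sum = new double[seed.length];
            observe(seed); // the seed point counts toward its own canopy
        }

        void observe(double[] p) {
            for (int i = 0; i < p.length; i++) {
                sum[i] += p[i];
            }
            count++;
        }

        double[] centroid() {
            double[] c = new double[sum.length];
            for (int i = 0; i < sum.length; i++) {
                c[i] = sum[i] / count;
            }
            return c;
        }
    }

    /** Manhattan (L1) distance, the measure used in the question's driver. */
    static double manhattan(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) {
            d += Math.abs(a[i] - b[i]);
        }
        return d;
    }

    /** Streams the points through the T1/T2 rule; requires t1 > t2. */
    static List<Canopy> cluster(double[][] points, double t1, double t2) {
        if (!(t1 > t2)) {
            throw new IllegalArgumentException("expected T1 > T2");
        }
        List<Canopy> canopies = new ArrayList<>();
        for (double[] p : points) {
            boolean stronglyBound = false;
            for (Canopy c : canopies) {
                double d = manhattan(c.seed, p);
                if (d < t1) {
                    c.observe(p); // loosely bound: contributes to this canopy
                }
                stronglyBound |= d < t2;
            }
            if (!stronglyBound) {
                canopies.add(new Canopy(p)); // not near any seed: start a new canopy
            }
        }
        return canopies;
    }

    public static void main(String[] args) {
        double[][] points = {{0}, {1}, {2}, {10}, {11}};
        System.out.println(cluster(points, 3.0, 1.5).size()); // 3
        System.out.println(cluster(points, 3.0, 0.5).size()); // 5
    }
}
```

With the five one-dimensional points in `main`, T1 = 3.0 and T2 = 1.5 give 3 canopies, while shrinking T2 to 0.5 gives 5, each with its own running-sum vector.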
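I have heard that the canopy center vectors and the k-means center vectors are kept in memory in every mapper and reducer. That would be (number of canopy centers, or k) × one sparse vector of size 300,000, which should fit in 4 GB of memory, so it did not seem too bad.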
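The estimate in the previous paragraph can be made explicit. Averaging many sparse vectors produces a center whose non-zero entries are the union of its members' non-zeros, so a center of 300,000-dimensional data can end up close to dense. The hypothetical helper below computes how many centers fit in a heap under two sets of assumptions: the best case of one dense `double[]` (8 bytes per entry) per center, and a pessimistic case that assumes three full-length vectors are kept per cluster and that a hash-map-backed sparse vector costs about 26 bytes per stored entry (a 4-byte key, an 8-byte value and a 1-byte state flag per slot, with the table at most half full). Both the per-entry cost and the vectors-per-cluster count are assumptions, not measured Mahout figures.

```java
/** Back-of-the-envelope count of cluster centers that fit in one task heap (hypothetical helper). */
public class CenterMemoryEstimate {

    /**
     * @param heapMb            task heap in MiB (e.g. 4000 from -Xmx4000m)
     * @param cardinality       vector length (300,000 in the question)
     * @param bytesPerEntry     assumed storage per entry (8 for a dense double[])
     * @param vectorsPerCluster assumed number of full-length vectors kept per center
     */
    static long maxCenters(long heapMb, long cardinality, double bytesPerEntry, int vectorsPerCluster) {
        double heapBytes = heapMb * 1024.0 * 1024.0;
        double bytesPerCenter = cardinality * bytesPerEntry * vectorsPerCluster;
        return (long) Math.floor(heapBytes / bytesPerCenter);
    }

    public static void main(String[] args) {
        // Best case: one dense double[] per center, ignoring all JVM and Hadoop overhead.
        System.out.println(maxCenters(4000, 300_000, 8, 1));  // 1747
        // Assumed worst case: three vectors per center at ~26 bytes per stored entry.
        System.out.println(maxCenters(4000, 300_000, 26, 3)); // 179
    }
}
```

Under these assumptions the number of centers a 4,000 MiB heap can hold drops by roughly a factor of ten, from about 1,700 to under 200, before any other objects in the task are counted.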
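Based on earlier questions here and elsewhere, I have turned every memory knob I could find:
hadoop-env.sh: set all heap sizes to 16 GB on the namenode and even 8 GB on the datanodes.
mapred-site.xml: added the mapred.{map,reduce}.child.java.opts properties and set their value to -Xmx4000m.
mapred-site.xml: changed the mapred.tasktracker.{map,reduce}.tasks.maximum properties, lowering their value from 8 to 4.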
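Two quick sanity checks on these settings, sketched with a hypothetical helper class. First, the per-task heap multiplied by the number of task slots should fit in a node's RAM: with 4 map and 4 reduce slots at -Xmx4000m, the worst case (every slot running a task with a full heap) is 32,000 MiB on a 16 GB (16,384 MiB) node. Second, `Runtime.getRuntime().maxMemory()` reports the heap ceiling of the JVM it runs in, so logging it from inside a mapper or reducer shows whether the `child.java.opts` value actually reached the task JVMs. The parser only handles the k/m/g suffixes and plain byte counts, and rejects a lowercase `-xmx` on purpose, because the JVM option name is case-sensitive.

```java
/** Sanity checks for per-task heap settings (hypothetical helper, not part of Hadoop). */
public class SlotMemoryCheck {

    /** Parses an option such as "-Xmx4000m" into MiB. */
    static long parseXmxMb(String opt) {
        if (!opt.startsWith("-Xmx") || opt.length() == 4) {
            throw new IllegalArgumentException("not an -Xmx option: " + opt);
        }
        String value = opt.substring(4);
        char unit = Character.toLowerCase(value.charAt(value.length() - 1));
        String digits = value.substring(0, value.length() - 1);
        switch (unit) {
            case 'k': return Long.parseLong(digits) / 1024;
            case 'm': return Long.parseLong(digits);
            case 'g': return Long.parseLong(digits) * 1024;
            default:  return Long.parseLong(value) / (1024 * 1024); // plain byte count
        }
    }

    /** Worst case: every map and reduce slot runs a task whose heap is completely full. */
    static long worstCaseMb(int mapSlots, int reduceSlots, long heapMb) {
        return (mapSlots + reduceSlots) * heapMb;
    }

    public static void main(String[] args) {
        long heapMb = parseXmxMb("-Xmx4000m");
        long worst = worstCaseMb(4, 4, heapMb);
        long nodeRamMb = 16 * 1024;
        System.out.println("worst-case task heap per node: " + worst + " MiB; node RAM: " + nodeRamMb + " MiB");
        // Heap ceiling of the JVM running this code; log the same value inside a task to check its -Xmx.
        System.out.println("this JVM's max heap: " + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MiB");
    }
}
```

Note that "Java heap space" is raised when a single task JVM hits its own -Xmx ceiling, so the first check concerns node stability rather than this particular error; the second check is the one that confirms the 4,000 MiB limit is really in effect.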
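The problem persists. I have been banging my head against this for a long time; does anyone have any suggestions?
The full driver code and the output are shown below: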

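import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.clustering.canopy.CanopyDriver;
import org.apache.mahout.common.HadoopUtil;
import org.apache.mahout.common.distance.ManhattanDistanceMeasure;

public class MrBicClusteringDriver {

    // Usage: MrBicClusteringDriver <inputPath> <outputPath> <T1> <T2>
    public static void main(String[] args) throws Exception {

        String ratingsPath = args[0];
        String outputPath = args[1];
        String T1 = args[2];
        String T2 = args[3];

        Configuration conf = new Configuration();

        // Remove output from a previous run so the job can write to outputPath
        HadoopUtil.delete(conf, new Path(outputPath));

        // Canopy clustering with Manhattan distance; runClustering = true,
        // clusterClassificationThreshold = 0.0, runSequential = false (MapReduce mode)
        CanopyDriver.run(conf, new Path(ratingsPath), new Path(outputPath), new ManhattanDistanceMeasure(),
                Double.parseDouble(T1), Double.parseDouble(T2), true, 0.0, false);
    }
}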

The error messages I am getting are:

Exception in thread "main" java.lang.InterruptedException: Canopy Job failed processing /MrBic/Output/SeedGeneration_predSample
at org.apache.mahout.clustering.canopy.CanopyDriver.buildClustersMR(CanopyDriver.java:363)
at org.apache.mahout.clustering.canopy.CanopyDriver.buildClusters(CanopyDriver.java:248)
at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:155)
at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:170)
at MrBicClusteringDriver.main(MrBicClusteringDriver.java:32)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

2013-06-12 10:56:00,825 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
at org.apache.mahout.math.map.OpenIntDoubleHashMap.rehash(OpenIntDoubleHashMap.java:434)
at org.apache.mahout.math.map.OpenIntDoubleHashMap.put(OpenIntDoubleHashMap.java:387)
at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:139)
at org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:560)
at org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:275)
at org.apache.mahout.clustering.canopy.Canopy.<init>(Canopy.java:43)
at org.apache.mahout.clustering.canopy.CanopyClusterer.addPointToCanopies(CanopyClusterer.java:163)
at org.apache.mahout.clustering.canopy.CanopyReducer.reduce(CanopyReducer.java:47)
at org.apache.mahout.clustering.canopy.CanopyReducer.reduce(CanopyReducer.java:30)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
