Converting existing vectors to Mahout vectors

vkc1a9a2  posted on 2021-06-03  in Hadoop

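I am trying to convert term-frequency values into Mahout's vector representation so that I can run LDA on the resulting vectors. I am following the Mahout wiki, where a code snippet shows how to convert existing vectors into Mahout vectors:
https://cwiki.apache.org/mahout/creating-vectors-from-text.html
Below is my code. I get a NullPointerException at the point where I create the VectorWriter. The Apache cwiki suggests using: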

VectorWriter vectorWriter = SequenceFile.createWriter(filesystem, configuration, outfile, LongWritable.class, SparseVector.class);

However, I do not see SequenceFile.createWriter in org.apache.hadoop.io.SequenceFile.
Here is the full code snippet:

fs = FileSystem.get(conf);
        // I'm using SequenceFile.Writer because SequenceFile.createWriter is not available.
        VectorWriter vectorWriter = (VectorWriter) new SequenceFile.Writer(fs, conf, path, LongWritable.class, RandomAccessSparseVector.class);

        ArrayList<Vector> weights = new ArrayList<Vector>();
        BufferedReader buffer = new BufferedReader(new FileReader("/home/hadoop/LDATest/LDAData/test"));
        String line = null;

        while((line = buffer.readLine()) != null)
        {    
            String[] data = line.split(" "); // split the term,weight data
            Vector weightVector = new RandomAccessSparseVector(1,1);
            weightVector.setQuick(0, Double.parseDouble(data[1])); // add the weight
            weights.add(weightVector);
        }

        vectorWriter.write(new VectorIterable(weights));
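
A side note on the loop above: it creates a separate one-element vector for every line and always writes to index 0, so the term each weight belongs to is lost. If the input file describes a single document (an assumption, since the question does not say), one sparse vector indexed by term id may be closer to what LDA expects. The sketch below uses only the Java standard library; `TermWeightParser` and its `parse` method are hypothetical names, not Mahout API. Each resulting (index, weight) pair could then be copied into a Mahout vector with `setQuick(index, weight)`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Mahout or Hadoop.
final class TermWeightParser {

    // Parses "term weight" lines into one sparse vector (term index -> weight).
    // The dictionary gives every previously unseen term the next free index.
    // Weights of a term that appears on several lines are summed.
    static Map<Integer, Double> parse(List<String> lines, Map<String, Integer> dictionary) {
        Map<Integer, Double> sparse = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] data = trimmed.split("\\s+"); // split the term,weight data
            if (data.length < 2) {
                throw new IllegalArgumentException("Expected 'term weight' but got: " + line);
            }
            Integer index = dictionary.get(data[0]);
            if (index == null) {
                index = dictionary.size();
                dictionary.put(data[0], index);
            }
            sparse.merge(index, Double.parseDouble(data[1]), Double::sum);
        }
        return sparse;
    }

    public static void main(String[] args) {
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        List<String> lines = Arrays.asList("apple 2.0", "pear 3.5", "apple 1.0");
        Map<Integer, Double> vector = parse(lines, dictionary);
        System.out.println(dictionary); // prints {apple=0, pear=1}
        System.out.println(vector);     // prints {0=3.0, 1=3.5}
    }
}
```

The dictionary size after parsing gives the cardinality the target vector needs. Summing repeated terms is a design choice made here for illustration; overwriting or rejecting duplicates would be equally easy.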

This is the error:
Exception in thread "main" java.lang.NullPointerException
    at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
    at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:910)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:843)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:831)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:823)
    at kbsi.ideal.ldatest.iterabletest(ldatest.java:161)
    at kbsi.ideal.ldatest.main(ldatest.java:194)
I would really appreciate any help with this. Thanks.
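
One plausible reading of the trace: the exception comes from SerializationFactory.getSerializer, which looks for a registered serialization that accepts the value class. With Hadoop's default configuration only Writable types are accepted, and RandomAccessSparseVector is a plain vector rather than a Writable, so the lookup may find nothing and the next call dereferences null. The sketch below is a simplified standard-library model of that lookup; every name in it is hypothetical and it is not Hadoop's actual implementation.

```java
import java.util.Collections;
import java.util.List;

// Simplified model with hypothetical names; not Hadoop's real classes.
final class SerializerLookupSketch {

    interface Writable {} // stand-in for org.apache.hadoop.io.Writable

    // Stand-in for a vector type that does not implement Writable.
    static final class PlainVector {}

    // Stand-in for a Writable wrapper around a vector.
    static final class VectorWrapper implements Writable {
        final PlainVector vector;
        VectorWrapper(PlainVector vector) { this.vector = vector; }
    }

    interface Serialization {
        boolean accept(Class<?> c);
        String serializerFor(Class<?> c);
    }

    // Accepts only classes that implement Writable.
    static final class WritableOnlySerialization implements Serialization {
        public boolean accept(Class<?> c) { return Writable.class.isAssignableFrom(c); }
        public String serializerFor(Class<?> c) { return "writable-serializer:" + c.getSimpleName(); }
    }

    static final List<Serialization> REGISTERED =
            Collections.<Serialization>singletonList(new WritableOnlySerialization());

    // Returns the first serialization that accepts the class, or null if none does.
    static Serialization findSerialization(Class<?> c) {
        for (Serialization s : REGISTERED) {
            if (s.accept(c)) {
                return s;
            }
        }
        return null;
    }

    // No null check on the lookup result, so an unsupported class triggers a NullPointerException.
    static String getSerializer(Class<?> c) {
        return findSerialization(c).serializerFor(c);
    }

    public static void main(String[] args) {
        System.out.println(getSerializer(VectorWrapper.class)); // prints writable-serializer:VectorWrapper
        try {
            getSerializer(PlainVector.class);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException for PlainVector");
        }
    }
}
```

If this reading is right, a likely fix (an assumption, not confirmed in this thread) is to pass Mahout's VectorWritable.class as the value class and append each vector wrapped in a VectorWritable. Separately, a SequenceFile.Writer is not a VectorWriter, so the cast in the code above would be expected to fail even once the writer is created.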
