Writable classes in MapReduce

Posted by 0sgqnhkj on 2021-05-27 in Hadoop


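How do I connect the map Writable and the reduce Writable, using the HashSet (docid and offset) as the value of the reduce Writable? The mapper (LineIndexMapper) works fine, but in the reducer (LineIndexReducer), when I write context.write(key, new IndexRecordWritable("some string")), I get an error saying it cannot take a String as an argument, even though the reduce Writable also has a public String toString().
I suspect the HashSet in the reducer's Writable (IndexRecordWritable.java) is not receiving the values correctly. My code is below.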

IndexMapRecordWritable.java

        import java.io.DataInput;
        import java.io.DataOutput;
        import java.io.IOException;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.io.Writable;

        public class IndexMapRecordWritable implements Writable {

            private LongWritable offset;
            private Text docid;

            public LongWritable getOffsetWritable() {
                return offset;
            }

            public Text getDocidWritable() {
                return docid;
            }

            public long getOffset() {
                return offset.get();
            }

            public String getDocid() {
                return docid.toString();
            }

            public IndexMapRecordWritable() {
                this.offset = new LongWritable();
                this.docid = new Text();
            }

            public IndexMapRecordWritable(long offset, String docid) {
                this.offset = new LongWritable(offset);
                this.docid = new Text(docid);
            }
            public IndexMapRecordWritable(IndexMapRecordWritable indexMapRecordWritable) {
                this.offset = indexMapRecordWritable.getOffsetWritable();
                this.docid = indexMapRecordWritable.getDocidWritable();
            }
            @Override
            public String toString() {

                StringBuilder output = new StringBuilder();
                output.append(docid);
                output.append(offset);

                return output.toString();

            }

            @Override
            public void write(DataOutput out) throws IOException {

            }

            @Override
            public void readFields(DataInput in) throws IOException {

            }

        }

IndexRecordWritable.java

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.HashSet;
    import org.apache.hadoop.io.Writable;

    public class IndexRecordWritable implements Writable {

        // Save each index record from maps
        private HashSet<IndexMapRecordWritable> tokens = new HashSet<IndexMapRecordWritable>();

        public IndexRecordWritable() {
        }

        public IndexRecordWritable(
                Iterable<IndexMapRecordWritable> indexMapRecordWritables) {

        }

        @Override
        public String toString() {

            StringBuilder output = new StringBuilder();

            return output.toString();

        }

        @Override
        public void write(DataOutput out) throws IOException {

        }

        @Override
        public void readFields(DataInput in) throws IOException {

        }

    }

Tags: hadoop, mapreduce, key-value, class, writable

Source: https://stackoverflow.com/questions/64760864/writable-classes-in-mapreduce

1 answer

s8vozzvw #1

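Well, here is my answer, based on a few assumptions. Going by the pre-condition and post-condition comments in your reducer class, the final output is a text file that contains each key together with its file names, separated by commas.
In that case you don't really need the IndexRecordWritable class. You can simply write to the context with

    context.write(key, new Text(valueBuilder.substring(0, valueBuilder.length() - 1)));

with the class declaration line as

    public class LineIndexReducer extends Reducer<Text, IndexMapRecordWritable, Text, Text>

Don't forget to set the matching output classes in the driver. On the Job object that means setOutputKeyClass(Text.class) and setOutputValueClass(Text.class), plus setMapOutputValueClass(IndexMapRecordWritable.class), because the map output value type differs from the final output value type.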
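The value passed to context.write above is the mapper records joined by commas, with the trailing comma trimmed. Here is a minimal sketch of that string-building step, using plain strings in place of the IndexMapRecordWritable values so that it runs without Hadoop. The names ValueJoiner, joinWithBuilder and joinWithJoiner are hypothetical and not part of the original code. The second method uses java.util.StringJoiner as an alternative that needs no trimming.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

class ValueJoiner {

    // Mirrors the answer's approach: append "value," and drop the last comma.
    static String joinWithBuilder(Iterable<String> values) {
        StringBuilder valueBuilder = new StringBuilder();
        for (String val : values) {
            valueBuilder.append(val).append(",");
        }
        // Guard the empty case: substring(0, -1) would throw.
        if (valueBuilder.length() == 0) {
            return "";
        }
        return valueBuilder.substring(0, valueBuilder.length() - 1);
    }

    // Alternative: StringJoiner places the separator only between elements.
    static String joinWithJoiner(Iterable<String> values) {
        StringJoiner joiner = new StringJoiner(",");
        for (String val : values) {
            joiner.add(val);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a.txt@3345", "b.txt@344", "c.txt@785");
        System.out.println(joinWithBuilder(records));
        System.out.println(joinWithJoiner(records));
    }
}
```

In a reducer the values iterable always holds at least one element, so the empty-input guard matters mainly when the helper is reused elsewhere.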
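That should serve the purpose according to the post-condition in your class. But if you really want to write a (Text, IndexRecordWritable) pair to your context, there are two approaches:
1. With a String as the argument (based on your attempt to pass a string, although the IndexRecordWritable constructor is not designed to accept one), and
2. With a HashSet as the argument (based on the HashSet initialised in the IndexRecordWritable class).
Because no IndexRecordWritable constructor accepts a String, you cannot pass one, and that is the error you are getting. Having toString() does not help here, since it only controls how the object is printed. PS: if you want the constructor to accept a String, you must add another constructor to the IndexRecordWritable class, as below: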

    // Save each index record from maps
    private HashSet<IndexMapRecordWritable> tokens = new HashSet<IndexMapRecordWritable>();

    // to save the string
    private String value;

    public IndexRecordWritable() {
    }

    public IndexRecordWritable(
            HashSet<IndexMapRecordWritable> indexMapRecordWritables) {
        /***/
    }

    // to accept a string
    public IndexRecordWritable (String value)   {
        this.value = value;
    }

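But that does not work if you want to use the HashSet, so approach 1 is out: you cannot pass a string.
That leaves approach 2: pass a HashSet as the argument, since you want to make use of the HashSet. In that case you must build the HashSet in the reducer before passing it to IndexRecordWritable in context.write.
To do this, your reducer must look like this.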

    @Override
    protected void reduce(Text key, Iterable<IndexMapRecordWritable> values, Context context)
            throws IOException, InterruptedException {
        //StringBuilder valueBuilder = new StringBuilder();

        HashSet<IndexMapRecordWritable> set = new HashSet<>();

        for (IndexMapRecordWritable val : values) {
            // Hadoop reuses the same value object across iterations,
            // so store a copy rather than the reference itself.
            set.add(new IndexMapRecordWritable(val.getOffset(), val.getDocid()));
            //valueBuilder.append(val);
            //valueBuilder.append(",");
        }

        // String variant (approach 1): write the key and the value with the last comma removed
        //context.write(key, new IndexRecordWritable(valueBuilder.substring(0, valueBuilder.length() - 1)));
        // HashSet variant (approach 2): write the key and the collected set
        context.write(key, new IndexRecordWritable(set));
        //valueBuilder.setLength(0);
    }
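The values iterated in reduce are deserialized copies of what the mapper emitted: Hadoop calls write on the map side and readFields on the reduce side. With both methods left empty, as in the IndexMapRecordWritable posted in the question, each record would arrive with default field values. The following is a minimal sketch of a matching write/readFields pair and a round trip through a byte buffer. MapRecord is a hypothetical stand-in that uses a plain long and String so it runs without Hadoop on the classpath. In the real class, which implements org.apache.hadoop.io.Writable, the same two methods could instead delegate to the LongWritable and Text fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for IndexMapRecordWritable (assumption: plain fields
// instead of LongWritable/Text, and no dependency on the Writable interface).
class MapRecord {
    private long offset;
    private String docid = "";

    // A no-argument constructor is needed so an empty instance can be filled by readFields.
    MapRecord() { }

    MapRecord(long offset, String docid) {
        this.offset = offset;
        this.docid = docid;
    }

    long getOffset() { return offset; }

    String getDocid() { return docid; }

    // Write every field, in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeLong(offset);
        out.writeUTF(docid);
    }

    // Read the fields back in exactly the order write() produced them.
    public void readFields(DataInput in) throws IOException {
        offset = in.readLong();
        docid = in.readUTF();
    }

    // Serialize one record and rebuild it, similar to what happens between map and reduce.
    static MapRecord roundTrip(MapRecord original) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));
        MapRecord copy = new MapRecord();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        MapRecord copy = roundTrip(new MapRecord(3345L, "a.txt"));
        System.out.println(copy.getDocid() + "@" + copy.getOffset());
    }
}
```

The key design point is symmetry: readFields must consume the fields in the same order and with the same encodings that write emitted them.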

Your IndexRecordWritable.java must contain this.

    // Save each index record from maps
    private HashSet<IndexMapRecordWritable> tokens = new HashSet<IndexMapRecordWritable>();

    // to save the string
    //private String value;

    public IndexRecordWritable() {
    }

    public IndexRecordWritable(
            HashSet<IndexMapRecordWritable> indexMapRecordWritables) {
        /***/
        tokens.addAll(indexMapRecordWritables);
    }

Keep in mind that this is not what your reducer's description requires:

POST-CONDITION: emit the output a single key-value where all the file names are separated by a comma ",".  <"marcello", "a.txt@3345,b.txt@344,c.txt@785">

If you still choose to emit (Text, IndexRecordWritable), remember to process the HashSet inside IndexRecordWritable so that it comes out in the desired format.
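One way to do that processing is in toString, which is what a text output format prints for the value. The sketch below is an illustration under assumptions, not the asker's actual classes: Posting and PostingList are hypothetical stand-ins for IndexMapRecordWritable and IndexRecordWritable, the "docid@offset" layout is taken from the post-condition example, and sorting is added only to make the output order deterministic. It also defines equals and hashCode on the record, since a HashSet relies on them to recognise duplicates.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for IndexRecordWritable.
class PostingList {
    private final Set<Posting> tokens = new HashSet<>();

    PostingList(Set<Posting> postings) {
        tokens.addAll(postings);
    }

    // Join the records with commas. HashSet iteration order is unspecified,
    // so the strings are sorted first to get a stable result.
    @Override
    public String toString() {
        Set<String> sorted = new TreeSet<>();
        for (Posting p : tokens) {
            sorted.add(p.toString());
        }
        return String.join(",", sorted);
    }

    public static void main(String[] args) {
        Set<Posting> set = new HashSet<>();
        set.add(new Posting("a.txt", 3345));
        set.add(new Posting("b.txt", 344));
        set.add(new Posting("a.txt", 3345)); // duplicate, dropped by the set
        System.out.println(new PostingList(set));
    }
}

// Hypothetical stand-in for IndexMapRecordWritable.
class Posting {
    final String docid;
    final long offset;

    Posting(String docid, long offset) {
        this.docid = docid;
        this.offset = offset;
    }

    // Two postings are equal when both docid and offset match.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Posting)) {
            return false;
        }
        Posting p = (Posting) other;
        return offset == p.offset && docid.equals(p.docid);
    }

    @Override
    public int hashCode() {
        return Objects.hash(docid, offset);
    }

    // Format used in the post-condition example: file name, "@", offset.
    @Override
    public String toString() {
        return docid + "@" + offset;
    }
}
```

Without equals and hashCode, two records with the same docid and offset would count as distinct set members and both would appear in the output.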

Answered 2021-05-27
