Writable classes in MapReduce

0sgqnhkj  posted on 2021-05-27  in  Hadoop

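How can I connect the map Writable to the reduce Writable, using a HashSet of values (docid and offset) in the reduce Writable? The mapper (LineIndexMapper) works fine, but in the reducer (LineIndexReducer), when I write context.write(key, new IndexRecordWritable("some string")), I get an error saying it cannot take a String as an argument, even though the reduce Writable also has a public String toString().
I suspect the HashSet in the reducer's Writable (IndexRecordWritable.java) is not picking up the values correctly. My code is below.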

IndexMapRecordWritable.java

        import java.io.DataInput;
        import java.io.DataOutput;
        import java.io.IOException;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.io.Writable;

        public class IndexMapRecordWritable implements Writable {

            private LongWritable offset;
            private Text docid;

            public LongWritable getOffsetWritable() {
                return offset;
            }

            public Text getDocidWritable() {
                return docid;
            }

            public long getOffset() {
                return offset.get();
            }

            public String getDocid() {
                return docid.toString();
            }

            public IndexMapRecordWritable() {
                this.offset = new LongWritable();
                this.docid = new Text();
            }

            public IndexMapRecordWritable(long offset, String docid) {
                this.offset = new LongWritable(offset);
                this.docid = new Text(docid);
            }
            public IndexMapRecordWritable(IndexMapRecordWritable indexMapRecordWritable) {
                this.offset = indexMapRecordWritable.getOffsetWritable();
                this.docid = indexMapRecordWritable.getDocidWritable();
            }
            @Override
            public String toString() {

                StringBuilder output = new StringBuilder();
                output.append(docid);
                output.append(offset);

                return output.toString();

            }

            @Override
            public void write(DataOutput out) throws IOException {

            }

            @Override
            public void readFields(DataInput in) throws IOException {

            }

        }

IndexRecordWritable.java

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.HashSet;
    import org.apache.hadoop.io.Writable;

    public class IndexRecordWritable implements Writable {

        // Save each index record from maps
        private HashSet<IndexMapRecordWritable> tokens = new HashSet<IndexMapRecordWritable>();

        public IndexRecordWritable() {
        }

        public IndexRecordWritable(
                Iterable<IndexMapRecordWritable> indexMapRecordWritables) {

        }

        @Override
        public String toString() {

            StringBuilder output = new StringBuilder();

            return output.toString();

        }

        @Override
        public void write(DataOutput out) throws IOException {

        }

        @Override
        public void readFields(DataInput in) throws IOException {

        }

    }

s8vozzvw1#

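Well, here is my answer, based on a few assumptions. The final output is a text file containing the key and the file names, with the file names separated by commas, as described in the PRE-CONDITION and POST-CONDITION comments of the reducer class.
In that case you do not really need the IndexRecordWritable class. You can simply write to the context using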

context.write(key, new Text(valueBuilder.substring(0, valueBuilder.length() - 1)));

with the class declaration line being

public class LineIndexReducer extends Reducer<Text, IndexMapRecordWritable, Text, Text>

Don't forget to set the correct output classes in the driver.
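
As a minimal sketch of the value being built for the Text output, assuming each posting is rendered as docid@offset as in the POST-CONDITION comment: the hypothetical PostingJoiner helper below is not part of the original code and runs without Hadoop. It uses StringJoiner, so there is no trailing comma to trim afterwards.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not part of the original code: builds the
// comma-separated value that the reducer would wrap in a Text.
class PostingJoiner {

    // Formats one posting as docid@offset, e.g. "a.txt@3345".
    static String format(String docid, long offset) {
        return docid + "@" + offset;
    }

    // Joins postings with "," and never leaves a trailing comma.
    static String join(List<String> postings) {
        StringJoiner joiner = new StringJoiner(",");
        for (String posting : postings) {
            joiner.add(posting);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<String> postings = Arrays.asList(
                format("a.txt", 3345), format("b.txt", 344), format("c.txt", 785));
        System.out.println(join(postings)); // a.txt@3345,b.txt@344,c.txt@785
    }
}
```

In the reducer, the joined string would be passed to new Text(...) in context.write. Note that the toString() shown in the question appends docid and offset without the "@" separator.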
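This approach should serve the purpose according to the POST-CONDITION of your class. However, if you really want to write a Text-IndexRecordWritable pair to your context, there are two approaches:
1. with a String as the argument (based on your attempt to pass a String even though the IndexRecordWritable constructor is not designed to accept one), and
2. with a HashSet as the argument (based on the HashSet initialized in the IndexRecordWritable class).
Because the IndexRecordWritable constructor is not designed to accept a String, you cannot pass one. That is why you get the error saying a String cannot be used as an argument. PS: if you want the constructor to accept a String, you must add another constructor to the IndexRecordWritable class, as follows: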

    // Save each index record from maps
    private HashSet<IndexMapRecordWritable> tokens = new HashSet<IndexMapRecordWritable>();

    // to save the string
    private String value;

    public IndexRecordWritable() {
    }

    public IndexRecordWritable(
            HashSet<IndexMapRecordWritable> indexMapRecordWritables) {
        /***/
    }

    // to accept a string
    public IndexRecordWritable(String value) {
        this.value = value;
    }

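But that is of no use if you want to use the HashSet. So approach 1 is out: you cannot pass a String.
That leaves the second approach. Pass the HashSet as the argument, since a HashSet is what you want to use. In that case, you must build the HashSet in the reducer before passing it to IndexRecordWritable in context.write.
To do that, your reducer must look like this.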

    @Override
    protected void reduce(Text key, Iterable<IndexMapRecordWritable> values, Context context) throws IOException, InterruptedException {
        //StringBuilder valueBuilder = new StringBuilder();

        HashSet<IndexMapRecordWritable> set = new HashSet<>();

        for (IndexMapRecordWritable val : values) {
            // Hadoop reuses the same value instance across iterations,
            // so store a fresh copy rather than the reference itself
            set.add(new IndexMapRecordWritable(val.getOffset(), val.getDocid()));
            //valueBuilder.append(val);
            //valueBuilder.append(",");
        }

        //write the key and the adjusted value (removing the last comma)
        //context.write(key, new IndexRecordWritable(valueBuilder.substring(0, valueBuilder.length() - 1)));
        context.write(key, new IndexRecordWritable(set));
        //valueBuilder.setLength(0);
    }
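
One thing to watch when collecting reducer values into a set: Hadoop hands the reducer the same value object on every iteration and only refills its fields, so stored references all end up pointing at one instance. The sketch below imitates that behaviour in plain Java. Posting and ReuseDemo are hypothetical stand-ins, not Hadoop classes, and Posting deliberately has no equals/hashCode, like the IndexMapRecordWritable in the question.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Stand-in for a mutable Writable value; hypothetical, for illustration only.
class Posting {
    String docid;
    long offset;

    Posting(String docid, long offset) {
        this.docid = docid;
        this.offset = offset;
    }
}

class ReuseDemo {

    // Mimics an iterable that refills ONE shared object on every next() call.
    static Iterable<Posting> reusingIterable(String[] docids, long[] offsets) {
        final Posting shared = new Posting("", 0);
        return () -> new Iterator<Posting>() {
            int i = 0;

            public boolean hasNext() {
                return i < docids.length;
            }

            public Posting next() {
                shared.docid = docids[i];
                shared.offset = offsets[i];
                i++;
                return shared;
            }
        };
    }

    // Stores the reference itself: every add() sees the same instance.
    static int collectByReference(Iterable<Posting> values) {
        Set<Posting> set = new HashSet<>();
        for (Posting p : values) {
            set.add(p);
        }
        return set.size();
    }

    // Stores a fresh copy per element, so each record survives.
    static int collectByCopy(Iterable<Posting> values) {
        Set<Posting> set = new HashSet<>();
        for (Posting p : values) {
            set.add(new Posting(p.docid, p.offset));
        }
        return set.size();
    }
}
```

With three input records, collectByReference ends up with a single element while collectByCopy keeps all three. If duplicates should also collapse, the value class would additionally need equals and hashCode.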

Your IndexRecordWritable.java must have this.

    // Save each index record from maps
    private HashSet<IndexMapRecordWritable> tokens = new HashSet<IndexMapRecordWritable>();

    // to save the string
    //private String value;

    public IndexRecordWritable() {
    }

    public IndexRecordWritable(
            HashSet<IndexMapRecordWritable> indexMapRecordWritables) {
        /***/
        tokens.addAll(indexMapRecordWritables);
    }

Keep in mind that this is not what your reducer's description requires.

POST-CONDITION: emit the output a single key-value where all the file names are separated by a comma ",".  <"marcello", "a.txt@3345,b.txt@344,c.txt@785">

If you still choose to emit (Text, IndexRecordWritable), remember to process the HashSet inside IndexRecordWritable to get the required format.
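
A rough sketch of what that processing could look like, under stated assumptions: PostingSetSketch is a hypothetical stand-in for IndexRecordWritable that stores each posting as a "docid@offset" string, so it runs without Hadoop. Its write and readFields have the same signatures as the Writable methods (which take java.io.DataOutput and DataInput), and toString sorts the tokens only to make the output order stable, since HashSet iteration order is unspecified.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for IndexRecordWritable; not the original class.
class PostingSetSketch {
    private final Set<String> tokens = new HashSet<>();

    void add(String docid, long offset) {
        tokens.add(docid + "@" + offset);
    }

    // Same shape as Writable.write: element count first, then each element.
    public void write(DataOutput out) throws IOException {
        out.writeInt(tokens.size());
        for (String token : tokens) {
            out.writeUTF(token);
        }
    }

    // Same shape as Writable.readFields: clear old state, then read back.
    public void readFields(DataInput in) throws IOException {
        tokens.clear();
        int size = in.readInt();
        for (int i = 0; i < size; i++) {
            tokens.add(in.readUTF());
        }
    }

    // Renders the set as one comma-separated line, in sorted order.
    @Override
    public String toString() {
        return String.join(",", new TreeSet<>(tokens));
    }

    // Serializes to bytes and reads them back into a fresh object.
    static PostingSetSketch roundTrip(PostingSetSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            PostingSetSketch copy = new PostingSetSketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a text output format, the reducer's value is written through toString(), so that method is what determines the line format in the output file. The empty write and readFields bodies in the question's classes would need a similar treatment for values to survive the shuffle between map and reduce.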
