SQL modeling in Hadoop MapReduce

Posted by wz1wpwve on 2021-06-02 in Hadoop


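I am trying to model an SQL query such as select distinct(col1) from table where col2=value2 in MapReduce. My logic is that each mapper checks the WHERE clause and, if it finds a match, emits the WHERE-clause value as the key and col1 as the value. With the default hash partitioner, all of this output goes to the same reducer, because every record carries the same key (the value used in the WHERE clause). In the reducer I can then eliminate the duplicates and emit the distinct values. Is this the right way to achieve this?
Note: the data for this query is in a CSV file.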
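To make the question's data flow concrete, here is a minimal plain-Java simulation of it, with no Hadoop dependency. It assumes each CSV line has the layout col1,col2 and uses a naive comma split (no quoted fields); the class and method names are hypothetical. It shows that, because every emitted pair shares the key value2, the shuffle produces exactly one group, so a single reducer receives every matching row and must remove duplicates itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical plain-Java simulation (no Hadoop) of the approach in the question.
// Assumed CSV layout: col1,col2 on every line; naive split, no quoted fields.
class QuestionApproachSketch {

    // "Map" phase: for rows where col2 equals whereValue, emit (whereValue, col1).
    static List<String[]> map(List<String> lines, String whereValue) {
        List<String[]> out = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split(",", -1);
            if (f.length >= 2 && f[1].trim().equals(whereValue)) {
                out.add(new String[] {whereValue, f[0].trim()});
            }
        }
        return out;
    }

    // "Shuffle" phase: group values by key. Every pair has the same key,
    // so everything ends up in a single group, i.e. on a single reducer.
    static Map<String, List<String>> shuffle(List<String[]> pairs) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] kv : pairs) {
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return groups;
    }

    // "Reduce" phase: the reducer must drop duplicates itself, in memory.
    static Set<String> reduce(List<String> values) {
        return new TreeSet<>(values);
    }
}
```

With the input lines a,value2 / b,value2 / a,value2 / c,other, the map phase emits three pairs, the shuffle yields one group (value2 → [a, b, a]), and the reducer returns {a, b}. The result is correct, but all matching rows funnel through one reducer, which has to hold the full set of distinct values in memory.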

hadoop hdfs mapreduce bigdata

Source: https://stackoverflow.com/questions/42081906/sql-modeling-in-map-reduce

1 answer


Answer #1 by 30byixjq

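import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class DistinctQuery {

    // MAPPER: applies the WHERE filter and emits col1 as the key
    public static class DistinctMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        // Assumed CSV layout: col1 is field 0, col2 is field 1
        private static final int COL1_INDEX = 0;
        private static final int COL2_INDEX = 1;
        private static final String WHERE_VALUE = "value2"; // filter value from the WHERE clause

        private final Text col1 = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Extract the columns from one CSV line (naive split, no quoted fields)
            String[] fields = value.toString().split(",", -1);
            if (fields.length <= Math.max(COL1_INDEX, COL2_INDEX)) {
                return; // skip malformed lines
            }
            String c1 = fields[COL1_INDEX].trim();
            String c2 = fields[COL2_INDEX].trim();

            if (!WHERE_VALUE.equals(c2)) { // WHERE col2 = value2
                return;
            }
            // Mapper output key is the distinct column value
            col1.set(c1);
            // Mapper output value is NULL
            context.write(col1, NullWritable.get());
        }
    }

    // REDUCER: each distinct col1 value arrives exactly once as a key
    public static class DistinctReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
        @Override
        public void reduce(Text key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
            // The shuffle has already grouped duplicate keys together,
            // so the list of values can be ignored
            context.write(key, NullWritable.get());
        }
    }
}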
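The key idea of this answer is that the distinct column itself becomes the map output key, so the shuffle collapses duplicates and the reducer only has to write each key once. Below is a hypothetical plain-Java simulation of that flow, with no Hadoop dependency, again assuming a col1,col2 CSV layout. Partitioning uses the same formula as Hadoop's HashPartitioner, (hashCode & Integer.MAX_VALUE) % numReducers, but with String.hashCode() rather than Text.hashCode(), so the exact reducer assignment differs from a real job. Grouping on each reducer is modeled with a TreeSet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical plain-Java simulation (no Hadoop) of the answer's job:
// map output key = col1, value = null. Assumed CSV layout: col1,col2.
class AnswerApproachSketch {

    // Same formula as Hadoop's HashPartitioner, applied to String.hashCode().
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Runs map -> shuffle -> reduce; returns the keys each reducer writes.
    // numReducers must be positive.
    static List<List<String>> run(List<String> lines, String whereValue, int numReducers) {
        // Per-reducer input: sorted, grouped keys (modeled with a TreeSet).
        List<TreeSet<String>> reducerInputs = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducerInputs.add(new TreeSet<>());
        }

        for (String line : lines) { // map phase
            String[] f = line.split(",", -1);
            if (f.length < 2 || !f[1].trim().equals(whereValue)) {
                continue; // WHERE col2 = whereValue
            }
            String col1 = f[0].trim();
            // Shuffle: identical keys go to the same reducer and collapse into one group.
            reducerInputs.get(partition(col1, numReducers)).add(col1);
        }

        List<List<String>> outputs = new ArrayList<>();
        for (TreeSet<String> keys : reducerInputs) {
            outputs.add(new ArrayList<>(keys)); // reduce: write each key once
        }
        return outputs;
    }
}
```

With the same four input lines as before and two reducers, the union of the reducer outputs is {a, b}, and each value is written by exactly one reducer. Unlike keying on the WHERE value, this spreads the distinct values across all reducers and needs no in-memory deduplication in the reducer.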

Answered 2021-06-02
