Why doesn't this code iterate over the reducer values twice?

Posted by yh2wf1be on 2021-06-03 in Hadoop

I have this code:

public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    String name = null;
    String sid = null;
    String predicate = null;
    String oid = null;
    String id = null;
    String outKey = null;
    String outVal = null;

    LinkedList<Text> valuesList = new LinkedList<Text>();
    Iterator<Text> ite = values.iterator();
    while (ite.hasNext()) {
        Text t = ite.next();
        String[] entities = t.toString().split("#-#-#-#");
        if (entities[entities.length - 1].equalsIgnoreCase("topic_name")) {
            name = entities[0];
        }
        valuesList.add(t);
    }
    Iterator<Text> ite2 = valuesList.iterator();
    while (ite2.hasNext()) {
        Text t2 = ite2.next();
        String[] entities = t2.toString().split("#-#-#-#");
        if (!entities[entities.length - 1].contains("topic_name")) {
            if (name != null) {
                outKey = entities[0] + "\t" + entities[1] + "\t" + name;
            } else {
                outKey = entities[0] + "\t" + entities[1] + "\t" + key.toString();
            }
            context.write(new Text(outKey), null);
        }
    }
}

I see that when I iterate over the values a second time, I always get the last value from the cached copy.

gorkyyrv #1

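The first iterator actually returns the same Text object on every call; it just fills that object with a different string before each return. It does this to avoid the cost of instantiating a new object for every value. As a result, you are building a List<Text> that holds many references to one object, so every element shows whatever was written into it last. To fix this, save the values in a List<String>, which holds the actual "unboxed" string contents. Like this: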

// Body of reduce(); name, outKey, key and context are as in the question.
LinkedList<String> valuesList = new LinkedList<String>();
Iterator<Text> ite = values.iterator();
while (ite.hasNext()) {
    Text t = ite.next();
    String[] entities = t.toString().split("#-#-#-#");
    if (entities[entities.length - 1].equalsIgnoreCase("topic_name")) {
        name = entities[0];
    }
    // Store the String contents, not the reused Text reference.
    valuesList.add(t.toString());
}
Iterator<String> ite2 = valuesList.iterator();
while (ite2.hasNext()) {
    String t2 = ite2.next();
    String[] entities = t2.split("#-#-#-#");
    if (!entities[entities.length - 1].contains("topic_name")) {
        if (name != null) {
            outKey = entities[0] + "\t" + entities[1] + "\t" + name;
        } else {
            outKey = entities[0] + "\t" + entities[1] + "\t" + key.toString();
        }
        context.write(new Text(outKey), null);
    }
}
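
To see the effect in isolation, here is a minimal sketch that runs without Hadoop. It is only a simulation: `MutableText`, `reusingIterator`, `cacheReferences` and `cacheCopies` are hypothetical names made up for this illustration, and `MutableText` merely stands in for a mutable value object such as Text. The iterator hands back one shared object and overwrites its contents on each call, which is the behavior described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ReuseDemo {
    // Hypothetical stand-in for a mutable value type such as Hadoop's Text.
    static final class MutableText {
        private String value = "";
        void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    // Iterator that returns the SAME object on every call to next(),
    // only replacing its contents (assumed to mimic the reducer's iterator).
    static Iterator<MutableText> reusingIterator(List<String> data) {
        final Iterator<String> source = data.iterator();
        final MutableText shared = new MutableText();
        return new Iterator<MutableText>() {
            @Override public boolean hasNext() { return source.hasNext(); }
            @Override public MutableText next() {
                shared.set(source.next());
                return shared;
            }
        };
    }

    // Buggy caching: every list element is a reference to the shared object.
    static List<String> cacheReferences(List<String> data) {
        List<MutableText> cached = new ArrayList<>();
        Iterator<MutableText> it = reusingIterator(data);
        while (it.hasNext()) {
            cached.add(it.next());
        }
        List<String> seen = new ArrayList<>();
        for (MutableText t : cached) {
            seen.add(t.toString());
        }
        return seen;
    }

    // Fixed caching: copy the contents out before the object is overwritten.
    static List<String> cacheCopies(List<String> data) {
        List<String> cached = new ArrayList<>();
        Iterator<MutableText> it = reusingIterator(data);
        while (it.hasNext()) {
            cached.add(it.next().toString());
        }
        return cached;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c");
        System.out.println(cacheReferences(data)); // prints [c, c, c]
        System.out.println(cacheCopies(data));     // prints [a, b, c]
    }
}
```

If you need Text objects rather than Strings in the second pass, copying each value with `new Text(t)` before adding it to the list should work as well, since the copy no longer shares state with the reused object.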
