Unexplained Java HashMap behavior

Posted by dxxyhpgq on 2021-06-04 in Hadoop

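In the code below, I create a HashMap to store objects called Datums, each of which holds a String (the location) and a count. Unfortunately, the code behaves very strangely.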

FileSystem fs = FileSystem.get(new Configuration());
Random r = new Random();
FSDataOutputStream fsdos = fs.create(new Path("error/" + r.nextInt(1000000)));

HashMap<String, Datum> datums = new HashMap<String, Datum>();
while (itrtr.hasNext()) {
    Datum next = itrtr.next();
    synchronized (datums) {
        if (!datums.containsKey(next.location)) {
            fsdos.writeUTF("INSERTING: " + next + "\n");
            datums.put(next.location, next);
        } else {
        } // skip those that are already indexed
    }
}
for (Datum d : datums.values()) {
    fsdos.writeUTF("PRINT DATUM VALUES: " + d.toString() + "\n");
}

The HashMap uses a String as its key.
Here is a sample of the output I get in the error file:

INSERTING: (test.txt,3)

INSERTING: (test2.txt,1)

PRINT DATUM VALUES: (test.txt,3)

PRINT DATUM VALUES: (test.txt,3)

The correct output for the print should be:
INSERTING: (test.txt,3)

INSERTING: (test2.txt,1)

PRINT DATUM VALUES: (test.txt,3)

PRINT DATUM VALUES: (test2.txt,1)

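What happened to the Datum whose location is test2.txt? Why is it being replaced by test.txt?
Basically, I should never see the same location twice (that is what the !datums.containsKey check is for). Unfortunately, I am getting very strange behavior.
By the way, this runs on Hadoop, inside a reducer.
I tried adding synchronized in case the code runs in multiple threads, which as far as I know it does not. The same thing still happens.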

Java hadoop

Source: https://stackoverflow.com/questions/20618678/unexplained-java-hashmap-behavior

2 Answers

44u64gxh1#

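This is not a problem with the Map but with the code itself: the Datum inserted as a value is a reference, and it is changed later on :) That is why all values in the Map are the same as the last Datum processed.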
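
The aliasing described here can be seen in isolation with a minimal sketch. The `Datum` class below is a hypothetical stand-in for the asker's class (its real definition is not shown in the question), and `AliasingDemo` and `build` are names invented for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

class AliasingDemo {
    // Hypothetical stand-in for the question's Datum class (an assumption).
    static class Datum {
        String location;
        int count;

        Datum(String location, int count) {
            this.location = location;
            this.count = count;
        }

        @Override
        public String toString() {
            return "(" + location + "," + count + ")";
        }
    }

    static Map<String, Datum> build() {
        Map<String, Datum> datums = new HashMap<>();

        Datum shared = new Datum("test.txt", 3);
        datums.put(shared.location, shared); // stores a reference, not a copy

        // Mutate the very same object, then insert it under a second key.
        shared.location = "test2.txt";
        shared.count = 1;
        datums.put(shared.location, shared);

        return datums;
    }

    public static void main(String[] args) {
        Map<String, Datum> datums = build();
        // Two distinct keys, but both point at one object.
        System.out.println(datums.get("test.txt"));  // prints (test2.txt,1)
        System.out.println(datums.get("test2.txt")); // prints (test2.txt,1)
    }
}
```

The String keys stay distinct because each key was captured before the mutation, while both values refer to the single mutated object.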

Posted 2021-06-04

ccgok5k52#

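According to a related answer, Hadoop's iterator always returns the same object rather than creating a new one on each pass through the loop.
Holding on to a reference to the object returned by the iterator is therefore invalid and produces surprising results. You need to copy the data into a new object: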

while (itrtr.hasNext()) {
    Datum next = itrtr.next();
    // copy any values from the Datum to a fresh instance
    Datum insert = new Datum(next.location, next.value);
    if (!datums.containsKey(insert.location)) {
        datums.put(insert.location, insert);
    }
}

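Here is a quote from the Hadoop Reducer documentation that confirms this:
The framework will reuse the key and value objects that are passed into the reduce, therefore the application should clone the objects they want to keep a copy of.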
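
The reuse behavior can be reproduced without Hadoop. The sketch below is only an illustration: `ReusingIterator` is a hypothetical iterator that imitates object reuse by overwriting one shared instance on every call to `next()`, and `Datum`, `ReuseDemo`, and `collect` are likewise assumed names, not Hadoop APIs. It shows a symptom similar to the one in the question when references are stored directly, and distinct values once each element is copied.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

class ReuseDemo {
    // Hypothetical stand-in for the question's Datum class (an assumption).
    static class Datum {
        String location;
        int count;

        Datum(String location, int count) {
            this.location = location;
            this.count = count;
        }

        @Override
        public String toString() {
            return "(" + location + "," + count + ")";
        }
    }

    // Hypothetical iterator that hands out the same instance every time.
    static class ReusingIterator implements Iterator<Datum> {
        private final String[] locations;
        private final int[] counts;
        private final Datum shared = new Datum("", 0);
        private int pos = 0;

        ReusingIterator(String[] locations, int[] counts) {
            this.locations = locations;
            this.counts = counts;
        }

        @Override
        public boolean hasNext() {
            return pos < locations.length;
        }

        @Override
        public Datum next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            // Overwrite the shared object instead of allocating a new one.
            shared.location = locations[pos];
            shared.count = counts[pos];
            pos++;
            return shared;
        }
    }

    // Collects by location; when copy is true, each element is copied first.
    static Map<String, Datum> collect(Iterator<Datum> it, boolean copy) {
        Map<String, Datum> datums = new HashMap<>();
        while (it.hasNext()) {
            Datum next = it.next();
            Datum toStore = copy ? new Datum(next.location, next.count) : next;
            datums.putIfAbsent(toStore.location, toStore);
        }
        return datums;
    }

    public static void main(String[] args) {
        String[] locations = {"test.txt", "test2.txt"};
        int[] counts = {3, 1};

        Map<String, Datum> aliased = collect(new ReusingIterator(locations, counts), false);
        System.out.println(aliased.get("test.txt"));  // prints (test2.txt,1)

        Map<String, Datum> copied = collect(new ReusingIterator(locations, counts), true);
        System.out.println(copied.get("test.txt"));   // prints (test.txt,3)
        System.out.println(copied.get("test2.txt"));  // prints (test2.txt,1)
    }
}
```

Copying at the point of insertion keeps the cost limited to elements that are actually retained; elements that are only inspected and discarded can still use the shared instance.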

Posted 2021-06-04
