hadoop m/r实现“你可能认识的人”友情推荐

这个解决方案来自我的博客，我在我的项目中使用了这个代码。
完整版本，请参阅https://www.dbtsai.com/blog/hadoop-mr-to-implement-people-you-might-know-friendship-recommendation/
因为我不确定这是否是一个最佳的解决方案，而且我也希望在stackoverflow中有一个文档，所以我在这里提出并回答了我自己的问题。我期待社区的反馈。
最好的友谊往往来自朋友。关键的想法是，如果两个人有很多共同的朋友，但他们不是朋友，那么系统应该建议他们互相连接。假设友谊是无向的：如果a是b的朋友，那么b也是a的朋友。这是facebook、google+、linkedin和一些社交网络中最常用的友情系统。将其扩展到twitter中使用的定向交友系统并不困难；不过，我们将在这篇文章中集中讨论无向性的情况。
输入数据将包含邻接列表，并且有多行，格式为，其中是唯一用户的唯一id，是以逗号分隔的用户列表，这些用户是的朋友。下面是一个输入示例。可以理解用户和用户之间的关系在图表中更容易。

1    0,2,3,4,5
2    0,1,4
3    0,1,4
4    1,2,3
5    1,6
6    5

在图中，您可以看到用户0不是用户4和5的朋友，但是用户0和用户4有共同的朋友1、2和3；用户0和用户5有共同的朋友1。因此，我们想推荐用户4和5作为用户0的朋友。
输出将以以下格式给出。 <用户><推荐的好友对用户（#共同好友：[共同好友的id，…]），…>。输出结果按好友数排序，并可从图中验证。

0    4 (3: [3, 1, 2]),5 (1: [1])
1    6 (1: [5])
2    3 (3: [1, 4, 0]),5 (1: [1])
3    2 (3: [4, 0, 1]),5 (1: [1])
4    0 (3: [2, 3, 1]),5 (1: [1])
5    0 (1: [1]),2 (1: [1]),3 (1: [1]),4 (1: [1])
6    1 (1: [5])

现在，让我们把这个问题放到一个m/r作业中。用户0有朋友1、2和3；结果，<1，2>，<2，1>，<2，3>，<3，2>，<1，3>和<3，1>的对具有用户0的共同朋友。结果，我们可以发出<key，value>=<1，r=2；m=0>， <2，r=1；m=0>， <2，r=3；m=0>…，其中r表示推荐的朋友，m表示共同的朋友。我们可以在reduce阶段汇总结果计算用户和推荐用户之间的好友数。然而，这种方法会引起一个问题。如果用户a和推荐的用户b已经是朋友了呢？为了克服这个问题，我们可以添加另一个属性 isfriend，如果我们知道他们已经是reduce阶段的朋友，我们就不推荐他们。在下面的实现中，当他们已经是朋友时使用m=-1，而不是使用额外的字段。
在输入数据中定义fromuser是，touser是之一，然后，算法可以由
Map阶段
emit<fromuser，r=touser；m=-1>适用于所有用户。假设有n个图瑟；然后我们将发出n条记录来描述fromuser和touser已经是朋友了。请注意，它们已经是发出的键和r之间的朋友，因此我们将m设置为-1。
发射<touser1，r=touser2；m=fromuser>对于touser1和touser2的所有可能组合，他们有共同的朋友fromuser。它将发出n（n-1）条记录。
总共有n^2个发射记录在Map阶段，其中n是拥有的朋友数。
减少相位，
只是把他们之间有多少共同的朋友加起来，加上r的值。如果他们中的任何一个有共同的朋友-1，我们不做推荐，因为他们已经是朋友了。
根据共同好友的数量对结果进行排序。
因为发出的值不是在hadoop中，我们必须自定义一个新的可写类型，如下代码所示。

static public class FriendCountWritable implements Writable {
    public Long user;
    public Long mutualFriend;

    public FriendCountWritable(Long user, Long mutualFriend) {
        this.user = user;
        this.mutualFriend = mutualFriend;
    }

    public FriendCountWritable() {
        this(-1L, -1L);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(user);
        out.writeLong(mutualFriend);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        user = in.readLong();
        mutualFriend = in.readLong();
    }

    @Override
    public String toString() {
        return " toUser: "
                + Long.toString(user) + " mutualFriend: "
                + Long.toString(mutualFriend);
    }
}

Map器可以通过

public static class Map extends Mapper<LongWritable, Text, LongWritable, FriendCountWritable> {
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line[] = value.toString().split("\t");
        Long fromUser = Long.parseLong(line[0]);
        List toUsers = new ArrayList();

        if (line.length == 2) {
            StringTokenizer tokenizer = new StringTokenizer(line[1], ",");
            while (tokenizer.hasMoreTokens()) {
                Long toUser = Long.parseLong(tokenizer.nextToken());
                toUsers.add(toUser);
                context.write(new LongWritable(fromUser),
                        new FriendCountWritable(toUser, -1L));
            }

            for (int i = 0; i < toUsers.size(); i++) {
                for (int j = i + 1; j < toUsers.size(); j++) {
                    context.write(new LongWritable(toUsers.get(i)),
                            new FriendCountWritable((toUsers.get(j)), fromUser));
                    context.write(new LongWritable(toUsers.get(j)),
                            new FriendCountWritable((toUsers.get(i)), fromUser));
                }
                }
            }
        }
    }

减速器可以通过

public static class Reduce extends Reducer<LongWritable, FriendCountWritable, LongWritable, Text> {
    @Override
    public void reduce(LongWritable key, Iterable values, Context context)
            throws IOException, InterruptedException {

        // key is the recommended friend, and value is the list of mutual friends
        final java.util.Map<Long, List> mutualFriends = new HashMap<Long, List>();

        for (FriendCountWritable val : values) {
            final Boolean isAlreadyFriend = (val.mutualFriend == -1);
            final Long toUser = val.user;
            final Long mutualFriend = val.mutualFriend;

            if (mutualFriends.containsKey(toUser)) {
                if (isAlreadyFriend) {
                    mutualFriends.put(toUser, null);
                } else if (mutualFriends.get(toUser) != null) {
                    mutualFriends.get(toUser).add(mutualFriend);
                }
            } else {
                if (!isAlreadyFriend) {
                    mutualFriends.put(toUser, new ArrayList() {
                        {
                            add(mutualFriend);
                        }
                    });
                } else {
                    mutualFriends.put(toUser, null);
                }
            }
        }

        java.util.SortedMap<Long, List> sortedMutualFriends = new TreeMap<Long, List>(new Comparator() {
            @Override
            public int compare(Long key1, Long key2) {
                Integer v1 = mutualFriends.get(key1).size();
                Integer v2 = mutualFriends.get(key2).size();
                if (v1 > v2) {
                    return -1;
                } else if (v1.equals(v2) && key1 < key2) {
                    return -1;
                } else {
                    return 1;
                }
            }
        });

        for (java.util.Map.Entry<Long, List> entry : mutualFriends.entrySet()) {
            if (entry.getValue() != null) {
                sortedMutualFriends.put(entry.getKey(), entry.getValue());
            }
        }

        Integer i = 0;
        String output = "";
        for (java.util.Map.Entry<Long, List> entry : sortedMutualFriends.entrySet()) {
            if (i == 0) {
                output = entry.getKey().toString() + " (" + entry.getValue().size() + ": " + entry.getValue() + ")";
            } else {
                output += "," + entry.getKey().toString() + " (" + entry.getValue().size() + ": " + entry.getValue() + ")";
            }
            ++i;
        }
        context.write(key, new Text(output));
    }
}

其中，在树形图中使用比较器对输出值按相互友元数的降序排序。
欢迎任何意见和反馈。谢谢。

hadoop m/r实现“你可能认识的人”友情推荐

1条答案

相关问题

热门标签

最新问答