Finding the unique visitors to a web page

e3bfsja2  posted on 2021-06-03  in  Hadoop

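I want to write a Pig script that finds the number of unique user IDs that visited a particular web page.
Table definition: a = (userid:chararray, otherid:chararray, webpage:chararray)
This is what I wrote, but it doesn't work: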

a = (userid:chararray, otherid:chararray, webpage:chararray)
group_by_page = GROUP a by webpage ;
count_d = FOREACH group_by_page GENERATE group, count(distinct(a.userid));

weylhg0b  #1

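You need to use DISTINCT inside a nested foreach; it is an operator, not a UDF, so it can't be called like a function. This will get you where you need to go: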

a = LOAD 'input' AS (userid:chararray, otherid:chararray, webpage:chararray);
group_by_page = GROUP a by webpage;
count_d = FOREACH group_by_page { uniq = DISTINCT a.userid; GENERATE group, COUNT(uniq); };
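
As an alternative, here is a minimal sketch that removes duplicates before grouping instead of inside the nested foreach. It assumes the same placeholder 'input' path and schema as the script above; the alias names (pairs, uniq_pairs, by_page) and the output field names are made up for illustration and are not from the original post.

```pig
-- Assumption: same placeholder input path and schema as the script above.
a = LOAD 'input' AS (userid:chararray, otherid:chararray, webpage:chararray);

-- Keep only the two fields that matter for the count.
pairs = FOREACH a GENERATE userid, webpage;

-- Drop duplicate (userid, webpage) rows, so each user appears once per page.
uniq_pairs = DISTINCT pairs;

-- Group the de-duplicated rows by page and count the rows in each group.
by_page = GROUP uniq_pairs BY webpage;
count_d = FOREACH by_page GENERATE group AS webpage, COUNT(uniq_pairs) AS unique_visitors;

DUMP count_d;
```

Both versions should give one row per web page with its count of distinct user IDs. Which one runs better will likely depend on your data, so treat this as something to try rather than a recommendation.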

For more information on nested foreach, see the Apache Pig documentation.
