How do I intersect a bag with a tuple in Pig?

cwxwcias posted on 2021-06-21 in Pig

I have a bag like (url:chararray, mal:float) and a tuple like (url:chararray, links:chararray). I want to parse the links field and intersect the bag with the parsed links:

    src = LOAD 'hbase://$collection' USING
        org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:url anchors:links', '-loadKey true') AS (id:bytearray, url:chararray, links:chararray);
    mals = LOAD '/tmp/prepare' as (url:chararray, mal:float);
    urls = FILTER src BY (links IS NOT null);
    urls2 = FOREACH urls GENERATE TOKENIZE(links, '\t') as links, id, url;
    processed = FOREACH urls2 {
        grouped = COGROUP links BY $0, mals BY url;
        intersected = FILTER grouped BY NOT IsEmpty(urls) AND NOT IsEmpty(links4);
        weights = FOREACH intersected GENERATE mal;
        GENERATE id, AVG(weights) as mal;
    };

This code does not work. The parser fails with:

    [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <file ./Rank.pig, line 11, column 19> [query, statement, foreach_statement, foreach_complex_statement, foreach_clause_complex, foreach_plan_complex, nested_blk, nested_command_list, nested_command, expr, add_expr, multi_expr, cast_expr, unary_expr, expr_eval, var_expr, projectable_expr, func_eval, recoverFromMismatchedToken] mismatched input 'links' expecting LEFT_PAREN

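I am using Pig 0.11.0.
As far as I understand, links is a tuple and mals is a bag, so they cannot be cogrouped together. How can I create a bag from links so that I can do the COGROUP?
UPD: sample dataset: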

    /tmp/prepare:
    http://1 1.0
    http://2 0.9
    http://3 0.8
    http://4 0.0

    HBase:
    id: ID
    url: http://4
    links: http://1 http://2 http://3

Expected output:

    {(id: ID, mal: 0.9)}
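
One possible way to get this result, offered as an untested sketch rather than a verified fix: as far as I know, a nested FOREACH block accepts only a small set of operators (such as FILTER, DISTINCT, LIMIT and ORDER BY), COGROUP is not one of them, and the block cannot pull in a second relation like mals. TOKENIZE already returns a bag, so the links can instead be flattened into one row per (id, link), joined with mals at the top level, and then grouped back by id. The relation names link_rows, joined, by_id and result are my own, and the sketch assumes both the links field and /tmp/prepare are tab-separated.

```pig
src = LOAD 'hbase://$collection' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:url anchors:links', '-loadKey true')
    AS (id:bytearray, url:chararray, links:chararray);
mals = LOAD '/tmp/prepare' AS (url:chararray, mal:float);

urls = FILTER src BY links IS NOT NULL;

-- TOKENIZE yields a bag of single-field tuples; FLATTEN turns it into
-- one output row per link, each still carrying the row key.
link_rows = FOREACH urls GENERATE id, FLATTEN(TOKENIZE(links, '\t')) AS link:chararray;

-- Inner join keeps only the links that also appear in mals (the intersection).
joined = JOIN link_rows BY link, mals BY url;

-- Collect the matched weights per row key and average them.
by_id = GROUP joined BY link_rows::id;
result = FOREACH by_id GENERATE group AS id, AVG(joined.mal) AS mal;

DUMP result;
```

With the sample data above, the three links of the row ID would match the weights 1.0, 0.9 and 0.8, so the average should come out close to 0.9. Because the join is an inner join, a row whose links have no match in mals would not appear in the output at all; a LEFT OUTER join on link_rows would be the place to start if such rows need to be kept.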
