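I have one bag with the schema (url:chararray, mal:float) and another with the schema (url:chararray, links:chararray). I want to parse the links field and intersect the first bag with the parsed links: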
src = LOAD 'hbase://$collection' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:url anchors:links', '-loadKey true') AS (id:bytearray, url:chararray, links:chararray);
mals = LOAD '/tmp/prepare' as (url:chararray, mal:float);
urls = FILTER src BY (links IS NOT null);
urls2 = FOREACH urls GENERATE TOKENIZE(links, '\t') as links, id, url;
processed = FOREACH urls2 {
grouped = COGROUP links BY $0, mals BY url;
intersected = FILTER grouped BY NOT IsEmpty(urls) AND NOT IsEmpty(links4);
weights = FOREACH intersected GENERATE mal;
GENERATE id, AVG(weights) as mal;
};
This code does not work. The parser fails with:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <file ./Rank.pig, line 11, column 19> [query, statement, foreach_statement, foreach_complex_statement, foreach_clause_complex, foreach_plan_complex, nested_blk, nested_command_list, nested_command, expr, add_expr, multi_expr, cast_expr, unary_expr, expr_eval, var_expr, projectable_expr, func_eval, recoverFromMismatchedToken] mismatched input 'links' expecting LEFT_PAREN
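I am using Pig 0.11.0.
As far as I understand, links is a tuple and mals is a bag, so they cannot be cogrouped together. How can I create a bag from links so that I can do the COGROUP?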
Update: sample dataset:
/tmp/prepare:
http://1 1.0
http://2 0.9
http://3 0.8
http://4 0.0
HBase:
id: ID
url: http://4
links: http://1 http://2 http://3
Expected output:
{(id: ID, mal: 0.9)}
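One possible way to get this output, offered as a sketch rather than a verified fix: as far as I know, a nested FOREACH block only accepts a small set of operators (such as FILTER, DISTINCT, ORDER and LIMIT), and COGROUP is not one of them, which would explain the parser error. An alternative is to flatten the tokenized links at the top level, JOIN them with mals, and then GROUP by id. The aliases pairs, joined, by_id and result are names I made up for illustration, and the '\t' delimiter is taken from the original script, so it assumes links really is tab-separated.

```pig
src = LOAD 'hbase://$collection' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:url anchors:links', '-loadKey true')
    AS (id:bytearray, url:chararray, links:chararray);
mals = LOAD '/tmp/prepare' AS (url:chararray, mal:float);

-- Skip rows that have no links at all.
urls = FILTER src BY links IS NOT NULL;

-- TOKENIZE returns a bag of tokens; FLATTEN turns it into one row per (id, link).
pairs = FOREACH urls GENERATE id, FLATTEN(TOKENIZE(links, '\t')) AS link;

-- Inner join: links without an entry in mals are dropped (the "intersection").
joined = JOIN pairs BY link, mals BY url;

-- Collect the matched scores per source id and average them.
by_id = GROUP joined BY pairs::id;
result = FOREACH by_id GENERATE group AS id, AVG(joined.mals::mal) AS mal;

DUMP result;
```

With the sample data above, the three links of http://4 match the scores 1.0, 0.9 and 0.8, so the average should come out at about 0.9, up to float rounding. Note that the inner JOIN also drops any id whose links have no match in mals. If those ids need to stay in the output with a null score, a LEFT OUTER join would be the variant to try.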