Apache Pig JOIN does not behave as expected

bybem2ql  posted on 2021-06-21  in Pig

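I am new to Apache Pig. I created two files with tab-separated fields, employees.txt and employees2.txt. [The files themselves contain no blank lines between rows; the spacing here is only to keep the editor happy.]
employees.txt contains: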

joe     21      94085   50000.0
Tom     21      94085   50000.0
John    21      94085   50000.0

employees2.txt contains:

joe     4085559898
joe     4085559899
tom     4085559897
tom     4085559896
john    4085559896

Then I tried a simple join:

e1 = LOAD 'employees.txt' AS (name, age, zip, salary);
e2 = LOAD 'employees2.txt' AS (name, phone);
e3 = JOIN e1 BY name, e2 BY name;
DUMP e3;

Result:

(joe,21,94085,50000.0,joe,4085559899)
(joe,21,94085,50000.0,joe,4085559898)

I expected:

(joe,21,94085,50000.0,joe,4085559899)
(joe,21,94085,50000.0,joe,4085559898)
(Tom,21,94085,50000.0,Tom,4085559897)
(Tom,21,94085,50000.0,Tom,4085559896)
(joe,21,94085,50000.0,Tom,4085559896)

What am I doing wrong?
Thanks,
Chris

k7fdbhmy1#

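Like almost all programming languages, Pig compares strings case-sensitively. Therefore "Tom" != "tom" and "John" != "john". Only "joe" is spelled identically in both files, which is why only the joe rows appear in the result.
You should change the case of the names in employees.txt so that they match employees2.txt. Then you should get the expected result.
Alternatively, you can use Pig's built-in string function LOWER to convert the name field to all lowercase before joining.
Roughly like this: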

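e1 = LOAD 'employees.txt' AS (name:chararray, age, zip, salary);
e2 = LOAD 'employees2.txt' AS (name:chararray, phone);
-- Lowercase the join key; AS name keeps the alias so the JOIN can reference it
e1_lower = FOREACH e1 GENERATE LOWER(name) AS name, age, zip, salary;
e3 = JOIN e1_lower BY name, e2 BY name;
DUMP e3;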
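
If the casing in either file cannot be trusted, one option is to normalize the key on both sides and keep the original spelling of the name in the output. The following is only a sketch under assumptions: the field types in the LOAD schemas are guesses from the sample data, and the names e1_keyed, e2_keyed, name_key, and e4 are made up for illustration.

```pig
-- Field types are assumed from the sample data, not stated in the question
e1 = LOAD 'employees.txt' AS (name:chararray, age:int, zip:chararray, salary:double);
e2 = LOAD 'employees2.txt' AS (name:chararray, phone:chararray);

-- Add a lowercase helper key to each relation; the original name is left untouched
e1_keyed = FOREACH e1 GENERATE LOWER(name) AS name_key, name, age, zip, salary;
e2_keyed = FOREACH e2 GENERATE LOWER(name) AS name_key, phone;

-- Join on the normalized key so that Tom/tom and John/john match
e3 = JOIN e1_keyed BY name_key, e2_keyed BY name_key;

-- Drop the helper keys and keep only the original columns
e4 = FOREACH e3 GENERATE e1_keyed::name, e1_keyed::age, e1_keyed::zip, e1_keyed::salary, e2_keyed::phone;
DUMP e4;
```

With the sample files above, this should produce one output row per line of employees2.txt, with names shown as they are spelled in employees.txt. The row order of DUMP is not guaranteed.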
