我在同一列的4个表上执行完全外部联接。我只想为join列中的每个不同值生成一行。
输入为:
employee1
+---------------------+-----------------+--+
| employee1.personid | employee1.name |
+---------------------+-----------------+--+
| 111 | aaa |
| 222 | bbb |
| 333 | ccc |
+---------------------+-----------------+--+
employee2
+---------------------+----------------+--+
| employee2.personid | employee2.sal |
+---------------------+----------------+--+
| 111 | 2 |
| 200 | 3 |
+---------------------+----------------+--+
employee3
+---------------------+------------------+--+
| employee3.personid | employee3.place |
+---------------------+------------------+--+
| 111 | bbsr |
| 300 | atl |
| 200 | ny |
+---------------------+------------------+--+
employee4
+---------------------+---------------+--+
| employee4.personid | employee4.dt |
+---------------------+---------------+--+
| 111 | 2019-02-21 |
| 300 | 2019-03-18 |
| 400 | 2019-03-18 |
+---------------------+---------------+--+
预期结果每个人名有一条记录,所以总共应该有6条记录(111222333200300400),比如:
+-----------+---------+--------+----------+-------------+--+
| personid | f.name | u.sal | v.place | v_in.dt |
+-----------+---------+--------+----------+-------------+--+
| 111 | aaa | 2 | bbsr | 2019-02-21 |
| 200 | NULL | 3 | ny | NULL |
| 222 | bbb | NULL | NULL | NULL |
| 300 | NULL | NULL | atl | 2019-03-18 |
| 333 | ccc | NULL | NULL | NULL |
| 400 | NULL | NULL | NULL | 2019-03-18 |
+-----------+---------+--------+----------+-------------+--+
我得到的结果是:
+-----------+---------+--------+----------+-------------+--+
| personid | f.name | u.sal | v.place | v_in.dt |
+-----------+---------+--------+----------+-------------+--+
| 111 | aaa | 2 | bbsr | 2019-02-21 |
| 200 | NULL | 3 | NULL | NULL |
| 200 | NULL | NULL | ny | NULL |
| 222 | bbb | NULL | NULL | NULL |
| 300 | NULL | NULL | atl | NULL |
| 300 | NULL | NULL | NULL | 2019-03-18 |
| 333 | ccc | NULL | NULL | NULL |
| 400 | NULL | NULL | NULL | 2019-03-18 |
+-----------+---------+--------+----------+-------------+--+
使用的查询:
select coalesce(f.personid, u.personid, v.personid, v_in.personid) as personid,f.name,u.sal,v.place,v_in.dt
from employee1 f FULL OUTER JOIN employee2 u on f.personid=u.personid
FULL OUTER JOIN employee3 v on f.personid=v.personid
FULL OUTER JOIN employee4 v_in on f.personid=v_in.personid;
请建议如何产生预期的结果。
2条答案
按热度按时间shstlldc1#
full outer join
是很棘手的,因为你必须采取前NULL
这是考虑到的。但你可以做到:在支持
using
为了join
s(代替on
)这更简单。我不认为Hive支持using
不过。7y4bm7vi2#
完全联接返回所有联接的行+左表中所有未联接的行+右表中所有未联接的行。既然你要加入
employee2
,employee3
,employee4
相同的employee1
不包含personid=200
,从所有四个表返回所有未联接的行。我建议合并所有四个表,为缺少的字段提供空值+聚合group by
personid
:这将比连接性能更好。