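I'm a bit confused about the Spark Catalyst SQL optimizer (insight into Hive's query optimizer would also be helpful). Below is a query containing two subqueries, q1 and q2. If you look closely, everything in the two subqueries is identical except for the value compared against is_true in the predicate.
My question is whether the Spark or Hive query optimizer can recognize this redundancy/similarity and optimize the query so that the shuffle is performed only once.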
select q1.count1, q2.count2 from
(select count(q_id) as count1 from
(select u.tbl, q_id, max(m.is_true) as is_true from
(select tbl, schema, q_id from umap where a_id=1234) u
join
(select distinct schema, table_name, is_true from metadata where id=1234) m
on u.schema = m.schema and u.tbl = m.table_name
group by tbl,q_id) p where p.is_true=1) q1,
(select count(q_id) as count2 from
(select u.tbl, q_id, max(m.is_true) as is_true from
(select tbl, schema, q_id from umap where a_id=1234) u
join
(select distinct schema, table_name, is_true from metadata where id=1234) m
on u.schema = m.schema and u.tbl = m.table_name
group by tbl,q_id) p where p.is_true=0) q2
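For comparison, here is a hand-written form that I would expect to return the same two counts while running the join and group by only once. It is only a sketch, assuming the shared inner subquery really is identical in q1 and q2; the conditional counts are my own rewrite, not something I have confirmed the optimizer produces. Prefixing either version with EXPLAIN should show how many shuffles each plan actually contains.

```sql
-- Sketch only: one shared join + group by, both counts computed in a single pass.
-- count() skips NULLs, so each CASE counts only the rows matching its predicate.
select
  count(case when p.is_true = 1 then p.q_id end) as count1,
  count(case when p.is_true = 0 then p.q_id end) as count2
from
  (select u.tbl, q_id, max(m.is_true) as is_true from
    (select tbl, schema, q_id from umap where a_id=1234) u
    join
    (select distinct schema, table_name, is_true from metadata where id=1234) m
    on u.schema = m.schema and u.tbl = m.table_name
    group by tbl, q_id) p
```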
Thanks