hiveql连接查询-nvl在where子句中不起作用

5cnsuln7  于 2021-06-26  发布在  Hive
关注(0)|答案(1)|浏览(530)

我有一个hiveql查询,如下所示:

create table JOINED as select TABLEA.* from TABLEA join TABLEB on
TABLEA.key=TABLEB.key where nvl(TABLEA.attr, 0)=nvl(TABLEB.attr, 0);

但是这个查询不会选择那些 TABLEA.key=TABLEB.keyTABLEA.attr=NULL 以及 TABLEB.attr=NULL . (或) TABLEA.attr=0 以及 TABLEB.attr=NULL . (或) TABLEA.attr=NULL 以及 TABLEB.attr=0 .
以上病例均未入选。为什么会这样?我是否误解了nvl()的用法?
如果attr属性为null,我希望它默认为0。什么是正确的查询?

8oomwypt

8oomwypt1#

谢谢,我刚刚报告了一个错误-
包含nvl/coalesce的内部join on子句/where的结果不正确
如果你检查一下执行计划,你会发现两个表的 predicate 都错了 attr is not null .
从两个表中选择列(例如。 select TABLEA.*,TABLEB.key )似乎阻止了这个问题。

explain
select TABLEA.* from TABLEA join TABLEB on
TABLEA.key=TABLEB.key where nvl(TABLEA.attr, 0)=nvl(TABLEB.attr, 0);
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        $hdt$_0:tablea 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        $hdt$_0:tablea 
          TableScan
            alias: tablea
            Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (key is not null and attr is not null) (type: boolean)
              Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: key (type: int), attr (type: int)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
                HashTable Sink Operator
                  keys:
                    0 _col0 (type: int), NVL(_col1,0) (type: int)
                    1 _col0 (type: int), NVL(_col1,0) (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: tableb
            Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (key is not null and attr is not null) (type: boolean)
              Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: key (type: int), attr (type: int)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  keys:
                    0 _col0 (type: int), NVL(_col1,0) (type: int)
                    1 _col0 (type: int), NVL(_col1,0) (type: int)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 1 Data size: 15 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 15 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

相关问题