Hive1.2SQL返回意外的特殊字符

nxowjjhe  于 2021-06-27  发布在  Hive
关注(0)|答案(1)|浏览(423)

运行以下配置单元查询将返回特殊字符:

SELECT t6.amt amt2,t6.color color
FROM(
 SELECT t5.color color, t5.c1 amt
 FROM(
  SELECT t1.c1 c1, t1.c2 AS color 
  from(
   SELECT  7716 AS c1, "Red" AS c2 UNION 
   SELECT  6203 AS c1, "Blue" AS c2
  ) t1
 ) t5
order by color) t6
ORDER BY color

它将结果返回为

amt color
4   �
3   �

是已知的Hive虫吗?
解释计划

Map 5 <- Union 2 (CONTAINS)
Reducer 3 <- Union 2 (SIMPLE_EDGE)
Reducer 4 <- Reducer 3 (SIMPLE_EDGE)

Stage-0
   Fetch Operator
      limit:-1
      Stage-1
         Reducer 4
         File Output Operator [FS_331359]
            compressed:false
            Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
            table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
            Select Operator [SEL_331358]
            |  outputColumnNames:["_col0","_col1"]
            |  Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
            |<-Reducer 3 [SIMPLE_EDGE]
               Reduce Output Operator [RS_331357]
                  key expressions:_col1 (type: int)
                  sort order:+
                  Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
                  value expressions:_col0 (type: string)
                  Select Operator [SEL_331351]
                     outputColumnNames:["_col0","_col1"]
                     Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
                     Group By Operator [GBY_331350]
                     |  keys:KEY._col0 (type: int), KEY._col1 (type: string)
                     |  outputColumnNames:["_col0","_col1"]
                     |  Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
                     |<-Union 2 [SIMPLE_EDGE]
                        |<-Map 1 [CONTAINS]
                        |  Reduce Output Operator [RS_331349]
                        |     key expressions:_col0 (type: int), _col1 (type: string)
                        |     Map-reduce partition columns:_col0 (type: int), _col1 (type: string)
                        |     sort order:++
                        |     Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
                        |     Group By Operator [GBY_331348]
                        |        keys:_col0 (type: int), _col1 (type: string)
                        |        outputColumnNames:["_col0","_col1"]
                        |        Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
                        |        Select Operator [SEL_331342]
                        |           outputColumnNames:["_col0","_col1"]
                        |           Statistics:Num rows: 1 Data size: 91 Basic stats: COMPLETE Column stats: COMPLETE
                        |           TableScan [TS_331341]
                        |              alias:_dummy_table
                        |              Statistics:Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
                        |<-Map 5 [CONTAINS]
                           Reduce Output Operator [RS_331349]
                              key expressions:_col0 (type: int), _col1 (type: string)
                              Map-reduce partition columns:_col0 (type: int), _col1 (type: string)
                              sort order:++
                              Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
                              Group By Operator [GBY_331348]
                                 keys:_col0 (type: int), _col1 (type: string)
                                 outputColumnNames:["_col0","_col1"]
                                 Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
                                 Select Operator [SEL_331344]
                                    outputColumnNames:["_col0","_col1"]
                                    Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
                                    TableScan [TS_331343]
                                       alias:_dummy_table
                                       Statistics:Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE

在这里,禁用或启用配置参数可以帮助我吗?
如果我颠倒最外层select中列的顺序,那么查询将返回预期的结果。我本以为结果会是
颜色金额
蓝色6203
红色7716

rfbsl7qr

rfbsl7qr1#

我用mr和tez在我的hive2.3上尝试了同样的查询,结果和你的一样。我关闭了所有的查询优化、统计数据收集和rcp,但结果保持不变。问题是Hive的制造 order by 因为你有两个连续的 order by 的配置单元将它们合并到单个reduce阶段(很容易看到您是否查看扩展或格式化的查询计划)。更准确地说,Hive的用途 _col0, _col1 对于列别名,在 t5 子查询您的密钥是 _col0 但是在 t6 是的 _col1 这就是为什么在select操作符中

expressions:: "_col1 (type: string), _col0 (type: int)"

在reduce输出操作符中

key expressions:: "_col1 (type: int)"

因此,当交换select列时,配置单元将如何切换键类型。如果t5和t6中的类型顺序相同,则没有问题

key expressions:: "_col0 (type: string)"

如何避免这种情况——我真的不知道该怎么做 order by 在单减速机是不是由于额外的优化。

相关问题