运行以下配置单元查询将返回特殊字符:
SELECT t6.amt amt2,t6.color color
FROM(
SELECT t5.color color, t5.c1 amt
FROM(
SELECT t1.c1 c1, t1.c2 AS color
from(
SELECT 7716 AS c1, "Red" AS c2 UNION
SELECT 6203 AS c1, "Blue" AS c2
) t1
) t5
order by color) t6
ORDER BY color
它将结果返回为
amt color
4 �
3 �
是已知的Hive虫吗?
解释计划
Map 5 <- Union 2 (CONTAINS)
Reducer 3 <- Union 2 (SIMPLE_EDGE)
Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Reducer 4
File Output Operator [FS_331359]
compressed:false
Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
Select Operator [SEL_331358]
| outputColumnNames:["_col0","_col1"]
| Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
|<-Reducer 3 [SIMPLE_EDGE]
Reduce Output Operator [RS_331357]
key expressions:_col1 (type: int)
sort order:+
Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
value expressions:_col0 (type: string)
Select Operator [SEL_331351]
outputColumnNames:["_col0","_col1"]
Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator [GBY_331350]
| keys:KEY._col0 (type: int), KEY._col1 (type: string)
| outputColumnNames:["_col0","_col1"]
| Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
|<-Union 2 [SIMPLE_EDGE]
|<-Map 1 [CONTAINS]
| Reduce Output Operator [RS_331349]
| key expressions:_col0 (type: int), _col1 (type: string)
| Map-reduce partition columns:_col0 (type: int), _col1 (type: string)
| sort order:++
| Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
| Group By Operator [GBY_331348]
| keys:_col0 (type: int), _col1 (type: string)
| outputColumnNames:["_col0","_col1"]
| Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
| Select Operator [SEL_331342]
| outputColumnNames:["_col0","_col1"]
| Statistics:Num rows: 1 Data size: 91 Basic stats: COMPLETE Column stats: COMPLETE
| TableScan [TS_331341]
| alias:_dummy_table
| Statistics:Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
|<-Map 5 [CONTAINS]
Reduce Output Operator [RS_331349]
key expressions:_col0 (type: int), _col1 (type: string)
Map-reduce partition columns:_col0 (type: int), _col1 (type: string)
sort order:++
Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator [GBY_331348]
keys:_col0 (type: int), _col1 (type: string)
outputColumnNames:["_col0","_col1"]
Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator [SEL_331344]
outputColumnNames:["_col0","_col1"]
Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
TableScan [TS_331343]
alias:_dummy_table
Statistics:Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
在这里,禁用或启用配置参数可以帮助我吗?
如果我颠倒最外层select中列的顺序,那么查询将返回预期的结果。我本以为结果会是
颜色金额
蓝色6203
红色7716
1条答案
按热度按时间rfbsl7qr1#
我用mr和tez在我的hive2.3上尝试了同样的查询,结果和你的一样。我关闭了所有的查询优化、统计数据收集和rcp,但结果保持不变。问题是Hive的制造
order by
因为你有两个连续的order by
的配置单元将它们合并到单个reduce阶段(很容易看到您是否查看扩展或格式化的查询计划)。更准确地说,Hive的用途_col0, _col1
对于列别名,在t5
子查询您的密钥是_col0
但是在t6
是的_col1
这就是为什么在select操作符中在reduce输出操作符中
因此,当交换select列时,配置单元将如何切换键类型。如果t5和t6中的类型顺序相同,则没有问题
如何避免这种情况——我真的不知道该怎么做
order by
在单减速机是不是由于额外的优化。