Hive produces incorrect results

mkh04yzy  posted on 2021-06-04 in Hadoop

The problem occurs when joining on the partition key of a large partition that has more than 2^31 rows.
(The output in this post comes from MapR's distribution, but the issue has also been reproduced on Apache Hadoop/Hive.)
Versions: hadoop-0.20.2, hive-0.10.0
When the partition has more than 2147483648 rows (even just 2147483649), the output of the join is a single row.
When the partition has fewer than 2147483648 rows (even 2147483647), the output is correct.
Test case:
Create a table with 2147483649 rows in a partition whose key value is "1",
then join this table to another table that has a single row and a single column, with the value "1" for the partition key.
Afterwards, delete 2 rows and run the same join again.
First run: the output contains only one row.
Second run: 2147483647 rows.
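
The fact that the behavior flips exactly between 2147483647 and 2147483649 rows suggests a signed 32-bit counter somewhere in the reduce-side join path that wraps negative once the group exceeds Integer.MAX_VALUE. That is only a hypothesis about the cause; the sketch below merely illustrates the wrap-around arithmetic at the two row counts used in this test case.

public class JoinRowCountWrap {
    public static void main(String[] args) {
        long passingRows = 2147483647L;  // partition size for the run that works (Integer.MAX_VALUE)
        long failingRows = 2147483649L;  // partition size for the run that returns a single row

        // Narrowed to a signed 32-bit int, as a Java counter or collection size would be:
        System.out.println((int) passingRows);  // 2147483647  -> still a valid positive count
        System.out.println((int) failingRows);  // -2147483647 -> wrapped negative past 2^31 - 1
    }
}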

create table max_sint_rows (s1 string) 
partitioned by (p1 string)
ROW FORMAT DELIMITED
   LINES TERMINATED BY  '\n';

create table small_table (p1 string)
ROW FORMAT DELIMITED
   LINES TERMINATED BY  '\n';

alter table max_sint_rows add partition (p1="1");

Write 2147483649 random rows into max_sint_rows.
Write the value "1" into small_table.
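
For reference, one hypothetical way to produce that much input is a small generator like the sketch below; the output path and line contents are placeholders, and the resulting file would then be loaded into the partition (e.g. with LOAD DATA INPATH ... INTO TABLE max_sint_rows PARTITION (p1='1')).

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Random;

// Hypothetical data generator: 2147483649 lines, one string column per line,
// matching the single-column schema of max_sint_rows. In practice you would
// probably split this across several files rather than one ~40 GB file.
public class GenerateRows {
    public static void main(String[] args) throws IOException {
        long rows = 2147483649L;  // just past Integer.MAX_VALUE
        Random rnd = new Random();
        try (BufferedWriter out = new BufferedWriter(new FileWriter("/tmp/max_sint_rows.txt"))) {
            for (long i = 0; i < rows; i++) {
                out.write(Long.toString(rnd.nextLong()));
                out.newLine();
            }
        }
    }
}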

create table output_rows_over as 
select a.s1
from  max_sint_rows a join small_table b  
on (a.p1=b.p1);

In the reducer's syslog we get the following output:

INFO ExecReducer: ExecReducer: processing 2147000000 rows: used memory = 715266312
INFO org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1
INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 5 forwarding 1 rows
INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 6 forwarding 1 rows
INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS maprfs:/user/hadoop/tmp/hive/hive_2013-05-27_20-50-23_849_6140580929822990686/_tmp.-ext-10001/000004_1
INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS maprfs:/user/hadoop/tmp/hive/hive_2013-05-27_20-50-23_849_6140580929822990686/_task_tmp.-ext-10001/_tmp.000004_1
INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS maprfs:/user/hadoop/tmp/hive/hive_2013-05-27_20-50-23_849_6140580929822990686/_tmp.-ext-10001/000004_1
INFO ExecReducer: ExecReducer: processed 2147483650 rows: used memory = 828336712
INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 5 finished. closing...
INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 5 forwarded 1 rows
INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 6 finished. closing...
INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 6 forwarded 1 rows
INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 7 finished. closing...
INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 7 forwarded 0 rows
INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 6 Close done
INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 5 Close done
org.apache.hadoop.mapred.Task: Task:attempt_201305071944_2359_r_000004_1 is done. And is in the process of commiting
INFO org.apache.hadoop.mapred.Task: Task 'attempt_201305071944_2359_r_000004_1' done.
INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1

Note TABLE_ID_1_ROWCOUNT:1, and indeed the output table contains only a single random row.
Now delete 2 rows from max_sint_rows and rerun:

create table output_rows_under as 
select a.s1
from  max_sint_rows a join small_table b  
on (a.p1=b.p1);

We get output_rows_under with 2147483647 rows, and the reducer's syslog shows:

INFO ExecReducer: ExecReducer: processed 2147483648 rows: used memory = 243494552
INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 5 finished. closing...
INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 5 forwarded 2147483647 rows
INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 6 finished. closing...
INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 6 forwarded 2147483647 rows
INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 7 finished. closing...
INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 7 forwarded 0 rows
INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:2147483647
INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 6 Close done
INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 5 Close done
INFO org.apache.hadoop.mapred.Task: Task:attempt_201305071944_2360_r_000004_0 is done. And is in the process of commiting
INFO org.apache.hadoop.mapred.Task: Task 'attempt_201305071944_2360_r_000004_0' done.
INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
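
To double-check the row counts of the two result tables, they can be read back with select count(*); a minimal sketch over the Hive 0.10 HiveServer1 JDBC driver follows (host, port, and database are assumptions).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Assumes a HiveServer1 instance on localhost:10000 and the hive-0.10 JDBC driver
// (org.apache.hadoop.hive.jdbc.HiveDriver) on the classpath.
public class CheckRowCounts {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement()) {
            for (String table : new String[] {"output_rows_over", "output_rows_under"}) {
                ResultSet rs = stmt.executeQuery("select count(*) from " + table);
                if (rs.next()) {
                    // Expected: 1 for output_rows_over (the wrong result),
                    // 2147483647 for output_rows_under (the correct one).
                    System.out.println(table + ": " + rs.getLong(1));
                }
            }
        }
    }
}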
