I am trying to insert data into a table (S3 location) using dynamic partitioning.
EMR version: emr-5.30.1
Hive version: Hive 2.3.6
execution engine: Tez
Query format (example):
INSERT INTO t2 partition (year, month) select c1, c2, year, month from t1;
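There are about 145 partitions. While the data is being moved from the staging directory in S3 to the destination S3 directory, the job fails intermittently with the following errors.
Master node logs: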
unable to move <source> (s3) to <destination> (s3)
org.apache.hadoop.hive.ql.metadata.HiveException: Error moving
Caused by: java.io.InterruptedIOException: Interrupted copying
Step logs:
chmod: No such file or directory
Failed with exception Exception when loading 145 in table t2 with loadPath
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
chgrp: getFileStatus chmod: getFileStatus com.amazonaws.AbortedException (emrfs consistent view was configured)
Notes:
There are no permission issues, since the job is able to move some of the partitions to the destination. The job does succeed occasionally, but very rarely. hive.exec.dynamic.partition.mode=nonstrict is set.
The only workaround that works:
There are no job failures when I set hive.load.dynamic.partitions.thread=1 (it affects the speed though)
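For reference, a minimal sketch of the session with the workaround applied. It assumes the properties are set with SET in the same Hive session as the insert (they could equally be set in hive-site.xml); the table and column names are the placeholders from the example query above.

```sql
-- Already set in my job: allow all partition columns to be dynamic.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Workaround: load the dynamic partitions with a single thread
-- instead of the default 15. Slower, but no MoveTask failures so far.
SET hive.load.dynamic.partitions.thread=1;

-- Same insert as above (t1, t2, c1, c2 are placeholder names).
INSERT INTO t2 PARTITION (year, month)
SELECT c1, c2, year, month FROM t1;
```

With the thread count at 1 the partitions are moved one at a time, which is why the job takes longer.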
Is there a solution for this kind of dynamic partition insert in EMR Hive? Can the dynamic partitions be loaded with the default setting, hive.load.dynamic.partitions.thread=15?