Dynamic partition insert error in AWS EMR Hive

Posted by g6ll5ycj on 2021-06-24 in Hive

I am trying to insert data into a table (S3 location) using dynamic partitioning.

EMR version: emr-5.30.1
Hive version: Hive 2.3.6
execution engine: Tez

Query format (example):

INSERT INTO t2 partition (year, month) select c1, c2, year, month from t1;
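For reference, a full session for this kind of insert might look like the sketch below. It reuses the table and column names from the example query (t1, t2, c1, c2, year, month). hive.exec.dynamic.partition=true is already the default in Hive 2.x and is set only for explicitness; nonstrict mode is required because both partition columns are dynamic, and strict mode demands at least one static partition column.

```sql
-- Enable dynamic partitioning (already the default in Hive 2.x).
SET hive.exec.dynamic.partition=true;
-- Allow all partition columns to be dynamic (strict mode needs at least one static one).
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The dynamic partition columns (year, month) must come last in the SELECT, in order.
INSERT INTO TABLE t2 PARTITION (year, month)
SELECT c1, c2, year, month
FROM t1;
```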

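There are about 145 partitions. The job fails intermittently while moving data from the staging directory in S3 to the target S3 directory, with the following errors.

Master node logs: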

unable to move <source> (s3) to <destination> (s3)
  org.apache.hadoop.hive.ql.metadata.HiveException: Error moving
  Caused by: java.io.InterruptedIOException: Interrupted copying

Step logs:

chmod: No such file or directory
  Failed with exception Exception when loading 145 in table t2 with loadPath
  FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
  chgrp: getFileStatus chmod: getFileStatus com.amazonaws.AbortedException (emrfs consistent view was configured)

Notes:

There are no permission issues: the job is able to move some of the partitions to the destination, and it does occasionally (though very rarely) succeed. hive.exec.dynamic.partition.mode=nonstrict is set.

The only workaround that works:

There are no job failures when I set hive.load.dynamic.partitions.thread=1, although this slows down the load.
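A minimal sketch of applying that workaround to a single session only, so that other queries keep the default of 15 loader threads. It assumes the insert runs as a Hive script (for example, submitted as an EMR step) and reuses the table and column names from the example query; setting the property cluster-wide in hive-site would be the alternative.

```sql
-- Load the dynamic partitions with a single thread (slower, but no MoveTask failures observed).
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.load.dynamic.partitions.thread=1;

INSERT INTO TABLE t2 PARTITION (year, month)
SELECT c1, c2, year, month
FROM t1;

-- Restore the default for any later statements in the same session.
SET hive.load.dynamic.partitions.thread=15;
```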

Is there any solution for this kind of dynamic partition insert in EMR Hive? Can the dynamic partitions be loaded with the default setting, hive.load.dynamic.partitions.thread=15?
