Problem writing data to ADLS using Hive

Posted by 7jmck4yq on 2021-06-24 in Hive

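I created a managed table in Hive, say table A, by giving it the ADLS Gen1 location "adl://[path to adls location]".
Table A is a partitioned table, and its records are stored as Parquet files at the ADLS Gen1 location.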
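
For reference, a table like the one described might be declared roughly as follows. This is only a sketch: the column names a, b, c and the partition column id are taken from the sample query in this post, while all column types are assumptions, and the location placeholder is left exactly as given.

```sql
-- Hypothetical DDL for table A; the column types are assumed, not from the post.
CREATE TABLE A (
  a STRING,
  b STRING,
  c STRING
)
PARTITIONED BY (id INT)   -- partition column used by the INSERT in this post
STORED AS PARQUET         -- records are stored as Parquet files
LOCATION 'adl://[path to adls location]';
```
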
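I am trying to insert data into this table from another Hive table, B. Table B holds a large amount of data, up to 32 GB.
I use the following configuration to allow dynamic-partition inserts into table A: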

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode=nonstrict;

The query I am running has the following sample syntax:

INSERT INTO TABLE A Partition(id)
select 
a,
b,
c
from b where id >=10
distribute by id;
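
One detail worth checking in the sample query above: in a Hive dynamic-partition insert, the dynamic partition column has to appear last in the SELECT list, in the same order as in the PARTITION clause. The sample selects only a, b, c, so the following is a minimal sketch of the same statement with id added, assuming table A has the data columns a, b, c and the partition column id. It only illustrates the syntax and is not claimed to fix the ADLS error.

```sql
-- Sketch: same insert as the sample, with the partition column selected last.
-- Assumes A(a, b, c) is partitioned by id and that table b has a column id.
INSERT INTO TABLE A PARTITION (id)
SELECT
  a,
  b,
  c,
  id            -- dynamic partition column must be the last selected column
FROM b
WHERE id >= 10
DISTRIBUTE BY id;
```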

The above query fails with the following error while inserting data:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: com.microsoft.azure.datalake.store.ADLException: Error appending to file /***/_task_tmp.-ext-10000/***/_tmp.000010_3
Operation APPEND failed with exception java.io.IOException : Error writing to server
Last encountered exception thrown after 5 tries. [java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException]
 [ServerRequestId:null]
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:751)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
        ... 7 more
Caused by: com.microsoft.azure.datalake.store.ADLException: Error appending to file /***/_task_tmp.-ext-10000/***/_tmp.000010_3
Operation APPEND failed with exception java.io.IOException : Error writing to server
Last encountered exception thrown after 5 tries. [java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException]
 [ServerRequestId:null]
        at com.microsoft.azure.datalake.store.ADLStoreClient.getExceptionFromResponse(ADLStoreClient.java:1176)
        at com.microsoft.azure.datalake.store.ADLFileOutputStream.flush(ADLFileOutputStream.java:180)
        at com.microsoft.azure.datalake.store.ADLFileOutputStream.write(ADLFileOutputStream.java:119)
        at org.apache.hadoop.fs.adl.AdlFsOutputStream.write(AdlFsOutputStream.java:63)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at parquet.bytes.BytesInput$ByteArrayBytesInput.writeAllTo(BytesInput.java:355)
        at parquet.hadoop.ParquetFileWriter.writeDictionaryPage(ParquetFileWriter.java:320)
        at parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writeToFileWriter(ColumnChunkPageWriteStore.java:179)
        at parquet.hadoop.ColumnChunkPageWriteStore.flushToFileWriter(ColumnChunkPageWriteStore.java:238)
        at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:160)
        at parquet.hadoop.InternalParquetRecordWriter.checkBlockSizeReached(InternalParquetRecordWriter.java:136)
        at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:118)
        at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
        at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
        at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:136)
        at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:149)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:717)
        ... 10 more

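I need to resolve this because I need the data stored at the ADLS Gen1 location. The above query works fine for data of up to 1 million records, but fails as the data size increases.
I tried increasing the reducer memory; the process got faster, but it still failed in the end.
I also noticed that if we change table A's location to HDFS instead of ADLS, the query works fine for ~32 GB of data and more.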
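
Since the insert succeeds for smaller volumes, one thing that could be tried is loading table A in several smaller batches instead of a single 32 GB statement. The sketch below is only an illustration under assumptions: the id boundaries (10, 20, 30) are hypothetical and would need to be chosen so that each batch stays within a size that is known to work, and nothing in this post confirms that batching avoids the APPEND failure.

```sql
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Batch 1: hypothetical id range, chosen to keep the batch small.
INSERT INTO TABLE A PARTITION (id)
SELECT a, b, c, id
FROM b
WHERE id >= 10 AND id < 20
DISTRIBUTE BY id;

-- Batch 2: next hypothetical id range; repeat until all ids are covered.
INSERT INTO TABLE A PARTITION (id)
SELECT a, b, c, id
FROM b
WHERE id >= 20 AND id < 30
DISTRIBUTE BY id;
```

Each statement writes to a disjoint set of partitions, so a failed batch can be rerun after clearing only the partitions it touched.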
