Hive: GC overhead or heap space error - dynamic partitioned table

Posted by lsmepo6l on 2021-06-28 in Hive

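Could you please guide me in resolving this GC overhead and heap space error?
I am trying to insert into a partitioned table from another table (dynamic partitioning) using the query below: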

INSERT OVERWRITE table tbl_part PARTITION(county)
SELECT  col1, col2.... col47, county FROM tbl;

I have set the following parameters:

export  HADOOP_CLIENT_OPTS=" -Xmx2048m"
set hive.exec.dynamic.partition=true;  
set hive.exec.dynamic.partition.mode=nonstrict; 
SET hive.exec.max.dynamic.partitions=2048;
SET hive.exec.max.dynamic.partitions.pernode=256;
set mapreduce.map.memory.mb=2048;
set yarn.scheduler.minimum-allocation-mb=2048;
set hive.exec.max.created.files=250000;
set hive.vectorized.execution.enabled=true;
set hive.merge.smallfiles.avgsize=283115520;
set hive.merge.size.per.task=209715200;

I also added the following to yarn-site.xml:

<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Whether virtual memory limits will be enforced for    containers</description>
</property>

<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
<description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>

Output of free -m:

             total       used       free     shared    buffers     cached
Mem:         15347      11090       4256          0        174       6051
-/+ buffers/cache:       4864      10483
Swap:        15670         18      15652

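It is a standalone cluster with 1 core. I am preparing test data to run my unit test cases in Spark.
Could you let me know what else I can do?
The source table has the following properties: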

Table Parameters:       
    COLUMN_STATS_ACCURATE   true                
    numFiles                13                  
    numRows                 10509065            
    rawDataSize             3718599422          
    totalSize               3729108487          
    transient_lastDdlTime   1470909228

Thank you.

Hive reduce hadoop-partitioning out-of-memory memory-efficient

Source: https://stackoverflow.com/questions/38939993/hive-gc-overhead-or-heap-space-error-dynamic-partitioned-table

1 Answer

yws3nbqq1#

Add DISTRIBUTE BY county to your query:

INSERT OVERWRITE table tbl_part PARTITION(county) SELECT  col1, col2.... col47, county FROM tbl DISTRIBUTE BY county;
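
A likely reason this helps: in a dynamic-partition insert, each task keeps a writer (with its buffers) open for every partition value it encounters. Without DISTRIBUTE BY, a single task may see rows for nearly every county, while with DISTRIBUTE BY county all rows for a given county are routed to the same reducer, so each task should handle only a subset of the partitions. The Python snippet below is only a toy model of that effect, not Hive's implementation; the names max_open_writers and make_rows, the 48 counties and the 4 tasks are made-up values for illustration.

```python
import random


def max_open_writers(rows, num_tasks, distribute_by_key):
    """Largest number of distinct partitions written by any single task."""
    partitions_per_task = [set() for _ in range(num_tasks)]
    for position, county in enumerate(rows):
        if distribute_by_key:
            # Shuffle on the partition key: one county always goes to one task.
            task = county % num_tasks
        else:
            # No shuffle: a row stays on whichever task happened to read it
            # (modelled here as round-robin over the input order).
            task = position % num_tasks
        partitions_per_task[task].add(county)
    return max(len(seen) for seen in partitions_per_task)


def make_rows(num_counties=48, rows_per_county=200, seed=0):
    """Hypothetical input: county ids 0..num_counties-1 in shuffled order."""
    rows = [c for c in range(num_counties) for _ in range(rows_per_county)]
    random.Random(seed).shuffle(rows)
    return rows


if __name__ == "__main__":
    rows = make_rows()
    print("without DISTRIBUTE BY:", max_open_writers(rows, 4, False))
    print("with DISTRIBUTE BY:   ", max_open_writers(rows, 4, True))
```

In this model the busiest task writes to only a quarter of the counties once rows are grouped by key. Fewer simultaneously open writers per task would mean less heap pressure, assuming the real job behaves similarly.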
