Hive: GC overhead or heap space error - dynamic partitioned table

lsmepo6l  posted on 2021-06-28 in Hive

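Could you help me resolve this GC overhead / heap space error?
I am trying to insert into a partitioned table from another table, using dynamic partitioning, with the following query: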

    INSERT OVERWRITE table tbl_part PARTITION(county)
    SELECT col1, col2.... col47, county FROM tbl;

I have set the following parameters:

    export HADOOP_CLIENT_OPTS=" -Xmx2048m"
    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.exec.max.dynamic.partitions=2048;
    SET hive.exec.max.dynamic.partitions.pernode=256;
    set mapreduce.map.memory.mb=2048;
    set yarn.scheduler.minimum-allocation-mb=2048;
    set hive.exec.max.created.files=250000;
    set hive.vectorized.execution.enabled=true;
    set hive.merge.smallfiles.avgsize=283115520;
    set hive.merge.size.per.task=209715200;

I also added this to yarn-site.xml:

    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
      <description>Whether virtual memory limits will be enforced for containers</description>
    </property>
    <property>
      <name>yarn.nodemanager.vmem-pmem-ratio</name>
      <value>4</value>
      <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
    </property>

Output of free -m:

                 total       used       free     shared    buffers     cached
    Mem:         15347      11090       4256          0        174       6051
    -/+ buffers/cache:       4864      10483
    Swap:        15670         18      15652

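It is a standalone cluster with 1 core. I am preparing test data to run my unit test cases in Spark.
Could you tell me what else I can do?
The source table has the following properties: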

    Table Parameters:
        COLUMN_STATS_ACCURATE   true
        numFiles                13
        numRows                 10509065
        rawDataSize             3718599422
        totalSize               3729108487
        transient_lastDdlTime   1470909228

Thank you.

yws3nbqq #1

Add DISTRIBUTE BY county to your query:

    INSERT OVERWRITE table tbl_part PARTITION(county)
    SELECT col1, col2.... col47, county FROM tbl
    DISTRIBUTE BY county;
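
A rough illustration of why DISTRIBUTE BY county can relieve memory pressure: without it, any task may receive rows for every county, and a dynamic-partition insert generally keeps one open writer (with its buffers) per partition the task encounters. With it, all rows for a given county are routed to a single reducer, so each task only writes a subset of the partitions. The Python sketch below only simulates that routing. The 200 counties, the 4 tasks, and the CRC32 hash are made-up values for illustration and are not Hive's actual partitioner or the real data.

```python
import random
import zlib


def writers_map_only(rows, num_tasks):
    """Rows reach tasks in arbitrary order (here: round-robin), so each
    task opens a writer for every distinct county it happens to see."""
    seen = [set() for _ in range(num_tasks)]
    for i, county in enumerate(rows):
        seen[i % num_tasks].add(county)
    return [len(s) for s in seen]


def writers_distribute_by(rows, num_tasks):
    """Rows are routed by a hash of the county, so every county lands on
    exactly one task (a stand-in for DISTRIBUTE BY county)."""
    seen = [set() for _ in range(num_tasks)]
    for county in rows:
        task = zlib.crc32(county.encode("utf-8")) % num_tasks
        seen[task].add(county)
    return [len(s) for s in seen]


# Hypothetical data: 200 county values spread randomly over 100,000 rows.
rng = random.Random(0)
counties = [f"county_{i:03d}" for i in range(200)]
rows = [rng.choice(counties) for _ in range(100_000)]

map_only = writers_map_only(rows, num_tasks=4)
distributed = writers_distribute_by(rows, num_tasks=4)

print("open writers per task, no DISTRIBUTE BY:", map_only)
print("open writers per task, DISTRIBUTE BY:   ", distributed)
```

In the simulated map-only case each task ends up holding a writer for nearly every county, while in the distributed case the counties are split across tasks, so the per-task writer count (and the memory that goes with it) drops.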
