How many Hive dynamic partitions do I need?

Asked by 0s0u357o on 2021-05-29 in Hadoop

I am running a large job that consolidates about 55 streams (tags) of samples (one sample per record), taken at irregular times over two years, into 15-minute averages. There are about 1.1 billion records across 23k streams in the raw dataset, and these 55 streams account for about 33 million of those records. I computed a 15-minute index and am grouping on it to get the averages, but I seem to have exceeded the maximum number of dynamic partitions on my Hive job despite cranking it up to 20k. I suppose I could raise it further, but each failure already takes a while (about 6 hours, though I got that down to 2 by reducing the number of streams considered), and I don't actually know how to work out how many I really need.
Here is the code:

SET hive.exec.dynamic.partition = true;              -- enable dynamic partitioning
SET hive.exec.dynamic.partition.mode = nonstrict;    -- allow all partition columns to be dynamic
SET hive.exec.max.dynamic.partitions=50000;          -- total dynamic partitions allowed per statement
SET hive.exec.max.dynamic.partitions.pernode=20000;  -- per mapper/reducer limit (the one hit below)

DROP TABLE IF EXISTS sensor_part_qhr; 

CREATE TABLE sensor_part_qhr (
    tag  STRING,
    tag0 STRING,
    tag1 STRING,
    tagn_1  STRING,
    tagn  STRING,

    timestamp  STRING,
    unixtime INT,
    qqFr2013 INT,

    quality  INT,
    count  INT,
    stdev  DOUBLE,
    value    DOUBLE
)  
PARTITIONED BY (bld STRING);

INSERT INTO TABLE sensor_part_qhr
PARTITION (bld) 
SELECT  tag,
        min(tag), 
        min(tag0), 
        min(tag1), 
        min(tagn_1), 
        min(tagn),

        min(timestamp),
        min(unixtime),  
        qqFr2013,

        min(quality),
        count(value),
        stddev_samp(value),
        avg(value)
FROM    sensor_part_subset     
WHERE   tag1='Energy'
GROUP BY tag,qqFr2013;

Here is the error message:

Error during job, obtaining debugging information...
    Examining task ID: task_1442824943639_0044_m_000008 (and more) from job job_1442824943639_0044
    Examining task ID: task_1442824943639_0044_r_000000 (and more) from job job_1442824943639_0044

    Task with the most failures(4): 
    -----
    Task ID:
      task_1442824943639_0044_r_000000

    URL:
      http://headnodehost:9014/taskdetails.jsp?jobid=job_1442824943639_0044&tipid=task_1442824943639_0044_r_000000
    -----
    Diagnostic Messages for this Task:
    Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveFatalException: [Error 20004]: Fatal error occurred when node tried to create too many dynamic partitions. The maximum number of dynamic partitions is controlled by hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode. Maximum was set to: 20000
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveFatalException:

    [Error 20004]: Fatal error occurred when node tried to create too many dynamic partitions. 
    The maximum number of dynamic partitions is controlled by hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode. 
    Maximum was set to: 20000

        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:747)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.startGroup(FileSinkOperator.java:829)
        at org.apache.hadoop.hive.ql.exec.Operator.defaultStartGroup(Operator.java:498)
        at org.apache.hadoop.hive.ql.exec.Operator.startGroup(Operator.java:521)
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:232)
        ... 7 more

    Container killed by the ApplicationMaster.
    Container killed on request. Exit code is 137
    Container exited with a non-zero exit code 137

    FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
    MapReduce Jobs Launched: 
    Job 0: Map: 520  Reduce: 140   Cumulative CPU: 7409.394 sec   HDFS Read: 0 HDFS Write: 393345977 SUCCESS
    Job 1: Map: 9  Reduce: 1   Cumulative CPU: 87.201 sec   HDFS Read: 393359417 HDFS Write: 0 FAIL
    Total MapReduce CPU Time Spent: 0 days 2 hours 4 minutes 56 seconds 595 msec

Can anyone give me some ideas on how to work out how many dynamic partitions a job like this might actually need?
Or should I be doing this a different way? I am running Hive 0.13 on Azure HDInsight.

Update:
Corrected some of the numbers above.
Reducing it to 3 streams running over 211k records finally succeeded.
Started experimenting and reduced the partitions per node to 5k, then 1k, and it still succeeded.
So I am no longer blocked, but I think I would need millions of these dynamic partitions to do the whole dataset in one go (which is what I really want to do).

Answer by js5cn81o:

When inserting into sensor_part_qhr, the dynamic partition column must be specified last among the columns of the SELECT statement.
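
As a minimal sketch of what that could look like here, assuming bld is available as a column of sensor_part_subset and that the duplicate min(tag) in the original SELECT was unintentional (tag is already a grouping key); neither assumption is confirmed by the question. Because bld is never selected in the original statement, Hive assigns the dynamic partition column positionally from the last SELECT expression, avg(value), so every distinct average becomes its own partition, which would explain the explosion in partition count:

-- Hypothetical corrected INSERT (sketch): bld is listed last in the SELECT
-- and added to the GROUP BY, so one partition is created per distinct
-- bld value instead of one per distinct avg(value).
INSERT INTO TABLE sensor_part_qhr
PARTITION (bld)
SELECT  tag,
        min(tag0),
        min(tag1),
        min(tagn_1),
        min(tagn),

        min(timestamp),
        min(unixtime),
        qqFr2013,

        min(quality),
        count(value),
        stddev_samp(value),
        avg(value),
        bld                      -- dynamic partition column goes last
FROM    sensor_part_subset
WHERE   tag1='Energy'
GROUP BY tag, qqFr2013, bld;

Adding bld to the GROUP BY only changes the grouping if a single tag can span more than one bld value. As for the title question: the number of dynamic partitions an insert creates equals the number of distinct values of the partition column in the selected data, so it can be checked up front with something like (again assuming the bld column exists there):

SELECT count(DISTINCT bld) AS partitions_needed
FROM   sensor_part_subset
WHERE  tag1='Energy';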
