如何将详细的状态信息从进程传递回aws emr map reduce步骤

yqlxgs2m 于 2021-07-13 发布在 Hadoop

关注(0)|答案(0)|浏览(239)

我使用aws emr创建集群，然后创建步骤，运行一个java应用程序，该应用程序使用hadoop mapreduce执行步骤，然后启动一个进程的多个迭代，有时在一个或所有迭代中都会失败。今天，当这种情况发生时，迭代如预期的那样失败，并且该步骤的状态设置为“失败”。然后，我可以通过编程方式检测该状态，并更新数据库以供其他服务使用，从而知道作业失败。但我想要的是关于这一步失败原因的更详细的信息。
在启动mapreduce的步骤中运行的应用程序（emr mainclass）使用toolrunner（）运行作业，如下所示

Job job = Job.getInstance(conf, params.jobID + "_" + params.iteration);

    job.setJarByClass(MyMapReduce.class);
    job.setMapperClass(MyProcessMapper.class);

    if (!(job.waitForCompletion(true))) {
        return -1;
    }
    return 0;

myprocessmapper类有常用的setup（）和run（）方法。

@Override
protected void setup(Context context) throws IOException, InterruptedException {
    attemptId = context.getTaskAttemptID();
}

@Override
public void map(String key, String value, Context context) throws IOException, InterruptedException {

    // do stuff

    // detect problem but not sure how to report it back to calling entity
}

我知道如何在myprocessmapper的范围内检测失败原因。如何将该字符串提供回emr，以便将其与该步骤的失败状态相关联？我想我应该能够使用如下使用withfailuredetails（）的代码来更新我的数据库

AmazonElasticMapReduceClient emr = new AmazonElasticMapReduceClient(credentials);

    DescribeStepResult res = emr
            .describeStep(new DescribeStepRequest().withClusterId(clusterID).withStepId(stepID));

    StepStatus stepStatus = res.getStep().getStatus().withFailureDetails();

我想我可以在main类代码中使用setfailuredetails（）进行设置，但是由于使用了map（），我看不到如何将失败原因字符串从myprocessmapper传递回main类。我的备选方案没有一个看起来很好，因为它们不是很优雅或安全，但我考虑的是1）向myprocessmapper提供aws凭据，这样它就可以自己更新失败状态，或者2）将这些信息从myprocessmapper写到另一个数据库中，避免整个hadoop问题。这通常是怎么做的？

hadoop mapreduce amazon-emr amazon-web-services

来源：https://stackoverflow.com/questions/66342139/how-to-pass-detailed-status-information-back-from-process-to-the-aws-emr-map-red

暂无答案！

目前还没有任何答案，快来回答吧！

我来回答

如何将详细的状态信息从进程传递回aws emr map reduce步骤

暂无答案！

相关问题

热门标签

最新问答