Confusing Hadoop job tracker API

inb24sb2 posted on 2021-06-03 in Hadoop

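I want to collect some information from the JobTracker. To start, I want to get information about running jobs, such as the job ID or job name, but I'm already stuck. Here is what I have so far (it prints the job ID of each currently running job):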

```
public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "zk1.myhost,zk2.myhost,zk3.myhost");
    conf.set("hbase.zookeeper.property.clientPort", "2181");

    InetSocketAddress jobtracker = new InetSocketAddress("jobtracker.mapredhost.myhost", 8021);
    JobClient jobClient = new JobClient(jobtracker, conf);
    JobStatus[] jobs = jobClient.jobsToComplete();

    for (int i = 0; i < jobs.length; i++) {
        JobStatus js = jobs[i];
        if (js.getRunState() == JobStatus.RUNNING) {
            JobID jobId = js.getJobID();
            System.out.println(jobId);
        }
    }
}
```

This works for printing the job ID, but now I also want to print the job name. So I added this line after printing the job ID:

```
System.out.println(jobClient.getJob(jobId).getJobName());
```

I get this exception:

```
Exception in thread "main" java.lang.NullPointerException
    at org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:226)
    at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1080)
    at org.apache.test.JobTracker.main(JobTracker.java:28)
```

`jobClient` is not `null`. I know this because I tried an `if` null check, but `jobClient.getJob(jobId)` is `null`. What am I doing wrong?
According to the API this should work:
http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapred/jobclient.html#getjob(org.apache.hadoop.mapred.jobid)
First I get a `RunningJob` for the job from the `JobClient`, then I get its name:
http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapred/runningjob.html#getjobname()
Has anyone done this before? I could scrape this information with jsoup via a GET request, but I think the API is the better way to get it.
Question update: here are my Hadoop/HBase dependencies:

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.JobStatus;
```

Here is the output of `System.out.println(jobId)`:

```
job_201207031810_1603
```

Only one job is currently running.
Answer by b0zn9rqh:

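Take a look at the inner class `NetworkedJob` of `JobClient`.
(source: /home/user/hadoop/src/mapred/org/apache/hadoop/mapred/jobclient.java)
Its constructor tries to get the `Configuration` object from the enclosing `JobClient` at line 225, but that object is `null` because `new JobClient(InetSocketAddress jobTrackAddr, Configuration conf)` doesn't set it. That is also why the exception is thrown inside `getJob` (see `NetworkedJob.<init>` in your stack trace) rather than `getJob` returning `null`: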

```
// Set the completion poll interval from the configuration.
// Default is 5 seconds.
Configuration conf = JobClient.this.getConf();
this.completionPollIntervalMillis = conf.getInt(COMPLETION_POLL_INTERVAL_KEY,
    DEFAULT_COMPLETION_POLL_INTERVAL); //NPE occurs here!
```

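As a workaround, set it manually after creating the `JobClient` object. This should solve your problem:

```
// ...
JobClient jobClient = new JobClient(jobtracker, conf);
jobClient.setConf(conf); // the constructor above leaves the client's Configuration unset
// ...
```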
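To see why calling `setConf` fixes the NPE, here is a minimal, stdlib-only Java model of the pattern described above. It is a hypothetical simplification, not Hadoop's actual source: `FakeConfiguration`, `FakeJobClient`, its `NetworkedJob`, and the key `hypothetical.completion.poll.interval` are illustrative stand-ins. As in the answer, the two-argument constructor does not store the `Configuration` it receives, and the non-static inner class reads the enclosing client's conf through `FakeJobClient.this.getConf()` in its own constructor.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the failure described above; NOT Hadoop's real JobClient.
public class ConfNpeModel {

    // Stand-in for org.apache.hadoop.conf.Configuration (hypothetical).
    static class FakeConfiguration {
        private final Map<String, Integer> ints = new HashMap<>();

        int getInt(String key, int defaultValue) {
            return ints.getOrDefault(key, defaultValue);
        }
    }

    // Stand-in for JobClient (hypothetical).
    static class FakeJobClient {
        static final String POLL_KEY = "hypothetical.completion.poll.interval";
        static final int DEFAULT_POLL_MS = 5000; // "Default is 5 seconds."

        private FakeConfiguration conf; // stays null unless setConf is called

        FakeJobClient(String trackerAddress, FakeConfiguration conf) {
            // Mirrors the behavior described in the answer: the conf argument
            // is NOT stored as this client's own Configuration.
        }

        void setConf(FakeConfiguration conf) { this.conf = conf; }

        FakeConfiguration getConf() { return conf; }

        // Non-static inner class: it can reach the enclosing client via FakeJobClient.this.
        class NetworkedJob {
            final int pollIntervalMillis;

            NetworkedJob() {
                FakeConfiguration c = FakeJobClient.this.getConf();
                // Throws NullPointerException when setConf was never called.
                this.pollIntervalMillis = c.getInt(POLL_KEY, DEFAULT_POLL_MS);
            }
        }

        // The NPE happens while constructing NetworkedJob, so getJob never returns.
        NetworkedJob getJob(String jobId) {
            return new NetworkedJob();
        }
    }

    public static void main(String[] args) {
        FakeConfiguration conf = new FakeConfiguration();

        FakeJobClient broken = new FakeJobClient("jobtracker:8021", conf);
        try {
            broken.getJob("job_201207031810_1603");
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("NPE without setConf");
        }

        FakeJobClient fixed = new FakeJobClient("jobtracker:8021", conf);
        fixed.setConf(conf); // the workaround from the answer
        System.out.println("poll interval: "
                + fixed.getJob("job_201207031810_1603").pollIntervalMillis);
    }
}
```

Running `main` prints `NPE without setConf` and then `poll interval: 5000`. Because the conf is dereferenced inside the inner class's constructor, the failure surfaces inside `getJob` itself, which is the shape of the stack trace in the question.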

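Side note:
I instantiated the `Configuration` object like this:

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.addResource(new Path("/path_to/core-site.xml"));
conf.addResource(new Path("/path_to/hdfs-site.xml"));
```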
