Several problems with Avro MapReduce

Posted by gzszwxb4 on 2021-06-02 in Hadoop

First, I am running a MapReduce job as a Java action through Oozie. When the job runs, I get the following error: java.lang.ClassNotFoundException: Class org.apache.avro.mapreduce.AvroKeyInputFormat not found

Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.avro.mapreduce.AvroKeyInputFormat not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2047)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:184)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:746)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.ClassNotFoundException: Class org.apache.avro.mapreduce.AvroKeyInputFormat not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1953)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2045)
    ... 8 more

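At first, I read that I have to provide the necessary jars through the -libjars option. After doing that, I was fairly sure that all the jars were available to my driver code.
This is my workflow.xml: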

  <action name="run-workflow">
    <java>
      <prepare>
        <delete path='${nameNode}/user/dhruvk/avro_output'/>
      </prepare>
      <main-class>com.dhruvk.AvroDriver</main-class>
      <java-opts>-Dqueue=${queueName} -DinputPath=${nameNode}/user/dhruvk/avro_input -DoutputPath=${nameNode}/user/dhruvk/avro_output</java-opts>
      <arg>-libjars</arg>
      <arg>${nameNode}${workBasePath}/workflow/lib/joda-time-2.3.jar</arg>
      <arg>${nameNode}${workBasePath}/workflow/lib/avro-mapred-1.7.7-hadoop2</arg>
      <arg>${nameNode}${workBasePath}/workflow/lib/avro-1.7.7.jar</arg>
    </java>
    <ok to="end"/>
    <error to="error"/>
  </action>
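
For reference, Hadoop's GenericOptionsParser (which ToolRunner applies to the driver's arguments) documents -libjars as taking a single comma-separated list of jars, while each <arg> element of an Oozie java action is passed to main as a separate argument. The sketch below only shows how such an argument pair could be assembled. The class name LibJarsArgs, the helper libJarsArgs, and the jar paths are hypothetical placeholders and are not taken from the workflow above. Whether this is the cause of the error here is an assumption.

```java
import java.util.Arrays;
import java.util.List;

public class LibJarsArgs {

    // Hypothetical helper: builds the two arguments that a comma-separated
    // -libjars option consists of: the flag, then all jar paths joined by commas.
    public static String[] libJarsArgs(List<String> jarPaths) {
        return new String[] {"-libjars", String.join(",", jarPaths)};
    }

    public static void main(String[] args) {
        // Placeholder paths for illustration only.
        List<String> jars = Arrays.asList("lib/first.jar", "lib/second.jar", "lib/third.jar");

        // The result has exactly two elements, regardless of how many jars are listed.
        String[] toolArgs = libJarsArgs(jars);
        System.out.println(Arrays.toString(toolArgs));
    }
}
```

In workflow terms this would correspond to one <arg> holding -libjars and a second <arg> holding the joined list.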

This is what my driver looks like:

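import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import com.dhruvk.models.User;

public class AvroDriver extends Configured implements Tool
{
  public static void main( String[] args ) throws Exception {
    // ToolRunner parses generic options such as -libjars before calling run().
    int exitCode = ToolRunner.run(new Configuration(), new AvroDriver(), args);
    System.exit(exitCode);
  }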

  @Override
  public int run(String[] args) throws Exception {
    Configuration configuration = getConf();

    Job job = Job.getInstance(configuration, this.getClass().getSimpleName());
    job.setJarByClass(this.getClass());

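    // The paths arrive as JVM system properties (-DinputPath=... -DoutputPath=... in <java-opts>).
    String inputDir = System.getProperty("inputPath");
    String outputDir = System.getProperty("outputPath");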

    job.setJarByClass(AvroDriver.class);
    job.setJobName("Color count");
    FileInputFormat.addInputPath(job, new Path(inputDir));
    FileOutputFormat.setOutputPath(job, new Path(outputDir));

    job.setInputFormatClass(AvroKeyInputFormat.class);
    job.setMapperClass(ColorCountMapper.class);
    AvroJob.setInputKeySchema(job, User.getClassSchema());
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);

    job.setNumReduceTasks(0);

    return job.waitForCompletion(true) ? 0 : 1;
  }
}

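I really don't know why this class is not available to the tasks. As an experiment, I changed the packaging to a jar-with-dependencies built with the maven-assembly-plugin.
My reasoning was that I have the following dependencies: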

    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>1.7.7</version>
    </dependency>
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-mapred</artifactId>
      <version>1.7.7</version>
      <classifier>hadoop2</classifier>
    </dependency>

    <dependency>
      <groupId>joda-time</groupId>
      <artifactId>joda-time</artifactId>
      <version>2.3</version>
    </dependency>

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.5.0-cdh5.3.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>2.5.0-cdh5.3.2</version>
    </dependency>

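This solved the first problem, the missing AvroKeyInputFormat.
However, I don't think this is a good solution, and I would like to understand the problem better and find a cleaner fix.
Doing this also produced a second problem, which I have spent less time debugging: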

Apr 30, 2015 3:48:34 PM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
Apr 30, 2015 3:48:34 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
Apr 30, 2015 3:48:34 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Apr 30, 2015 3:48:34 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
Apr 30, 2015 3:48:34 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Apr 30, 2015 3:48:34 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Apr 30, 2015 3:48:35 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Apr 30, 2015 3:48:35 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
Error: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to com.dhruvk.models.User
    at com.dhruvk.ColorCountMapper.map(ColorCountMapper.java:15)
    at com.dhruvk.ColorCountMapper.map(ColorCountMapper.java:12)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

This is the mapper code:

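import java.io.IOException;

import org.apache.avro.mapred.AvroKey;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import com.dhruvk.models.User;

public class ColorCountMapper extends Mapper<AvroKey<User>, NullWritable, Text, IntWritable> {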
  @Override
  public void map(AvroKey<User> user, NullWritable value, Context context) throws IOException, InterruptedException {
    CharSequence color = user.datum().getFavoriteColor();
    if (color == null) {
      color = "none";
    }
    context.write(new Text(color.toString()), new IntWritable(1));
  }
}

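I don't know what this exception means or where it comes from. It would be great if someone could help me solve these two problems or point out what I am missing.
Edit: this is the schema used to generate the file.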

{
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}

This is the generated file, which I modified as noted in the comment (I only added the package name).

/**
 * Autogenerated by Avro
 * 
 * DO NOT EDIT DIRECTLY
 */

package com.dhruvk.models; // Package name added by me.

@SuppressWarnings("all")
@org.apache.avro.specific.AvroGenerated
public class User extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
  public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"favorite_number\",\"type\":[\"int\",\"null\"]},{\"name\":\"favorite_color\",\"type\":[\"string\",\"null\"]}]}");
  public static org.apache.avro.Schema getClassSchema() { return SCHEMA$; }
  @Deprecated public java.lang.CharSequence name;
  @Deprecated public java.lang.Integer favorite_number;
  @Deprecated public java.lang.CharSequence favorite_color;

  /**
   * Default constructor.  Note that this does not initialize fields
   * to their default values from the schema.  If that is desired then
   * one should use <code>newBuilder()</code>. 
   */
  public User() {}

  /**
   * All-args constructor.
   */
  public User(java.lang.CharSequence name, java.lang.Integer favorite_number, java.lang.CharSequence favorite_color) {
    this.name = name;
    this.favorite_number = favorite_number;
    this.favorite_color = favorite_color;
  }

  public org.apache.avro.Schema getSchema() { return SCHEMA$; }
  // Used by DatumWriter.  Applications should not call. 
  public java.lang.Object get(int field$) {
    switch (field$) {
    case 0: return name;
    case 1: return favorite_number;
    case 2: return favorite_color;
    default: throw new org.apache.avro.AvroRuntimeException("Bad index");
    }
  }
  // Used by DatumReader.  Applications should not call. 
  @SuppressWarnings(value="unchecked")
  public void put(int field$, java.lang.Object value$) {
    switch (field$) {
    case 0: name = (java.lang.CharSequence)value$; break;
    case 1: favorite_number = (java.lang.Integer)value$; break;
    case 2: favorite_color = (java.lang.CharSequence)value$; break;
    default: throw new org.apache.avro.AvroRuntimeException("Bad index");
    }
  }

  /**
   * Gets the value of the 'name' field.
   */
  public java.lang.CharSequence getName() {
    return name;
  }

  /**
   * Sets the value of the 'name' field.
   * @param value the value to set.
   */
  public void setName(java.lang.CharSequence value) {
    this.name = value;
  }

  /**
   * Gets the value of the 'favorite_number' field.
   */
  public java.lang.Integer getFavoriteNumber() {
    return favorite_number;
  }

  /**
   * Sets the value of the 'favorite_number' field.
   * @param value the value to set.
   */
  public void setFavoriteNumber(java.lang.Integer value) {
    this.favorite_number = value;
  }

  /**
   * Gets the value of the 'favorite_color' field.
   */
  public java.lang.CharSequence getFavoriteColor() {
    return favorite_color;
  }

  /**
   * Sets the value of the 'favorite_color' field.
   * @param value the value to set.
   */
  public void setFavoriteColor(java.lang.CharSequence value) {
    this.favorite_color = value;
  }

  /**Creates a new User RecordBuilder */
  public static User.Builder newBuilder() {
    return new User.Builder();
  }

  /**Creates a new User RecordBuilder by copying an existing Builder */
  public static User.Builder newBuilder(User.Builder other) {
    return new User.Builder(other);
  }

  /**Creates a new User RecordBuilder by copying an existing User instance */
  public static User.Builder newBuilder(User other) {
    return new User.Builder(other);
  }

  /**
   * RecordBuilder for User instances.
   */
  public static class Builder extends org.apache.avro.specific.SpecificRecordBuilderBase<User>
    implements org.apache.avro.data.RecordBuilder<User> {

    private java.lang.CharSequence name;
    private java.lang.Integer favorite_number;
    private java.lang.CharSequence favorite_color;

    /**Creates a new Builder */
    private Builder() {
      super(User.SCHEMA$);
    }

    /**Creates a Builder by copying an existing Builder */
    private Builder(User.Builder other) {
      super(other);
      if (isValidValue(fields()[0], other.name)) {
        this.name = data().deepCopy(fields()[0].schema(), other.name);
        fieldSetFlags()[0] = true;
      }
      if (isValidValue(fields()[1], other.favorite_number)) {
        this.favorite_number = data().deepCopy(fields()[1].schema(), other.favorite_number);
        fieldSetFlags()[1] = true;
      }
      if (isValidValue(fields()[2], other.favorite_color)) {
        this.favorite_color = data().deepCopy(fields()[2].schema(), other.favorite_color);
        fieldSetFlags()[2] = true;
      }
    }

    /**Creates a Builder by copying an existing User instance */
    private Builder(User other) {
      super(User.SCHEMA$);
      if (isValidValue(fields()[0], other.name)) {
        this.name = data().deepCopy(fields()[0].schema(), other.name);
        fieldSetFlags()[0] = true;
      }
      if (isValidValue(fields()[1], other.favorite_number)) {
        this.favorite_number = data().deepCopy(fields()[1].schema(), other.favorite_number);
        fieldSetFlags()[1] = true;
      }
      if (isValidValue(fields()[2], other.favorite_color)) {
        this.favorite_color = data().deepCopy(fields()[2].schema(), other.favorite_color);
        fieldSetFlags()[2] = true;
      }
    }

    /**Gets the value of the 'name' field */
    public java.lang.CharSequence getName() {
      return name;
    }

    /**Sets the value of the 'name' field */
    public User.Builder setName(java.lang.CharSequence value) {
      validate(fields()[0], value);
      this.name = value;
      fieldSetFlags()[0] = true;
      return this; 
    }

    /**Checks whether the 'name' field has been set */
    public boolean hasName() {
      return fieldSetFlags()[0];
    }

    /**Clears the value of the 'name' field */
    public User.Builder clearName() {
      name = null;
      fieldSetFlags()[0] = false;
      return this;
    }

    /**Gets the value of the 'favorite_number' field */
    public java.lang.Integer getFavoriteNumber() {
      return favorite_number;
    }

    /**Sets the value of the 'favorite_number' field */
    public User.Builder setFavoriteNumber(java.lang.Integer value) {
      validate(fields()[1], value);
      this.favorite_number = value;
      fieldSetFlags()[1] = true;
      return this; 
    }

    /**Checks whether the 'favorite_number' field has been set */
    public boolean hasFavoriteNumber() {
      return fieldSetFlags()[1];
    }

    /**Clears the value of the 'favorite_number' field */
    public User.Builder clearFavoriteNumber() {
      favorite_number = null;
      fieldSetFlags()[1] = false;
      return this;
    }

    /**Gets the value of the 'favorite_color' field */
    public java.lang.CharSequence getFavoriteColor() {
      return favorite_color;
    }

    /**Sets the value of the 'favorite_color' field */
    public User.Builder setFavoriteColor(java.lang.CharSequence value) {
      validate(fields()[2], value);
      this.favorite_color = value;
      fieldSetFlags()[2] = true;
      return this; 
    }

    /**Checks whether the 'favorite_color' field has been set */
    public boolean hasFavoriteColor() {
      return fieldSetFlags()[2];
    }

    /**Clears the value of the 'favorite_color' field */
    public User.Builder clearFavoriteColor() {
      favorite_color = null;
      fieldSetFlags()[2] = false;
      return this;
    }

    @Override
    public User build() {
      try {
        User record = new User();
        record.name = fieldSetFlags()[0] ? this.name : (java.lang.CharSequence) defaultValue(fields()[0]);
        record.favorite_number = fieldSetFlags()[1] ? this.favorite_number : (java.lang.Integer) defaultValue(fields()[1]);
        record.favorite_color = fieldSetFlags()[2] ? this.favorite_color : (java.lang.CharSequence) defaultValue(fields()[2]);
        return record;
      } catch (Exception e) {
        throw new org.apache.avro.AvroRuntimeException(e);
      }
    }
  }
}
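
A possible explanation for the ClassCastException, offered as an assumption rather than a confirmed diagnosis: the schema embedded in SCHEMA$ declares no "namespace", so the record's full name is just User, while the class was moved by hand into com.dhruvk.models. As far as I understand, Avro's specific reader looks up the generated class by the schema's full name and falls back to GenericData.Record when no class with that name can be loaded. The sketch below imitates that lookup with plain Java only. SpecificClassLookup, fullName, classExists, and recordKind are hypothetical names, and java.util.ArrayList stands in for the generated class.

```java
public class SpecificClassLookup {

    // Full name of a record: "namespace.name", or just "name" when no namespace is declared.
    public static String fullName(String namespace, String name) {
        return (namespace == null || namespace.isEmpty()) ? name : namespace + "." + name;
    }

    // True if a class with exactly this fully qualified name can be loaded.
    public static boolean classExists(String fullName) {
        try {
            Class.forName(fullName);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Imitates the assumed fallback: a loadable class gives a "specific" record,
    // anything else ends up as a "generic" record.
    public static String recordKind(String namespace, String name) {
        return classExists(fullName(namespace, name)) ? "specific" : "generic";
    }

    public static void main(String[] args) {
        // Stand-in for a schema without a namespace: the lookup uses the bare name.
        System.out.println(fullName(null, "ArrayList") + " -> " + recordKind(null, "ArrayList"));

        // Stand-in for a schema whose namespace matches the class's package.
        System.out.println(fullName("java.util", "ArrayList") + " -> " + recordKind("java.util", "ArrayList"));
    }
}
```

If this is indeed the cause, declaring "namespace": "com.dhruvk.models" in the schema and regenerating the class, instead of editing the package by hand, would make the schema's full name match the class.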
