HDFS: running Hadoop distcp from Java fails with NoClassDefFoundError: Could not initialize class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem

Posted by u0sqgete on 2022-12-09 in HDFS

I am trying to use the Hadoop Java library to run a distcp command on a Hadoop cluster to move content from HDFS to a Google Cloud Bucket. I get the error NoClassDefFoundError: Could not initialize class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
Here is my Java code:

import com.google.gson.JsonArray;
import com.google.gson.JsonElement;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.tools.DistCp;
import org.apache.hadoop.tools.DistCpOptions;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class HadoopHelper {

    private static Logger logger = LoggerFactory.getLogger(HadoopHelper.class);

    private static final String FS_DEFAULT_FS = "fs.defaultFS";

    private final Configuration conf;

    public HadoopHelper(String hadoopUrl) {
        conf = new Configuration();
        conf.set(FS_DEFAULT_FS, "hdfs://" + hadoopUrl);
    }

    public void distCP(JsonArray files, String target) {

        try {
            List<Path> srcPaths = new ArrayList<>();

            for (JsonElement file : files) {
                String srcPath = file.getAsString();
                srcPaths.add(new Path(srcPath));
            }

            DistCpOptions options = new DistCpOptions.Builder(
                    srcPaths,
                    new Path("gs://" + target)
            ).build();

            logger.info("Using distcp to copy {} to gs://{}", files, target);

            this.conf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
            this.conf.set("fs.gs.auth.service.account.email", "my-svc-account@my-gcp-project.iam.gserviceaccount.com");
            this.conf.set("fs.gs.auth.service.account.keyfile", "config/my-svc-account-keyfile.p12");
            this.conf.set("fs.gs.project.id", "my-gcp-project");

            DistCp distCp = new DistCp(this.conf, options);
            Job job = distCp.execute();

            job.waitForCompletion(true);

            logger.info("Distcp operation success. Exiting");
        } catch (Exception e) {
            logger.error("Error while trying to execute distcp", e);
            logger.error("Distcp operation failed. Exiting");
            // Keep the original exception as the cause so the root error is not lost
            throw new IllegalArgumentException("Distcp failed", e);
        }
    }

    public void createDirectory() throws IOException {
        FileSystem fileSystem = FileSystem.get(this.conf);
        fileSystem.mkdirs(new Path("/user/newfolder"));
        logger.info("Done");
    }
}

I added the following dependencies to pom.xml:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-distcp</artifactId>
    <version>3.3.1</version>
</dependency>
<dependency>
    <groupId>com.google.cloud.bigdataoss</groupId>
    <artifactId>gcs-connector</artifactId>
    <version>hadoop3-2.2.4</version>
</dependency>
<dependency>
    <groupId>com.google.cloud.bigdataoss</groupId>
    <artifactId>util</artifactId>
    <version>2.2.4</version>
</dependency>
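"Could not initialize class" means the JVM did find GoogleHadoopFileSystem but an earlier attempt to run its static initializer failed, so with a dependency list like the one above it is worth checking whether two artifacts bring in different copies of the same library. The sketch below is a hypothetical helper (DuplicateClassFinder is not part of Hadoop or the connector) that uses ClassLoader.getResources to list every classpath location providing a given class. Guava's Preconditions is only an illustrative example of a shared library; which libraries actually overlap in this pom is an assumption to verify, for example with mvn dependency:tree.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic helper, not part of Hadoop or the GCS connector. */
public class DuplicateClassFinder {

    /**
     * Returns every classpath location that provides the given class
     * (binary name, e.g. "com.google.common.base.Preconditions").
     * More than one hit means several jars ship their own copy; the
     * class loader will usually use whichever is found first.
     */
    public static List<URL> providersOf(String className) {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = DuplicateClassFinder.class.getClassLoader();
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
            "com.google.common.base.Preconditions" // illustrative shared dependency (assumption)
        };
        for (String name : names) {
            List<URL> hits = providersOf(name);
            System.out.println(name + " -> " + hits.size() + " location(s): " + hits);
        }
    }
}
```

Running this with the same classpath as the failing application shows whether the connector is present at all and whether a shared library appears in more than one jar.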

If I run the distcp command on the cluster itself, like this: hadoop distcp /user gs://my_bucket_name/
the distcp operation works and the content is copied to the Cloud Bucket.

nsc4cvqm

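Did you add the jar to Hadoop's classpath?
Adding the connector jar to Hadoop's classpath: placing the connector jar in the HADOOP_COMMON_LIB_JARS_DIR directory should be sufficient to have Hadoop load the jar. Alternatively, to be certain that the jar is loaded, you can add HADOOP_CLASSPATH=$HADOOP_CLASSPATH:</path/to/gcs-connector.jar> to hadoop-env.sh in the Hadoop configuration directory.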
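HADOOP_CLASSPATH is an environment variable that the hadoop launcher scripts read when they start a JVM, which may be why hadoop distcp works on the cluster, while a standalone Java program takes its classpath from its own -cp or CLASSPATH setting. A minimal sketch, assuming the connector jar's file name contains "gcs-connector" (ClasspathCheck is a hypothetical helper), that lists the matching entries of the running JVM's java.class.path. It only inspects file names, so connector classes repackaged into an application fat jar would not be detected; checking for the class itself is more reliable in that case.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: finds classpath entries by file name. */
public class ClasspathCheck {

    /** Returns the classpath entries whose file name contains the given fragment. */
    public static List<String> entriesMatching(String classpath, String fragment) {
        List<String> matches = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (!entry.isEmpty() && new File(entry).getName().contains(fragment)) {
                matches.add(entry);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // java.class.path reflects the classpath of *this* JVM only.
        String cp = System.getProperty("java.class.path");
        List<String> gcsJars = entriesMatching(cp, "gcs-connector");
        if (gcsJars.isEmpty()) {
            System.out.println("No gcs-connector jar on this JVM's classpath");
        } else {
            System.out.println("gcs-connector jar(s): " + gcsJars);
        }
    }
}
```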
This would have to be done on the DistCp configuration (this.conf in your code) before this line of code:

this.conf.set("HADOOP_CLASSPATH", "$HADOOP_CLASSPATH:/tmp/gcs-connector-latest-hadoop2.jar");
DistCp distCp = new DistCp(this.conf, options);

If it helps, here is a troubleshooting section.
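As an additional troubleshooting step, it can help to tell apart the failure modes behind this error. A ClassNotFoundException means the class is not on the classpath at all, whereas "NoClassDefFoundError: Could not initialize class" is what the JVM throws when the class was found but an earlier attempt to run its static initializer failed; the original failure is reported only on the first attempt. The hypothetical helper below (ClassInitDiagnostics, not part of any library) forces loading and initialization with Class.forName and reports which case applies; the nested Broken class only simulates a failing static initializer.

```java
/** Hypothetical diagnostic helper, not part of Hadoop or the GCS connector. */
public class ClassInitDiagnostics {

    /** Loads and initializes the class, then describes the outcome. */
    public static String diagnose(String className) {
        ClassLoader loader = ClassInitDiagnostics.class.getClassLoader();
        try {
            Class.forName(className, true, loader);
            return "OK: " + className + " loaded and initialized";
        } catch (ClassNotFoundException e) {
            // The class is not on the classpath at all.
            return "MISSING: " + className + " is not on the classpath";
        } catch (ExceptionInInitializerError e) {
            // First failed attempt: the static initializer threw a non-Error exception.
            return "INIT_FAILED: static initializer threw " + e.getCause();
        } catch (NoClassDefFoundError e) {
            // Either a dependency is missing (the message names it) or an
            // earlier initialization already failed ("Could not initialize class ...").
            return "NO_CLASS_DEF: " + e.getMessage();
        }
    }

    /** Simulated class whose static initializer always fails. */
    static class Broken {
        static final int VALUE = fail();

        private static int fail() {
            throw new IllegalStateException("simulated failure in static initializer");
        }
    }

    public static void main(String[] args) {
        String broken = Broken.class.getName(); // a class literal does not initialize Broken
        System.out.println(diagnose(broken));   // first attempt: the real cause
        System.out.println(diagnose(broken));   // later attempts: NoClassDefFoundError only
        System.out.println(diagnose("com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"));
    }
}
```

If the first attempt in the real application reports INIT_FAILED or a NoClassDefFoundError naming some other class, that underlying cause, rather than the later "Could not initialize class" message, is what needs fixing.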
