java Apache光束:找不到数据流运行器

mlnl4t2r  于 2023-03-06  发布在  Java
关注(0)|答案(2)|浏览(113)

我尝试在Google Cloud Dataflow上运行一个管道,我可以使用DirectRunner成功运行该管道。当我执行以下Maven命令时:

mvn compile exec:java \
    -Dexec.mainClass=com.example.Pipeline \
    -Dexec.args="--project=project-name \
    --stagingLocation=gs://bucket-name/staging/ \
    ... custom arguments ...
    --runner=DataflowRunner"

出现以下错误:

No Runner was specified and the DirectRunner was not found on the classpath.
[ERROR] Specify a runner by either:
[ERROR]     Explicitly specifying a runner by providing the 'runner' property
[ERROR]     Adding the DirectRunner to the classpath
[ERROR]     Calling 'PipelineOptions.setRunner(PipelineRunner)' directly

我特意从我的pom.xml中删除了DirectRunner,并添加了以下内容:

<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
    <version>2.0.0</version>
    <scope>runtime</scope>
</dependency>

我继续删除了<scope>标记,然后调用了options.setRunner(DataflowRunner.class),但是没有用,从DataflowPipelineOptions扩展我自己的PipelineOptions接口也没有解决这个问题。
看起来它忽略了runner选项的方式,我无法调试。

  • 更新 *:以下是完整的pom.xml,如果有帮助的话:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example</groupId>
    <artifactId>dataflow</artifactId>
    <version>0.1</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-core</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
            <version>2.0.0</version>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-io-jdbc</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.postgresql</groupId>
            <artifactId>postgresql</artifactId>
            <version>42.1.4.jre7</version>
        </dependency>
    </dependencies>
</project>
e4yzc0pl

e4yzc0pl1#

忘记将我的PipelineOptions示例作为参数传递给Pipeline.create()方法是我的问题的原因。

PipelineOptionsFactory.register(MyOptions.class);
MyOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(MyOptions.class);
Pipeline pipeline = Pipeline.create(options); // Don't forget the options argument.
...
pipeline.run();
1aaf6o9v

1aaf6o9v2#

您使用的是哪个Dataflow SDK版本?
如果您使用的是Dataflow 2.X,则可以使用DirectRunner
在数据流1.X中,您可以使用DirectPipelineRunner
你也可以看到Getting started instructions here,它建议DataflowRunner和BlockingDataflowRunner(取决于你的版本)。如果可以的话,我建议你先试着让这个工作起来。

相关问题