How do I trigger a Cloud Dataflow pipeline job from a Cloud Function in Java?

sqserrrh · posted 2021-07-08 · in Java

I need to trigger a Cloud Dataflow pipeline from a Cloud Function, and the Cloud Function has to be written in Java. The function's trigger is a Google Cloud Storage finalize/create event, i.e. when a file is uploaded to a GCS bucket, the Cloud Function should kick off the Dataflow pipeline.
When I create a (batch) Dataflow pipeline and run it manually, it creates a Dataflow template and a Dataflow job as expected.
But when I create the Cloud Function in Java and upload a file, the status just says "ok" and no Dataflow pipeline is triggered.
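
One way to double-check that no job was actually created is to list jobs with the same Dataflow REST client the function below uses. A minimal sketch, assuming Application Default Credentials and the same placeholder project ID as in the question's code (this diagnostic is not part of the original post):

import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.dataflow.Dataflow;
import com.google.api.services.dataflow.model.Job;
import com.google.api.services.dataflow.model.ListJobsResponse;
import com.google.auth.http.HttpCredentialsAdapter;
import com.google.auth.oauth2.GoogleCredentials;

public class ListDataflowJobs {
    public static void main(String[] args) throws Exception {
        Dataflow dataflow = new Dataflow.Builder(
                GoogleNetHttpTransport.newTrustedTransport(),
                JacksonFactory.getDefaultInstance(),
                new HttpCredentialsAdapter(GoogleCredentials.getApplicationDefault()))
                .setApplicationName("Dataflow job check")
                .build();

        // "my-project-id" is a placeholder, as in the question's code.
        ListJobsResponse response = dataflow.projects().jobs().list("my-project-id").execute();
        if (response.getJobs() == null) {
            System.out.println("No Dataflow jobs found.");
        } else {
            for (Job job : response.getJobs()) {
                System.out.println(job.getName() + " : " + job.getCurrentState());
            }
        }
    }
}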
The Cloud Function:

package com.example;

import com.example.Example.GCSEvent;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.http.HttpRequestInitializer;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.dataflow.Dataflow;
import com.google.api.services.dataflow.model.CreateJobFromTemplateRequest;
import com.google.api.services.dataflow.model.RuntimeEnvironment;
import com.google.auth.http.HttpCredentialsAdapter;
import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.functions.BackgroundFunction;
import com.google.cloud.functions.Context;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.util.HashMap;
import java.util.logging.Logger;

public class Example implements BackgroundFunction<GCSEvent> {

    private static final Logger logger = Logger.getLogger(Example.class.getName());

    @Override
    public void accept(GCSEvent event, Context context) throws IOException, GeneralSecurityException {
        logger.info("Event: " + context.eventId());
        logger.info("Event Type: " + context.eventType());

        // Build a Dataflow REST client with Application Default Credentials.
        HttpTransport httpTransport = GoogleNetHttpTransport.newTrustedTransport();
        JsonFactory jsonFactory = JacksonFactory.getDefaultInstance();
        GoogleCredentials credentials = GoogleCredentials.getApplicationDefault();
        HttpRequestInitializer requestInitializer = new HttpCredentialsAdapter(credentials);
        Dataflow dataflowService = new Dataflow.Builder(httpTransport, jsonFactory, requestInitializer)
                .setApplicationName("Google Dataflow function Demo")
                .build();

        String projectId = "my-project-id";

        RuntimeEnvironment runtimeEnvironment = new RuntimeEnvironment();
        runtimeEnvironment.setBypassTempDirValidation(false);
        runtimeEnvironment.setTempLocation("gs://my-dataflow-job-bucket/tmp");

        CreateJobFromTemplateRequest createJobFromTemplateRequest = new CreateJobFromTemplateRequest();
        createJobFromTemplateRequest.setEnvironment(runtimeEnvironment);
        createJobFromTemplateRequest.setLocation("us-central1");
        createJobFromTemplateRequest.setGcsPath("gs://my-dataflow-job-bucket-staging/templates/cloud-dataflow-template");
        createJobFromTemplateRequest.setJobName("Dataflow-Cloud-Job");
        createJobFromTemplateRequest.setParameters(new HashMap<String, String>());
        createJobFromTemplateRequest.getParameters().put("inputFile", "gs://cloud-dataflow-bucket-input/*.txt");

        // Builds the request object only; it is never sent to the Dataflow service.
        dataflowService.projects().templates().create(projectId, createJobFromTemplateRequest);

        // Leftover stub code: the function throws after building the request.
        throw new UnsupportedOperationException("Not supported yet.");
    }

    public static class GCSEvent {
        String bucket;
        String name;
        String metageneration;
    }
}

pom.xml (Cloud Function):

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>cloudfunctions</groupId>
    <artifactId>http-function</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.target>11</maven.compiler.target>
        <maven.compiler.source>11</maven.compiler.source>
    </properties>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/com.google.auth/google-auth-library-credentials -->
        <dependency>
            <groupId>com.google.auth</groupId>
            <artifactId>google-auth-library-credentials</artifactId>
            <version>0.21.1</version>
        </dependency>
        <dependency>
            <groupId>com.google.apis</groupId>
            <artifactId>google-api-services-dataflow</artifactId>
            <version>v1b3-rev207-1.20.0</version>
        </dependency>
        <dependency>
            <groupId>com.google.cloud.functions</groupId>
            <artifactId>functions-framework-api</artifactId>
            <version>1.0.1</version>
        </dependency>
        <dependency>
            <groupId>com.google.auth</groupId>
            <artifactId>google-auth-library-oauth2-http</artifactId>
            <version>0.21.1</version>
        </dependency>
    </dependencies>

    <!-- Required for Java 11 functions in the inline editor -->
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.1</version>
                <configuration>
                    <excludes>
                        <exclude>.google/</exclude>
                    </excludes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

Cloud Function logs (screenshot omitted).

I went through the blog posts below (added for reference); they trigger Dataflow from Cloud Storage via Cloud Functions, but the code is written in Node.js or Python, whereas my Cloud Function has to be written in Java.
Triggering Dataflow pipelines from a Cloud Function in Node.js:
https://dzone.com/articles/triggering-dataflow-pipelines-with-cloud-functions
Triggering a Dataflow pipeline from a Cloud Function in Python:
https://medium.com/google-cloud/how-to-kick-off-a-dataflow-pipeline-via-cloud-functions-696927975d4e
Any help is greatly appreciated.

Answer #1, from mcvgt66p:
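
The reason the original function does nothing is that dataflowService.projects().templates().create(projectId, createJobFromTemplateRequest) only builds a request object: with the Google API Java client, nothing is sent to the service until execute() is called on the request (and the function then throws the leftover UnsupportedOperationException). The snippet below builds LaunchTemplateParameters and actually executes a template launch: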

RuntimeEnvironment runtimeEnvironment = new RuntimeEnvironment();
runtimeEnvironment.setBypassTempDirValidation(false);
runtimeEnvironment.setTempLocation("gs://karthiksfirstbucket/temp1");

LaunchTemplateParameters launchTemplateParameters = new LaunchTemplateParameters();
launchTemplateParameters.setEnvironment(runtimeEnvironment);
launchTemplateParameters.setJobName("newJob" + (new Date()).getTime());

Map<String, String> params = new HashMap<String, String>();
params.put("inputFile", "gs://karthiksfirstbucket/sample.txt");
params.put("output", "gs://karthiksfirstbucket/count1");
launchTemplateParameters.setParameters(params);

// Launch the public Word_Count template; execute() is what actually sends
// the request to the Dataflow service.
Dataflow.Projects.Templates.Launch launch = dataflowService.projects().templates()
        .launch(projectId, launchTemplateParameters);
launch.setGcsPath("gs://dataflow-templates-us-central1/latest/Word_Count");
launch.execute();

The code above launches a template and runs a Dataflow pipeline:
it uses Application Default Credentials (these can be changed to user or service-account credentials),
the region is the default one (it can be changed),
and one job is created per HTTP trigger (the trigger type can be changed); a sketch of the credentials and region changes follows below.
The complete code is available here:
https://github.com/karthikeyan1127/java_cloudfunction_dataflow/blob/master/hello.java
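
For reference, a minimal, self-contained sketch of those two changes. All values (project ID, region, bucket paths, key-file path) are placeholders, and it assumes the client library version in the pom above exposes the regional projects().locations().templates().launch(...) method:

import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.dataflow.Dataflow;
import com.google.api.services.dataflow.model.LaunchTemplateParameters;
import com.google.auth.http.HttpCredentialsAdapter;
import com.google.auth.oauth2.GoogleCredentials;
import java.io.FileInputStream;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class LaunchTemplateWithExplicitCredentials {
    public static void main(String[] args) throws Exception {
        // Load an explicit service-account key instead of Application Default
        // Credentials; the key-file path is a placeholder.
        GoogleCredentials credentials;
        try (FileInputStream keyStream = new FileInputStream("/path/to/service-account-key.json")) {
            credentials = GoogleCredentials.fromStream(keyStream)
                    .createScoped(Collections.singletonList("https://www.googleapis.com/auth/cloud-platform"));
        }

        Dataflow dataflow = new Dataflow.Builder(
                GoogleNetHttpTransport.newTrustedTransport(),
                JacksonFactory.getDefaultInstance(),
                new HttpCredentialsAdapter(credentials))
                .setApplicationName("Dataflow trigger demo")
                .build();

        LaunchTemplateParameters launchParams = new LaunchTemplateParameters();
        launchParams.setJobName("wordcount-" + System.currentTimeMillis());
        Map<String, String> templateParams = new HashMap<>();
        templateParams.put("inputFile", "gs://my-bucket/sample.txt");
        templateParams.put("output", "gs://my-bucket/output");
        launchParams.setParameters(templateParams);

        // Launch against an explicit regional endpoint instead of relying on
        // the default region used by projects().templates().launch(...).
        dataflow.projects().locations().templates()
                .launch("my-project-id", "europe-west1", launchParams)
                .setGcsPath("gs://dataflow-templates-europe-west1/latest/Word_Count")
                .execute();
    }
}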

