Java – trigger a Spark job when an event occurs

xfyts7mz  asked on 2021-06-07  in Kafka

I have a Spark application that should run whenever a Kafka message arrives on a certain topic.
I receive no more than 5-6 messages a day, so I don't want to take the Spark Streaming route. Instead I tried using SparkLauncher, but I don't like that approach either, because I have to programmatically set the Spark and Java classpaths in my code, along with all the necessary Spark properties such as executor cores, executor memory, and so on.
How can I trigger the Spark application to run via spark-submit, but have it wait until a message is received?
Any pointers are very helpful.


ia2d9nvy1#

You can use a shell-script approach with the nohup command to submit such a job, for example:
  nohup <spark-submit shell script> <parameters> 2>&1 < /dev/null &
Whenever you receive a message, you can poll for that event and call this shell script.
Here are code snippets that do this. Also take another look at https://en.wikipedia.org/wiki/nohup

- Using Runtime

  import org.slf4j.Logger;
  import org.slf4j.LoggerFactory;

  public final class SparkJobLauncher {

      private static final Logger LOG = LoggerFactory.getLogger(SparkJobLauncher.class);

      /**
       * Submits a job (e.g. spark-submit or a MapReduce job) on the fly
       * by calling a shell command.
       *
       * @param commandToExecute the shell command to run
       * @return TRUE if the command completed successfully, FALSE otherwise
       */
      public static Boolean executeCommand(final String commandToExecute) {
          try {
              final Runtime rt = Runtime.getRuntime();
              LOG.info("process command -- " + commandToExecute);
              // run through a shell so redirects, pipes and '&' in the command work
              final String[] arr = { "/bin/sh", "-c", commandToExecute };
              final Process proc = rt.exec(arr);
              LOG.info("process started");
              // block until the launched process terminates
              final int exitVal = proc.waitFor();
              LOG.trace("commandToExecute exited with code: " + exitVal);
              proc.destroy();
              // treat a non-zero exit code as failure
              return exitVal == 0;
          } catch (final Exception e) {
              LOG.error("Exception occurred while launching process: " + e.getMessage(), e);
              return Boolean.FALSE;
          }
      }
  }
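
Since you only need to react to a handful of messages a day, a plain Kafka consumer loop (no Spark Streaming) can do the polling and invoke the method above. Below is a minimal sketch, assuming the kafka-clients library, the executeCommand method from the snippet above, and placeholder broker, topic and group names:

  import java.time.Duration;
  import java.util.Collections;
  import java.util.Properties;
  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.clients.consumer.KafkaConsumer;

  public class SparkTriggerListener {

      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
          props.put("group.id", "spark-trigger-group");      // placeholder group id
          props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
          props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

          try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
              consumer.subscribe(Collections.singletonList("spark-trigger-topic")); // placeholder topic
              while (true) {
                  // with only 5-6 messages a day this loop sits idle almost all the time
                  ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));
                  for (ConsumerRecord<String, String> record : records) {
                      // launch the detached spark-submit wrapper for each trigger message
                      SparkJobLauncher.executeCommand(
                              "nohup ./sparksubmit.sh " + record.value() + " 2>&1 < /dev/null &");
                  }
              }
          }
      }
  }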

- Using ProcessBuilder - another approach

  // Note: requires java.io.File and java.io.IOException imports;
  // 'Operation' and 'logger' are defined by the surrounding class.
  private static void executeProcess(Operation command, String database)
          throws IOException, InterruptedException {
      final String shellScript = "./sparksubmit.sh";
      // run the script from src/main/resources so relative paths resolve
      final File executorDirectory = new File("src/main/resources/");
      ProcessBuilder processBuilder =
              new ProcessBuilder(shellScript, command.getOperation(), "argument-one");
      processBuilder.directory(executorDirectory);
      Process process = processBuilder.start();
      try {
          int shellExitStatus = process.waitFor();
          if (shellExitStatus == 0) {
              logger.info("Successfully executed the shell script");
          } else {
              logger.error("Shell script exited with status " + shellExitStatus);
          }
      } catch (InterruptedException ex) {
          logger.error("Shell script process was interrupted");
          Thread.currentThread().interrupt();
      }
  }

- A third way: JSch

Use JSch to run the command over SSH.
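
The answer leaves this route as a one-liner, so here is a minimal sketch of what it could look like, assuming the com.jcraft.jsch library on the classpath and an edge node that has spark-submit on its PATH; the host, credentials, jar path and class name are placeholders:

  import com.jcraft.jsch.ChannelExec;
  import com.jcraft.jsch.JSch;
  import com.jcraft.jsch.Session;

  public class RemoteSparkSubmit {

      public static void main(String[] args) throws Exception {
          JSch jsch = new JSch();
          // placeholder credentials/host for an edge node with spark-submit installed
          Session session = jsch.getSession("user", "edge-node.example.com", 22);
          session.setPassword("secret");
          // disabled only to keep the sketch short; verify host keys in real code
          session.setConfig("StrictHostKeyChecking", "no");
          session.connect();

          ChannelExec channel = (ChannelExec) session.openChannel("exec");
          // hypothetical jar and main class, detached with nohup as above
          channel.setCommand("nohup spark-submit --class com.example.MyApp /opt/jobs/myapp.jar "
                  + "2>&1 < /dev/null &");
          channel.connect();

          // wait for the remote command to finish launching, then clean up
          while (!channel.isClosed()) {
              Thread.sleep(500);
          }
          channel.disconnect();
          session.disconnect();
      }
  }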

- The YARN Client class - a fourth way

My favorite book, Data Algorithms, uses this approach:

  // import required classes and interfaces
  import org.apache.spark.deploy.yarn.Client;
  import org.apache.spark.deploy.yarn.ClientArguments;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.spark.SparkConf;

  public class SubmitSparkJobToYARNFromJavaCode {

      public static void main(String[] arguments) throws Exception {
          // prepare arguments to be passed to
          // org.apache.spark.deploy.yarn.Client object
          String[] args = new String[] {
              // the name of your application
              "--name",
              "myname",
              // memory for driver (optional)
              "--driver-memory",
              "1000M",
              // path to your application's JAR file
              // required in yarn-cluster mode
              "--jar",
              "/Users/mparsian/zmp/github/data-algorithms-book/dist/data_algorithms_book.jar",
              // name of your application's main class (required)
              "--class",
              "org.dataalgorithms.bonus.friendrecommendation.spark.SparkFriendRecommendation",
              // comma-separated list of local jars that you want
              // SparkContext.addJar to work with
              "--addJars",
              "/Users/mparsian/zmp/github/data-algorithms-book/lib/spark-assembly-1.5.2-hadoop2.6.0.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/log4j-1.2.17.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/junit-4.12-beta-2.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/jsch-0.1.42.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/JeraAntTasks.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/jedis-2.5.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/jblas-1.2.3.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/hamcrest-all-1.3.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/guava-18.0.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-math3-3.0.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-math-2.2.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-logging-1.1.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-lang3-3.4.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-lang-2.6.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-io-2.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-httpclient-3.0.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-daemon-1.0.5.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-configuration-1.6.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-collections-3.2.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-cli-1.2.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/cloud9-1.3.2.jar",
              // argument 1 to your Spark program (SparkFriendRecommendation)
              "--arg",
              "3",
              // argument 2 to your Spark program (SparkFriendRecommendation)
              "--arg",
              "/friends/input",
              // argument 3 to your Spark program (SparkFriendRecommendation)
              "--arg",
              "/friends/output",
              // argument 4 to your Spark program (SparkFriendRecommendation)
              // this is a helper argument to create a proper JavaSparkContext object
              // make sure that you create the following in the SparkFriendRecommendation program
              // ctx = new JavaSparkContext("yarn-cluster", "SparkFriendRecommendation");
              "--arg",
              "yarn-cluster"
          };

          // create a Hadoop Configuration object
          Configuration config = new Configuration();
          // identify that you will be using Spark as YARN mode
          System.setProperty("SPARK_YARN_MODE", "true");
          // create an instance of SparkConf object
          SparkConf sparkConf = new SparkConf();
          // create ClientArguments, which will be passed to Client
          ClientArguments cArgs = new ClientArguments(args, sparkConf);
          // create an instance of yarn Client
          Client client = new Client(cArgs, config, sparkConf);
          // submit Spark job to YARN
          client.run();
      }
  }
