Running sqoop job shell scripts in parallel in Oozie

Posted by nwo49xxi on 2021-06-03 in Sqoop

I have a shell script that runs a saved sqoop job. The script is as follows.

#!/bin/bash

table=$1

sqoop job --exec "${table}"

When I pass the table name in the workflow, the sqoop job executes successfully.
The workflow is as follows.

<workflow-app name="Shell_script" xmlns="uri:oozie:workflow:0.5">
<start to="shell_script"/>
<kill name="Kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="shell_script">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>sqoopjob.sh</exec>
        <argument>test123</argument>
        <file>/user/oozie/sqoop/lib/sqoopjob.sh#sqoopjob.sh</file>
    </shell>
    <ok to="End"/>
    <error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>

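The job for the table test123 executed successfully.
I now have 300 jobs like the one above, and I want to run 10 sqoop jobs in parallel. All the table names are in a single file.
I want to loop over the file and run the sqoop jobs for the first 10 tables, then the next 10, and so on.
How can I do this? Should I prepare 10 workflows? I'm really confused.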


Answer #1 (fhg3lkii)

As @Samson Scharfrichter said, you can launch parallel jobs inside the shell script. Write a function runJob() and run it in parallel in the shell. Use this template:


#!/bin/bash

runJob() {
    tableName="$1"
    # add other parameters here
    # call sqoop here or do something else
    # write command logs
    # etc, etc
    # return 0 on success, return 1 on fail
    return 0
}

# Run parallel processes and wait for their completion.
# Add a loop here or add more calls.
# The trailing ampersand runs each call as a parallel background process.
runJob "$table_name" &
runJob "$table_name2" &
runJob "$table_name3" &

# Now wait for all processes to complete

FAILED=0

for job in $(jobs -p)
do
    echo "job=$job"
    wait "$job" || FAILED=$((FAILED + 1))
done

if [ "$FAILED" != "0" ]; then
    echo "Execution FAILED!  ($FAILED)"
    # Do something here: log, send a message, etc.
    exit 1
fi

# All processes completed successfully.
# Do something here

echo "Done successfully"
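
To cover the "10 at a time from a file" part of the question, the template above could be extended roughly as follows. This is only a sketch under some assumptions: the table list is a plain text file with one saved-job name per line, and the names runBatches, BATCH_SIZE and LOG_DIR are made up for illustration (they are not part of sqoop or Oozie). It starts up to BATCH_SIZE jobs in the background, waits for that whole batch, then moves on to the next one.

```shell
#!/bin/bash

# Run one saved sqoop job; its output goes to a per-table log file.
# LOG_DIR is a hypothetical setting (defaults to the current directory).
runJob() {
    local tableName="$1"
    # stdin comes from /dev/null so the job cannot consume the table list
    sqoop job --exec "$tableName" < /dev/null \
        > "${LOG_DIR:-.}/sqoop_${tableName}.log" 2>&1
}

# Read table names (one per line) and run them in batches.
# BATCH_SIZE is a hypothetical setting (defaults to 10).
runBatches() {
    local tableFile="$1"
    local batchSize="${BATCH_SIZE:-10}"
    local failed=0
    local pids=()
    local table pid

    while IFS= read -r table || [ -n "$table" ]; do
        if [ -z "$table" ]; then
            continue    # skip blank lines
        fi
        runJob "$table" &
        pids+=("$!")

        # Batch is full: wait for every job in it before starting the next one
        if [ "${#pids[@]}" -ge "$batchSize" ]; then
            for pid in "${pids[@]}"; do
                wait "$pid" || failed=$((failed + 1))
            done
            pids=()
        fi
    done < "$tableFile"

    # Wait for the last, possibly partial, batch
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=$((failed + 1))
    done

    if [ "$failed" -ne 0 ]; then
        echo "Execution FAILED!  ($failed)"
        return 1
    fi
    echo "Done successfully"
}

# Run only when a table-list file is given as the first argument
if [ -n "${1:-}" ]; then
    runBatches "$1" || exit 1
fi
```

With this approach a single workflow should be enough: the table list would presumably be shipped with an extra `<file>` element and its name passed as the `<argument>`, the same way sqoopjob.sh and test123 are passed in the question. Note that each batch only finishes when its slowest job does, so one long-running table holds back the next 10.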
