How do I specify multiple files with "-files" in the AWS CLI for Amazon EMR?

mec1mxoz  posted on 2021-05-30  in  Hadoop

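I am trying to launch an Amazon EMR cluster through the AWS CLI, but I am a bit confused about how to specify multiple files. My current call is as follows: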

aws emr create-cluster --steps Type=STREAMING,Name='Intra country development',ActionOnFailure=CONTINUE,Args=[-files,s3://betaestimationtest/mapper.py,-files,s3://betaestimationtest/reducer.py,-mapper,mapper.py,-reducer,reducer.py,-input,s3://betaestimationtest/output_0_inter,-output,s3://betaestimationtest/output_1_intra] --ami-version 3.1.0 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge --auto-terminate --log-uri s3://betaestimationtest/logs

However, Hadoop now complains that it cannot find the reducer file:

Caused by: java.io.IOException: Cannot run program "reducer.py": error=2, No such file or directory

What am I doing wrong? The files do exist in the folder I specified.

zbdgwd5y  #1

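To pass multiple files to a streaming step, you need to supply the steps as a JSON file using file://.
The AWS CLI shorthand syntax uses the comma as the delimiter between items of a list. So when we try to pass arguments such as "-files","s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py", the shorthand parser splits the second value at its comma and treats the mapper.py and reducer.py paths as two separate arguments.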
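The effect can be illustrated with a simplified model. The sketch below is not the actual AWS CLI parser; it assumes a plain split on every comma, which is enough to show why a comma-joined -files value ends up as two arguments. The function name `naive_shorthand_split` is hypothetical.

```python
def naive_shorthand_split(args_body):
    """Simplified stand-in for shorthand list parsing: split on every comma."""
    return args_body.split(",")


# The list the user intends: "-files" followed by ONE comma-joined value.
intended = [
    "-files",
    "s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py",
    "-mapper",
    "mapper.py",
]

# What ends up typed inside Args=[...] on the command line.
typed = ",".join(intended)
parsed = naive_shorthand_split(typed)

# The intended list has 4 items, but the split produces one more,
# because the comma inside the -files value is also used as a delimiter.
print(len(intended), len(parsed))
for arg in parsed:
    print(repr(arg))
```

Under this model Hadoop streaming would see only mapper.py as the value of -files, and the reducer.py path as a stray argument.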
The workaround is to use the JSON format. See the example below.

aws emr create-cluster --steps file://./mysteps.json --ami-version 3.1.0 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge --auto-terminate --log-uri s3://betaestimationtest/logs

mysteps.json looks like this:

[
    {
        "Name": "Intra country development",
        "Type": "STREAMING",
        "ActionOnFailure": "CONTINUE",
        "Args": [
            "-files",
            "s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py",
            "-mapper",
            "mapper.py",
            "-reducer",
            "reducer.py",
            "-input",
            "s3://betaestimationtest/output_0_inter",
            "-output",
            "s3://betaestimationtest/output_1_intra"
        ]
    }
]
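If you would rather not write the file by hand, a small script can generate it. This is a minimal sketch, assuming Python 3 is available locally; the helper name `build_streaming_step` is hypothetical, and the bucket paths are the ones from the question. Building the list in Python and serializing it with `json.dump` keeps the comma-joined -files value as a single string.

```python
import json


def build_streaming_step(name, files, mapper, reducer, input_uri, output_uri):
    """Return one streaming step as a dict in the shape of mysteps.json."""
    return {
        "Name": name,
        "Type": "STREAMING",
        "ActionOnFailure": "CONTINUE",
        "Args": [
            "-files", ",".join(files),  # one argument, comma-joined
            "-mapper", mapper,
            "-reducer", reducer,
            "-input", input_uri,
            "-output", output_uri,
        ],
    }


bucket = "s3://betaestimationtest"
steps = [
    build_streaming_step(
        name="Intra country development",
        files=[bucket + "/mapper.py", bucket + "/reducer.py"],
        mapper="mapper.py",
        reducer="reducer.py",
        input_uri=bucket + "/output_0_inter",
        output_uri=bucket + "/output_1_intra",
    )
]

# Write the step list to the file that --steps file://./mysteps.json reads.
with open("mysteps.json", "w") as fh:
    json.dump(steps, fh, indent=4)
```

The resulting file can then be passed with --steps file://./mysteps.json, as in the command above.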

You can also find examples here: https://github.com/aws/aws-cli/blob/develop/awscli/examples/emr/create-cluster-examples.rst (see example 13).
Hope this helps!

ie3xauqp  #2

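You are specifying -files twice, but you only need to specify it once. I don't remember whether the CLI expects a space or a comma as the delimiter between multiple values, but you can try both.
You should replace: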

Args=[-files,s3://betaestimationtest/mapper.py,-files,s3://betaestimationtest/reducer.py,-mapper,mapper.py,-reducer,reducer.py,-input,s3://betaestimationtest/output_0_inter,-output,s3://betaestimationtest/output_1_intra]

with:

Args=[-files,s3://betaestimationtest/mapper.py s3://betaestimationtest/reducer.py,-mapper,mapper.py,-reducer,reducer.py,-input,s3://betaestimationtest/output_0_inter,-output,s3://betaestimationtest/output_1_intra]

or, if that fails, with:

Args=[-files,s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py,-mapper,mapper.py,-reducer,reducer.py,-input,s3://betaestimationtest/output_0_inter,-output,s3://betaestimationtest/output_1_intra]

rmbxnbpk  #3

Escape the comma that separates the files:

Args=[-files,s3://betaestimationtest/mapper.py\\,s3://betaestimationtest/reducer.py,-mapper,mapper.py,-reducer,reducer.py,-input,s3://betaestimationtest/output_0_inter,-output,s3://betaestimationtest/output_1_intra]
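Why an escape might help can be seen with another simplified model. The sketch below is not the AWS CLI's own code; it assumes the parser treats a backslash as an escape character, modeled here with Python's `csv` module. The function name `split_with_escapes` is hypothetical. Note that in an unquoted shell word, `\\,` reaches the program as `\,`, which is the form used in the string below.

```python
import csv
import io


def split_with_escapes(args_body):
    """Split on commas, but keep commas that are preceded by a backslash."""
    reader = csv.reader(io.StringIO(args_body), escapechar="\\")
    return next(reader)


# What the program receives after the shell turns "\\," into "\,".
received = (
    "-files,"
    "s3://betaestimationtest/mapper.py\\,s3://betaestimationtest/reducer.py,"
    "-mapper,mapper.py"
)

tokens = split_with_escapes(received)
for arg in tokens:
    print(repr(arg))
```

Under this assumption the two S3 paths stay together as one value for -files, with the backslash removed.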
