这是我使用的电子病历步骤,
s3 dist cp--targetsize 1000--outputcodec=gz--s3endpoint=bucket.s3-accelerate.amazonaws.com--groupby./(\d\d)/\d\d/\d\d/'--src s3a://sourcebucket/--dest s3a://destbucket/
加速端点异常。
电子病历版本:
Release label:emr-5.13.0
Hadoop distribution:Amazon 2.8.3
Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.1.0, Presto 0.194
为s3 dist cp传递参数以克服此错误,我遗漏了什么?
Exception in thread "main" com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalStateException: To enable accelerate mode, please use AmazonS3ClientBuilder.withAccelerateModeEnabled(true)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache.get(LocalCache.java:3937)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4830)
at com.amazon.ws.emr.hadoop.fs.s3.lite.provider.DefaultS3Provider.getS3(DefaultS3Provider.java:55)
at com.amazon.ws.emr.hadoop.fs.s3.lite.provider.DefaultS3Provider.getS3(DefaultS3Provider.java:22)
at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.getClient(GlobalS3Executor.java:122)
at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:89)
at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:176)
at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.doesBucketExist(AmazonS3LiteClient.java:88)
at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.ensureBucketExists(Jets3tNativeFileSystemStore.java:138)
at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:116)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.initialize(S3NativeFileSystem.java:448)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.initialize(EmrFileSystem.java:109)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2859)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2896)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2878)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:392)
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:869)
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:705)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:22)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.IllegalStateException: To enable accelerate mode, please use AmazonS3ClientBuilder.withAccelerateModeEnabled(true)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.setEndpoint(AmazonS3Client.java:670)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.AmazonWebServiceClient.withEndpoint(AmazonWebServiceClient.java:897)
at com.amazon.ws.emr.hadoop.fs.s3.lite.provider.DefaultS3Provider$S3CacheLoader.load(DefaultS3Provider.java:62)
at com.amazon.ws.emr.hadoop.fs.s3.lite.provider.DefaultS3Provider$S3CacheLoader.load(DefaultS3Provider.java:58)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
... 30 more
Command exiting with ret '1'
1条答案
按热度按时间c9x0cxw01#
s3distcp是基于hadoopaws库构建的,hadoopaws库不支持使用现成的加速bucket。
您希望创建自己的jar,其中包含对hadoopaws和amazon-sdk-s3的依赖,在那里转换所需的参数,并扩展s3clientfactory以实现加速上载。
maven依赖关系示例:
s3客户工厂:
最后一步是为hadoop提供s3工厂类:
这也可以从命令行完成,所以您可以直接在emr接口或emr sdk中指定它。
对于copy本身,可以使用hadoop fileutil.copy api,您可以在那里指定源和目标,以及所需的配置。
对于某些特定的文件格式,或者不是基于fs的源或目标,spark可以考虑在上述实用程序上使用。在某些情况下,它可以加快转移速度。
因此,现在您可以用jar向emr发送一个步骤:
在args中输入所有必需的参数,如源和目标s3路径。
注意:不要指定hadoop aws支持的特定于bucket的端点。它以与加速不兼容的方式使用它,每次都会得到相同的异常。
链接:
aws s3加速示例
自定义s3客户端工厂
带有自定义jar示例的emr步骤