Environment: Hadoop 2.7.3, Hive 2.2.0-SNAPSHOT, Tez 0.8.4
My core-site.xml:
<property>
<name>fs.s3a.aws.credentials.provider</name>
<value>
org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider,
org.apache.hadoop.fs.s3a.BasicAWSCredentialsProvider,
com.amazonaws.auth.EnvironmentVariableCredentialsProvider
</value>
</property>
<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
<description></description>
</property>
<property>
<name>fs.s3a.access.key</name>
<value>GOODKEYVALUE</value>
<description>AWS access key ID. Omit for Role-based authentication. </description>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>SECRETKEYVALUE</value>
<description>AWS secret key. Omit for Role-based authentication.</description>
</property>
I can access s3a URIs correctly from the hadoop command line, and I can create an external table and run queries like the following:
create external table mytable(a string, b string) location 's3a://mybucket/myfolder/';
select * from mytable limit 20;
These run correctly, but
select count(*) from mytable;
fails with:
Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1489267689011_0001_1_00, diagnostics=[Vertex vertex_1489267689011_0001_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: url_sum_master initializer failed, vertex=vertex_1489267689011_0001_1_00 [Map 1], com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain
at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:131)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1110)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:759)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:723)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:716)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4194)
at com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:4949)
at com.amazonaws.services.s3.AmazonS3Client.fetchRegionFromCache(AmazonS3Client.java:4923)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4178)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4141)
at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1313)
at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1270)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:258)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:365)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:483)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1489267689011_0001_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1489267689011_0001_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:393)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:250)
at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:353)
The only way I have been able to make it work is to put accesskey:secretkey in the URI itself, which is not an option for production code.
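For reference, that workaround means a table location of roughly this form (a sketch only; the bucket, folder, and key values are placeholders, and the secret key ends up in logs and table metadata this way):
s3a://ACCESSKEYID:SECRETACCESSKEY@mybucket/myfolder/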
Thanks.
2 Answers
Answer 1:
I solved this by reverting to Hive 2.1.1.
I think the problem is incompatible jar versions: my hadoop-aws-2.7.3.jar was compiled against aws-java-sdk 1.11.93, whereas Hive was built against AWS SDK 1.7.4.
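If you want to verify the same thing on your own cluster, a rough check (a sketch only; the paths are typical binary-distribution layouts and the Tez tarball name is just an example, so adjust to your install) is to list the AWS-related jars each component actually carries:
ls $HADOOP_HOME/share/hadoop/tools/lib/ | grep -i aws    # hadoop-aws and aws-java-sdk shipped with Hadoop
ls $HIVE_HOME/lib/ | grep -i aws                         # any SDK jars copied into Hive
tar tzf tez-0.8.4.tar.gz | grep -i aws                   # jars bundled into the tarball that tez.lib.uris points at
A disagreement between the aws-java-sdk versions on those classpaths is the kind of mismatch this answer describes.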
Answer 2:
You're right that you don't want secrets in the URI. Hadoop will soon start nagging you for doing this, and at some point it will probably block it outright.
Have a look at the Troubleshooting S3A section of the latest s3a documentation.
If you are building Hadoop yourself (which your choice of SDK version implies), build Hadoop 2.8/2.9 and turn on DEBUG logging in the s3a package. There is a bit more security-related logging there, although it deliberately logs less than you might want, so as to keep those keys secret.
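As a sketch of turning that logging on (the log4j.properties location is an assumption; add the line to whichever log4j configuration the failing component, here the Tez application master, actually reads):
echo 'log4j.logger.org.apache.hadoop.fs.s3a=DEBUG' >> $HADOOP_CONF_DIR/log4j.properties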
You can also try setting the AWS environment variables on the target machines. That won't fix the problem, but it helps isolate it.
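For example (the values are placeholders; AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the variable names read by the EnvironmentVariableCredentialsProvider already listed in the core-site.xml above, and they have to be visible to the process that actually opens the S3 connection):
export AWS_ACCESS_KEY_ID=GOODKEYVALUE
export AWS_SECRET_ACCESS_KEY=SECRETKEYVALUE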