Signature error when connecting to an S3 bucket via sparklyr

Posted by dzhpxtsq on 2021-06-01 in Hadoop

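I get an error when I try to connect to an S3 bucket from RStudio using sparklyr. The bucket is in the eu-central-1 (Frankfurt) region. I am on Spark 2.1.0 with Hadoop 2.7. The request fails with a 403 response code and a signature-mismatch error. When I use an s3a path instead, I get a 400 response code. Any suggestions for an alternative way to connect to an S3 bucket through Spark in RStudio would also be appreciated. Connecting to S3 without Spark works fine.
Here is the code: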


# install.packages("devtools")
# devtools::install_github("rstudio/sparklyr")

library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")
spark_disconnect(sc)
config <- spark_config()

# config$sparklyr.defaultPackages <- "org.apache.hadoop:hadoop-aws:2.7.3"
# config$spark.executor.memory <- "4g"

sc <- spark_connect(master = "local", config = config)

ctx <- sparklyr::spark_context(sc)
jsc <- invoke_static(
  sc,
  "org.apache.spark.api.java.JavaSparkContext",
  "fromSparkContext",
  ctx
)

hconf <- jsc %>% invoke("hadoopConfiguration")
hconf %>% invoke("set", "fs.s3.access.key", "xx")
hconf %>% invoke("set", "fs.s3.secret.key", "xx")

# hconf %>% invoke("set", "com.amazonaws.services.s3.enableV4", "true")

test <- spark_read_csv(sc, "test", "s3://********.csv")
Error: org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Service Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>SignatureDoesNotMatch</Code><Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message><AWSAccessKeyId>AKIAJ3ZD2ZEISKNQMSGQ</AWSAccessKeyId><StringToSign>AWS4-HMAC-SHA25620171129T123633Z20171129/eu-central-1/s3/aws4_request555016eca303c98732f51adcaaa83eac7368fb75f59eaa9f59116684b9030ee0</StringToSign><SignatureProvided>bf703f56827aa0f04aab3fa6a1e2aa277117344cdbfc3f4f3e51895ce62af826</SignatureProvided><StringToSignBytes>41 57 53 34 2d 48 4d 41 43 2d 53 48 41 32 35 36 0a 32 30 31 37 31 31 32 39 54 31 32 33 36 33 33 5a 0a 32 30 31 37 31 31 32 39 2f 65 75 2d 63 65 6e 74 72 61 6c 2d 31 2f 73 33 2f 61 77 73 34 5f 72 65 71 75 65 73 74 0a 35 35 35 30 31 36 65 63 61 33 30 33 63 39 38 37 33 32 66 35 31... <truncated>
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:175)
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy23.retrieveINode(Unknown Source)
    at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:340)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:381)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:344)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
    at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:415)
    at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:352)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sparklyr.Invoke$.invoke(invoke.scala:102)
    at sparklyr.StreamHandler$.handleMethodCall(stream.scala:97)
    at sparklyr.StreamHandler$.read(stream.scala:62)
    at sparklyr.BackendHandler.channelRead0(handler.scala:52)
    at sparklyr.BackendHandler.channelRead0(handler.scala:14)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:652)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:575)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:489)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.jets3t.service.S3ServiceException: Service Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>SignatureDoesNotMatch</Code><Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message><AWSAccessKeyId>AKIAJ3ZD2ZEISKNQMSGQ</AWSAccessKeyId><StringToSign>AWS4-HMAC-SHA25620171129T123633Z20171129/eu-central-1/s3/aws4_request555016eca303

qmelpv7a  #1

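This is probably caused by a signature-version incompatibility between regions. Some regions do not support Signature Version 2, so you have to use Version 4, as described here: http://docs.aws.amazon.com/general/latest/gr/signature-version-2.html#signature-2-regions-services
For S3 bucket/object access, you must also specify the region name (eu-central-1) in the request.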
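
As a rough illustration of the second point, the region name can be turned into a region-specific endpoint and handed to the Hadoop configuration. This is a sketch rather than a verified fix: `s3_regional_endpoint` is a hypothetical helper, the `s3.<region>.amazonaws.com` host pattern is an assumption that does not cover every AWS partition (the China regions, for example), and the commented-out call assumes the s3a connector and the `hconf` object from the question.

```r
# Hypothetical helper: build the region-specific S3 endpoint host name.
# Assumes the standard "s3.<region>.amazonaws.com" pattern.
s3_regional_endpoint <- function(region) {
  stopifnot(is.character(region), length(region) == 1, nzchar(region))
  paste0("s3.", region, ".amazonaws.com")
}

endpoint <- s3_regional_endpoint("eu-central-1")
print(endpoint)

# With a live connection (not run here), the endpoint could be set on the
# Hadoop configuration object built in the question:
# hconf %>% invoke("set", "fs.s3a.endpoint", endpoint)
```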

yfwxisqw  #2

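If you are using Amazon's own EMR, you need to look at their documentation.
If you are using Apache's own artifacts, you need to switch from the s3 connector to the s3a filesystem connector and then enable V4 signing:
Set the JVM system property com.amazonaws.services.s3.enableV4 to true.
Set the endpoint option fs.s3a.endpoint to the specific endpoint of the store you want to talk to; see the AWS list of endpoints.
If things do not work, you get a fairly generic "400 Bad Request" message, which does not help much in working out what went wrong. Start with a bucket in S3 US East (s3a://landsat-pds is one you can try listing), then move on to buckets in the V4-only regions.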
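
The settings described in this answer could be collected in the sparklyr configuration before connecting. The following is a minimal sketch under several assumptions: `s3a_v4_config` is a hypothetical helper, the Frankfurt endpoint is written out by hand, the `sparklyr.shell.driver-java-options` entry is used to pass the JVM flag to the driver, and `spark.hadoop.*` entries are expected to be copied into the Hadoop configuration by Spark. It has not been checked against the asker's setup.

```r
# Hypothetical helper: add S3A and V4-signing settings to a config list,
# such as the one returned by sparklyr::spark_config().
s3a_v4_config <- function(config = list(),
                          endpoint = "s3.eu-central-1.amazonaws.com") {
  v4_flag <- "-Dcom.amazonaws.services.s3.enableV4=true"

  # Load the hadoop-aws module that provides the s3a connector.
  config[["sparklyr.defaultPackages"]] <- "org.apache.hadoop:hadoop-aws:2.7.3"

  # Enable V4 signing in the driver and executor JVMs.
  config[["sparklyr.shell.driver-java-options"]] <- v4_flag
  config[["spark.executor.extraJavaOptions"]] <- v4_flag

  # Point s3a at the region-specific endpoint.
  config[["spark.hadoop.fs.s3a.endpoint"]] <- endpoint

  config
}

cfg <- s3a_v4_config()
str(cfg)

# Possible usage with a real session (not run here; the path is a placeholder):
# library(sparklyr)
# sc <- spark_connect(master = "local", config = s3a_v4_config(spark_config()))
# test <- spark_read_csv(sc, "test", "s3a://<bucket>/<file>.csv")
```

Following the advice above, it may be easier to try this first against a US East bucket, leaving the endpoint at its default, and only then switch to the eu-central-1 endpoint.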

w6mmgewl  #3

You need to set the AWS credentials via environment variables:

Sys.setenv(AWS_ACCESS_KEY_ID="[Your access key]")
Sys.setenv(AWS_SECRET_ACCESS_KEY="[Your secret access key]")

See ?spark_read_csv. Setting config$sparklyr.defaultPackages <- "org.apache.hadoop:hadoop-aws:2.7.3" is essential.
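
Because the Spark JVM inherits its environment when it is launched, the variables need to be set before `spark_connect()` is called. A small sketch of a pre-flight check is shown here; `missing_aws_env` is a hypothetical helper, and the key values are placeholders rather than working credentials.

```r
# Hypothetical helper: return the names of credential variables that are unset.
missing_aws_env <- function(vars = c("AWS_ACCESS_KEY_ID",
                                     "AWS_SECRET_ACCESS_KEY")) {
  vars[!nzchar(Sys.getenv(vars))]
}

# Placeholder values for illustration only; use real keys in practice.
Sys.setenv(AWS_ACCESS_KEY_ID = "example-access-key",
           AWS_SECRET_ACCESS_KEY = "example-secret-key")

unset <- missing_aws_env()
if (length(unset) > 0) {
  stop("Missing AWS credentials: ", paste(unset, collapse = ", "))
}

# With the variables set, the connection could then be opened (not run here):
# library(sparklyr)
# config <- spark_config()
# config$sparklyr.defaultPackages <- "org.apache.hadoop:hadoop-aws:2.7.3"
# sc <- spark_connect(master = "local", config = config)
```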
