Error reading data from S3 on EC2: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

Posted by tv6aics1 on 2021-07-13 in Spark

I'm trying to read from an S3 bucket:

data = spark.read.parquet("s3a://my-bucket/data")

But I get this error:

Py4JJavaError                             Traceback (most recent call last)
----> 1 stores = spark.read.parquet(stores_path)

~/.local/lib/python3.6/site-packages/pyspark/sql/readwriter.py in parquet(self, *paths, **options)
    351         self._set_opts(mergeSchema=mergeSchema, pathGlobFilter=pathGlobFilter,
    352                        recursiveFileLookup=recursiveFileLookup)
--> 353         return self._df(self._jreader.parquet(_to_seq(self._spark._sc, paths)))
    354
    355     @ignore_unicode_prefix

~/.local/lib/python3.6/site-packages/py4j/java_gateway.py in __call__(self, *args)
   1303         answer = self.gateway_client.send_command(command)
   1304         return_value = get_return_value(
-> 1305             answer, self.gateway_client, self.target_id, self.name)
   1306
   1307         for temp_arg in temp_args:

~/.local/lib/python3.6/site-packages/pyspark/sql/utils.py in deco(*a, **kw)
    126     def deco(*a, **kw):
    127         try:
--> 128             return f(*a, **kw)
    129         except py4j.protocol.Py4JJavaError as e:
    130             converted = convert_exception(e.java_exception)

~/.local/lib/python3.6/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(

Py4JJavaError: An error occurred while calling o38.parquet.
: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:46)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:366)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:286)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
    at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:755)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
    ... 25 more
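The trace fails inside Hadoop's `FileSystem.getFileSystemClass` / `Configuration.getClass`. The `s3a` scheme was resolved to the class name `org.apache.hadoop.fs.s3a.S3AFileSystem`, but that class could not be loaded. `S3AFileSystem` ships in the separate `hadoop-aws` module, not in `hadoop-common`.

The sketch below checks whether that jar is present at all. It assumes a pip-installed PySpark whose bundled jars live in `pyspark/jars` next to the package. The helper names `pyspark_jars_dir`, `hadoop_jars` and `has_hadoop_aws` are hypothetical.

```python
from pathlib import Path


def pyspark_jars_dir():
    """Jars directory of a pip-installed PySpark (assumed layout: <site-packages>/pyspark/jars)."""
    import pyspark  # imported lazily so the pure helpers below work without PySpark

    return Path(pyspark.__file__).parent / "jars"


def hadoop_jars(jar_names):
    """Sorted Hadoop jar file names picked out of an iterable of jar file names."""
    return sorted(n for n in jar_names if n.startswith("hadoop-") and n.endswith(".jar"))


def has_hadoop_aws(jar_names):
    """True if a hadoop-aws jar (the module containing S3AFileSystem) is among the names."""
    return any(n.startswith("hadoop-aws-") for n in hadoop_jars(jar_names))
```

For example, build `names = [p.name for p in pyspark_jars_dir().glob("*.jar")]` and then call `has_hadoop_aws(names)`. A `False` result would be consistent with the `ClassNotFoundException` above.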
I can't figure out how to fix this. Can anyone help? I'm using pyspark=='3.0.1' and awscli='1.19.9'.
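When `hadoop-aws` is missing, one common approach is to let Spark download it at startup through the `spark.jars.packages` setting. Pick the `hadoop-aws` version that matches the bundled `hadoop-common` jar, because mixing Hadoop versions tends to fail later with other linkage errors.

A minimal sketch, under these assumptions:
- PySpark is pip-installed, and its jar names follow the pattern `hadoop-common-<version>.jar`.
- The function names are hypothetical.
- The setting is applied before the first SparkSession (and its JVM) is started; it has no effect on an already-running session.

```python
import re


def hadoop_version_from_jars(jar_names):
    """Return the version in a 'hadoop-common-<version>.jar' name, or None if absent."""
    for name in jar_names:
        match = re.fullmatch(r"hadoop-common-(\d+(?:\.\d+)+)\.jar", name)
        if match:
            return match.group(1)
    return None


def hadoop_aws_package(hadoop_version):
    """Maven coordinate of the hadoop-aws module for the given Hadoop version."""
    return f"org.apache.hadoop:hadoop-aws:{hadoop_version}"


def build_session(hadoop_version, app_name="s3a-read"):
    """Create a SparkSession that fetches hadoop-aws (plus its dependencies) at startup."""
    from pyspark.sql import SparkSession  # imported lazily; requires PySpark

    return (
        SparkSession.builder.appName(app_name)
        # Resolved from Maven via Ivy, including transitive deps such as the AWS SDK.
        .config("spark.jars.packages", hadoop_aws_package(hadoop_version))
        .getOrCreate()
    )
```

Take `jars = ["hadoop-common-2.7.4.jar"]` as an illustrative input (not taken from the asker's machine). Then `build_session(hadoop_version_from_jars(jars))` would request `org.apache.hadoop:hadoop-aws:2.7.4`, after which `spark.read.parquet("s3a://my-bucket/data")` can be retried. Reading from S3 also requires valid AWS credentials, which this sketch does not configure.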
