This question already has answers here:
How to access local files in Spark on Windows? (5 answers)
Closed 2 hours ago.
I am new to Spark and PySpark. I downloaded data from here (a 1.75 GB archive of multiple .csv files) and stored it on the D drive, separate from the Spark installation and my PySpark script on the C drive.
When I try to read the files, I get the following error:
---------------------------------------------------------------------------
AnalysisException Traceback (most recent call last)
Cell In[12], line 3
1 df = spark.read.option("header", True) \
2 .option("inferSchema", True) \
----> 3 .csv("\airport_delay")
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyspark\sql\readwriter.py:535, in DataFrameReader.csv(self, path, schema, sep, encoding, quote, escape, comment, header, inferSchema, ignoreLeadingWhiteSpace, ignoreTrailingWhiteSpace, nullValue, nanValue, positiveInf, negativeInf, dateFormat, timestampFormat, maxColumns, maxCharsPerColumn, maxMalformedLogPerPartition, mode, columnNameOfCorruptRecord, multiLine, charToEscapeQuoteEscaping, samplingRatio, enforceSchema, emptyValue, locale, lineSep, pathGlobFilter, recursiveFileLookup, modifiedBefore, modifiedAfter, unescapedQuoteHandling)
533 if type(path) == list:
534 assert self._spark._sc._jvm is not None
--> 535 return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
536 elif isinstance(path, RDD):
538 def func(iterator):
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\py4j\java_gateway.py:1321, in JavaMember.__call__(self, *args)
1315 command = proto.CALL_COMMAND_NAME +\
1316 self.command_header +\
1317 args_command +\
1318 proto.END_COMMAND_PART
1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
1322 answer, self.gateway_client, self.target_id, self.name)
1324 for temp_arg in temp_args:
1325 temp_arg._detach()
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyspark\sql\utils.py:196, in capture_sql_exception.<locals>.deco(*a, **kw)
192 converted = convert_exception(e.java_exception)
193 if not isinstance(converted, UnknownException):
194 # Hide where the exception came from that shows a non-Pythonic
195 # JVM exception message.
--> 196 raise converted from None
197 else:
198 raise
AnalysisException: Path does not exist: file:/C:/Users/Travail/Documents/PySpark/irport_delay
from pyspark.sql import SparkSession
spark = SparkSession.builder \
.master("local[1]") \
.appName("Test1") \
.getOrCreate()
df = spark.read.option("header", True) \
.option("inferSchema", True) \
.csv("file:\\\D:\Dataset\airport_delay")
How can I read data from another drive with PySpark? Or does that make no sense?
I tried:
- adding/removing "file:"
- reading the Spark configuration documentation and looking for something like "spark.sql.warehouse.dir"
1 Answer
I tried changing every "\" to "/" and it worked.
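A minimal sketch of the working read, assuming the same D:\Dataset\airport_delay folder and SparkSession settings from the question; forward slashes (or a file:/// URI) avoid Windows backslash-escape problems:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("local[1]") \
    .appName("Test1") \
    .getOrCreate()

# Forward slashes work on Windows and avoid Python string-escape surprises:
# in the original "\airport_delay", Python interpreted "\a" as the ASCII
# bell character, which is why the error message shows "irport_delay"
# with the leading "a" missing.
df = spark.read.option("header", True) \
    .option("inferSchema", True) \
    .csv("file:///D:/Dataset/airport_delay")

A plain path such as "D:/Dataset/airport_delay" should also work when Spark runs in local mode on the same machine.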