Why does "pyspark" not work after installing "hadoop", while "spark-shell" still works?

ne5o7dgx · posted 2022-11-25 in Apache

I have installed Spark 3.3.1, and it previously ran fine with both the spark-shell and pyspark commands. But after I installed Hadoop 3.3.1, the pyspark command no longer seems to work properly. This is the result of running it:

C:\Users\A>pyspark2 --num-executors 4 --executor-memory 1g
[I 2022-11-20 22:36:09.100 LabApp] JupyterLab extension loaded from C:\Users\A\AppData\Local\Programs\Python\Python311\Lib\site-packages\jupyterlab
[I 2022-11-20 22:36:09.100 LabApp] JupyterLab application directory is C:\Users\A\AppData\Local\Programs\Python\Python311\share\jupyter\lab
[I 22:36:09.107 NotebookApp] Serving notebooks from local directory: C:\Users\A
[I 22:36:09.107 NotebookApp] Jupyter Notebook 6.5.2 is running at:
[I 22:36:09.107 NotebookApp] http://localhost:8888/?token=0fca9f0378976c7af19886970c9e801ac27a8d1a209528db
[I 22:36:09.108 NotebookApp]  or http://127.0.0.1:8888/?token=0fca9f0378976c7af19886970c9e801ac27a8d1a209528db
[I 22:36:09.108 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 22:36:09.189 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///C:/Users/A/AppData/Roaming/jupyter/runtime/nbserver-8328-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=0fca9f0378976c7af19886970c9e801ac27a8d1a209528db
     or http://127.0.0.1:8888/?token=0fca9f0378976c7af19886970c9e801ac27a8d1a209528db
0.01s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.

It opens Jupyter Notebook, but the Spark logo does not appear, and the Python shell is no longer usable in CMD the way it was before. spark-shell, however, still works:

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://168.150.8.52:4040
Spark context available as 'sc' (master = local[*], app id = local-1669062477403).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.1
      /_/

Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 11.0.16.1)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 22/11/21 12:28:12 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped

scala>

1rhkuytd 1#

Your PATH has been changed to use Spark's Python distribution. You can read more about this here.
Try: echo $PATH
Then look at how many Pythons you have; I would bet you have more than one.
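
As a quick way to check this (a minimal sketch added here, not part of the original answer), you can list the PATH entries from any Python prompt and see which python and pyspark actually resolve first; shutil.which mirrors the shell's lookup order:

import os
import shutil

# Print every PATH entry in resolution order; a Spark- or Hadoop-related
# directory appearing before your regular Python install would explain
# the changed behaviour of the pyspark command.
for entry in os.environ["PATH"].split(os.pathsep):
    print(entry)

# shutil.which resolves executables the same way the shell does
print(shutil.which("python"))
print(shutil.which("pyspark"))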


von4xj4u 2#

"It opens Jupyter notebook, but the Spark logo is not shown, and the Python shell is not available"
Jupyter is a Python shell (by default).
Spark does not ship with a pyspark2 command, so it looks like you have customized your environment somehow. Also, pyspark only opens Jupyter by default if a specific environment variable (PYSPARK_DRIVER_PYTHON) tells it to.
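
To confirm that, you can inspect the two driver-related variables from any Python prompt (a small check of my own, assuming the common PYSPARK_DRIVER_PYTHON=jupyter setup); if the first one prints jupyter, the pyspark launcher starts Jupyter instead of the plain REPL:

import os

# Both print None if the variable is unset; "jupyter" / "notebook" here
# would explain why pyspark opens Jupyter Notebook instead of a shell.
print(os.environ.get("PYSPARK_DRIVER_PYTHON"))
print(os.environ.get("PYSPARK_DRIVER_PYTHON_OPTS"))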
The logo does not necessarily tell you it is working anyway. Try creating a session:

from pyspark.sql import SparkSession

# Create (or reuse) a session via the builder entry point
spark = SparkSession.builder.appName("test").getOrCreate()
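
If the session builds, a short smoke test (my addition, assuming the spark object created above) verifies the installation end to end:

# Build a tiny 5-row DataFrame and print it; a classpath or executor
# problem introduced by the Hadoop install would surface here.
df = spark.range(5)
df.show()

# Should print 3.3.1 for the installation described in the question.
print(spark.version)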
