java.lang.ClassNotFoundException: com.johnsnowlabs.nlp.DocumentAssembler with Spark in PyCharm with conda env

dgiusagp  posted on 2021-07-13  in Spark

I saved a pre-trained model from Spark NLP, and I am now trying to run a Python script in PyCharm using an Anaconda env:

from pyspark.ml import PipelineModel

Model_path = "./xxx"
model = PipelineModel.load(Model_path)

But I get the following error. I tried pyspark 2.4.4 with spark-nlp 2.4.4, and pyspark 2.4.4 with spark-nlp 2.5.4, and got the same error both times:

21/02/05 13:31:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

Traceback (most recent call last):
  File "C:/Users/xxxx/xxxxx.py", line 381, in <module>
    model = PipelineModel.load(Model_path)
  File "C:\Users\xxxxxxxx\anaconda3\envs\python3.7\lib\site-packages\pyspark\ml\util.py", line 362, in load
    return cls.read().load(path)
  File "C:\Users\\xxxxxxxx\anaconda3\envs\python3.7\lib\site-packages\pyspark\ml\pipeline.py", line 242, in load
    return JavaMLReader(self.cls).load(path)
  File "C:\Users\xxxxxxxx\anaconda3\envs\python3.7\lib\site-packages\pyspark\ml\util.py", line 300, in load
    java_obj = self._jread.load(path)
  File "C:\Users\xxxxxxxx\anaconda3\envs\python3.7\lib\site-packages\py4j\java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "C:\Users\xxxxxxxx\anaconda3\envs\python3.7\lib\site-packages\pyspark\sql\utils.py", line 63, in deco
    return f(*a,**kw)
  File "C:\Users\xxxxxxxx\anaconda3\envs\python3.7\lib\site-packages\py4j\protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o314.load.
: java.lang.ClassNotFoundException: com.johnsnowlabs.nlp.DocumentAssembler

I am new to PySpark and Spark NLP. Can anyone help?

ddarikpa  #1

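First, some background. The Spark NLP library depends on a JAR file that must be present on the Spark classpath. Depending on how you start the context in PySpark, there are three ways to provide this JAR.
a) When you start your Python application through the interpreter, call sparknlp.start() and the JAR will be downloaded automatically.
b) Pass the JAR to the pyspark command with the --jars switch. In this case, you get the JAR from the releases page and download it manually.
c) Start pyspark and pass --packages. Here you need to pass a Maven coordinate, for example,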

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.7.5
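
When the script is run from PyCharm rather than from the pyspark shell, the same Maven coordinate can probably be supplied from inside the script through Spark's spark.jars.packages setting. The following is a minimal sketch, assuming a Spark build for Scala 2.11 and the 2.7.5 release used in the command above. The helper names spark_nlp_coordinate and build_session are hypothetical, and any extra memory or serializer settings recommended by the Spark NLP docs are left out.

```python
def spark_nlp_coordinate(scala_version="2.11", spark_nlp_version="2.7.5"):
    """Build the Maven coordinate of the Spark NLP JAR.

    The defaults mirror the command above; adjust them to your setup.
    """
    return "com.johnsnowlabs.nlp:spark-nlp_{}:{}".format(
        scala_version, spark_nlp_version
    )


def build_session(app_name="spark-nlp-app"):
    """Create a local SparkSession that resolves the Spark NLP JAR from Maven."""
    # Imported lazily so the helper above works without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .master("local[*]")
        # In-script counterpart of `pyspark --packages <coordinate>`.
        .config("spark.jars.packages", spark_nlp_coordinate())
        .getOrCreate()
    )
```

Call build_session() before PipelineModel.load(...). The setting only takes effect if no SparkSession has been started yet in the same process. For a manually downloaded JAR (option b), the spark.jars setting with a local path plays the same role.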

Please check the documentation here,
https://github.com/johnsnowlabs/spark-nlp#usage
and make sure you pick the version you want.
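
Applied to the script in the question, option a) could look like the sketch below. It assumes the spark-nlp Python package is installed in the conda env and that its version is compatible with the one the model was saved with. load_saved_pipeline is a hypothetical helper name, not part of either library.

```python
def load_saved_pipeline(model_path):
    """Start Spark through Spark NLP, then load a saved PipelineModel."""
    # Imported lazily; both packages must be installed in the active env.
    import sparknlp
    from pyspark.ml import PipelineModel

    # sparknlp.start() creates the SparkSession with the Spark NLP JAR
    # configured, so com.johnsnowlabs.nlp.* classes can be resolved on load.
    spark = sparknlp.start()
    model = PipelineModel.load(model_path)
    return spark, model
```

For example, spark, model = load_saved_pipeline("./xxx"), called before any other code in the script creates a SparkSession.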
