Spark NLP: DocumentAssembler initialization fails with 'java.lang.NoClassDefFoundError: org/apache/spark/ml/util/MLWritable$class'

odopli94 · asked 2021-05-27 · in Spark
Follow (0) | Answers (1) | Views (881)

I am trying out the context-aware spell checker described at https://medium.com/spark-nlp/applying-context-aware-spell-checking-in-spark-nlp-3c29c46963bc. The first component in the pipeline is a DocumentAssembler:

    from sparknlp.annotator import *
    from sparknlp.base import *
    import sparknlp

    spark = sparknlp.start()
    documentAssembler = DocumentAssembler() \
        .setInputCol("text") \
        .setOutputCol("document")

Running the code above fails with the following traceback:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\__init__.py", line 110, in wrapper
        return func(self, **kwargs)
      File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\sparknlp\base.py", line 148, in __init__
        super(DocumentAssembler, self).__init__(classname="com.johnsnowlabs.nlp.DocumentAssembler")
      File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\__init__.py", line 110, in wrapper
        return func(self, **kwargs)
      File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\sparknlp\internal.py", line 72, in __init__
        self._java_obj = self._new_java_obj(classname, self.uid)
      File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\ml\wrapper.py", line 69, in _new_java_obj
        return java_obj(*java_args)
      File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\python\lib\py4j-0.10.9-src.zip\py4j\java_gateway.py", line 1569, in __call__
      File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\sql\utils.py", line 131, in deco
        return f(*a, **kw)
      File "C:\Users\pab\AppData\Local\Continuum\anaconda3.7\envs\MailChecker\lib\site-packages\pyspark\python\lib\py4j-0.10.9-src.zip\py4j\protocol.py", line 328, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling None.com.johnsnowlabs.nlp.DocumentAssembler.
    : java.lang.NoClassDefFoundError: org/apache/spark/ml/util/MLWritable$class
        at com.johnsnowlabs.nlp.DocumentAssembler.<init>(DocumentAssembler.scala:16)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)

Edit: the Apache Spark version is 2.4.6.
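For what it's worth, the missing class name itself hints at a Scala version mismatch: Scala 2.11 and earlier compiled trait method implementations into a synthetic `<Trait>$class` class, which Scala 2.12 no longer emits. A `NoClassDefFoundError` naming such a class usually means a jar built for Scala 2.11 is running against a Scala 2.12 build of Spark (or vice versa). A minimal sketch of that heuristic (the helper function is hypothetical, purely illustrative):

```python
def looks_like_scala_version_mismatch(missing_class: str) -> bool:
    """Heuristic (illustrative only): Scala <= 2.11 emitted a synthetic
    '<Trait>$class' implementation class for every trait; Scala 2.12+
    dropped that encoding. A NoClassDefFoundError for such a name is a
    strong sign of mixed Scala 2.11 / 2.12 artifacts on the classpath."""
    return missing_class.endswith("$class")

# The class from the traceback above matches the pattern:
print(looks_like_scala_version_mismatch(
    "org/apache/spark/ml/util/MLWritable$class"))  # True
```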

pb3skfrl answered:

I ran into this problem when upgrading from Spark 2.4.5 to Spark 3+ (albeit on Databricks, in Scala). Try downgrading your Spark version.
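The underlying issue is usually that the spark-nlp jar and the Spark runtime were built against different Scala versions: Spark 2.4.x defaults to Scala 2.11 while Spark 3.x uses Scala 2.12, and the spark-nlp Maven artifact name encodes the Scala version it targets. A rough sketch of picking a matching coordinate (the helper function and the pinned versions below are illustrative assumptions; always check John Snow Labs' official compatibility table):

```python
def suggest_spark_nlp_package(spark_version: str) -> str:
    """Hypothetical helper: map a Spark version to a plausible spark-nlp
    Maven coordinate. The '_2.11' / '_2.12' suffix is the Scala version
    the jar was compiled against and must match the Spark runtime."""
    major, minor = (int(p) for p in spark_version.split(".")[:2])
    if (major, minor) == (2, 4):
        # Spark 2.4.x ships with Scala 2.11 by default
        return "com.johnsnowlabs.nlp:spark-nlp_2.11:2.7.5"
    if major == 3:
        # Spark 3.x is built on Scala 2.12
        return "com.johnsnowlabs.nlp:spark-nlp_2.12:3.0.3"
    raise ValueError(f"no known spark-nlp build for Spark {spark_version}")

# e.g. pass the result to: pyspark --packages <coordinate>
print(suggest_spark_nlp_package("2.4.6"))
```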
