Apache Spark: Java error when installing RasterFrames on Databricks

d7v8vwbk · published 2023-02-05 in Apache

I installed RasterFrames on my Databricks cluster following the steps in this notebook.
At the end, I can import the following:

from pyrasterframes import rf_ipython
from pyrasterframes.utils import create_rf_spark_session
from pyspark.sql.functions import lit 
from pyrasterframes.rasterfunctions import *

But when I run:

spark = create_rf_spark_session()

I get the following error: "java.lang.NoClassDefFoundError: scala/Product$class".
I am using a Spark 3.2.1 cluster. I also installed Java Runtime Environment 1.8.0_341, but that made no difference.

    • Can someone explain what went wrong, and how to resolve this error?
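As a first debugging step, it can help to check which Spark and Scala versions the cluster actually runs, since those are what any installed JAR must match. A minimal sketch, assuming a live `spark` session as in a Databricks notebook; `cluster_versions` is a hypothetical helper, not part of any library:

```python
# Minimal diagnostic sketch, assuming a live SparkSession named `spark`
# (as in a Databricks notebook): report the Spark and Scala versions the
# cluster actually runs, so they can be compared against the versions
# the installed library was built for.
def cluster_versions(spark):
    spark_ver = spark.version
    # scala.util.Properties.versionNumberString() returns e.g. "2.12.14";
    # py4j can call it via the static forwarders on scala.util.Properties.
    scala_ver = spark.sparkContext._jvm.scala.util.Properties.versionNumberString()
    return spark_ver, scala_ver
```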

Full error log:

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<command-2354681519525034> in <module>
      5 
      6 # Use the provided convenience function to create a basic local SparkContext
----> 7 spark = create_rf_spark_session()
 
/databricks/python/lib/python3.8/site-packages/pyrasterframes/utils.py in create_rf_spark_session(master, **kwargs)
     97 
     98     try:
---> 99         spark.withRasterFrames()
    100         return spark
    101     except TypeError as te:
 
/databricks/python/lib/python3.8/site-packages/pyrasterframes/__init__.py in _rf_init(spark_session)
     42     """ Adds RasterFrames functionality to PySpark session."""
     43     if not hasattr(spark_session, "rasterframes"):
---> 44         spark_session.rasterframes = RFContext(spark_session)
     45         spark_session.sparkContext._rf_context = spark_session.rasterframes
     46 
 
/databricks/python/lib/python3.8/site-packages/pyrasterframes/rf_context.py in __init__(self, spark_session)
     37         self._jvm = self._gateway.jvm
     38         jsess = self._spark_session._jsparkSession
---> 39         self._jrfctx = self._jvm.org.locationtech.rasterframes.py.PyRFContext(jsess)
     40 
     41     def list_to_seq(self, py_list):
 
/databricks/spark/python/lib/py4j-0.10.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1566 
   1567         answer = self._gateway_client.send_command(command)
-> 1568         return_value = get_return_value(
   1569             answer, self._gateway_client, None, self._fqn)
   1570 
 
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    115     def deco(*a, **kw):
    116         try:
--> 117             return f(*a, **kw)
    118         except py4j.protocol.Py4JJavaError as e:
    119             converted = convert_exception(e.java_exception)
 
/databricks/spark/python/lib/py4j-0.10.9.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--> 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)
 
Py4JJavaError: An error occurred while calling None.org.locationtech.rasterframes.py.PyRFContext.
: java.lang.NoClassDefFoundError: scala/Product$class
    at org.locationtech.rasterframes.model.TileDimensions.<init>(TileDimensions.scala:35)
    at org.locationtech.rasterframes.package$.<init>(rasterframes.scala:55)
    at org.locationtech.rasterframes.package$.<clinit>(rasterframes.scala)
    at org.locationtech.rasterframes.py.PyRFContext.<init>(PyRFContext.scala:49)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:250)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: scala.Product$class
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
    at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
    ... 15 more

Thanks in advance!

6ioyuze2 #1

That version of RasterFrames (0.8.4) only works on DBR 6.x with Spark 2.4 and Scala 2.11; it won't work on Spark 3.2.x, which uses Scala 2.12. You could try version 0.10.1, which was upgraded to Spark 3.1.2, but it may not work on Spark 3.2 (I haven't tested it).
If you're looking to run geospatial queries on Databricks, have a look at the Mosaic project from Databricks Labs - it supports the standard st_ functions and much more. You can find the announcement in this blog post, with more information in the talk at Data & AI Summit 2022, the documentation, and the project on GitHub.
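As background for the diagnosis above: Scala 2.11 compiled trait method bodies into synthetic `SomeTrait$class` classes, which Scala 2.12 no longer generates, so a `NoClassDefFoundError` mentioning `scala/Product$class` is a reliable fingerprint of a JAR built for Scala 2.11 running on a 2.12 cluster. A small sketch (hypothetical helper, for illustration only) that flags this pattern in a stack trace:

```python
import re

# Scala 2.11 emitted trait method bodies into a synthetic "SomeTrait$class"
# class; Scala 2.12+ does not generate these. A class-loading error that
# names such a class therefore points at a Scala 2.11-built JAR.
SCALA_211_TRAIT_IMPL = re.compile(r"\b[\w.]+\$class\b")

def looks_like_scala_211_jar(stack_trace):
    is_load_error = ("NoClassDefFoundError" in stack_trace
                     or "ClassNotFoundException" in stack_trace)
    return is_load_error and bool(SCALA_211_TRAIT_IMPL.search(stack_trace))

print(looks_like_scala_211_jar(
    "java.lang.NoClassDefFoundError: scala/Product$class"))  # True
```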

ozxc1zmp #2

I managed to get version 0.10.x of RasterFrames working with Databricks Runtime 9.1 LTS. At the time of writing you cannot upgrade to a newer runtime because of differences in the pyspark version. Below is a step-by-step guide on how to get it working:

  • The cluster should be single-user; otherwise you get this error:
py4j.security.Py4JSecurityException: Constructor public org.apache.spark.SparkConf(boolean) is not whitelisted
  • At the time of writing, the Databricks Runtime version needs to be 9.1 LTS.
  • The init script should install GDAL:
pip install gdal -f https://girder.github.io/large_image_wheels
  • The RasterFrames JAR should be built from source:
git clone https://github.com/mjohns-databricks/rasterframes.git
cd rasterframes
sbt publishLocal
  • The RasterFrames JAR should be uploaded to Databricks. After building, the file is located at:
/pyrasterframes/target/scala-2.12/pyrasterframes-assembly-0.10.1-SNAPSHOT.jar
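Before uploading, it can be worth confirming that the assembly you built targets the Scala binary version the runtime ships with (DBR 9.1 LTS runs Scala 2.12). sbt encodes that version in the output directory, so a minimal sketch can read it off the path; `jar_scala_version` is a hypothetical helper, not part of any tool:

```python
import re

def jar_scala_version(jar_path):
    # sbt places assemblies under .../target/scala-<binary version>/...,
    # so the Scala version the JAR was built for can be read off the path.
    m = re.search(r"scala-(\d+\.\d+)", jar_path)
    return m.group(1) if m else None

path = "/pyrasterframes/target/scala-2.12/pyrasterframes-assembly-0.10.1-SNAPSHOT.jar"
print(jar_scala_version(path))  # 2.12
```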
