I'm trying to connect to MongoDB from PySpark and read some data:
from pyspark.sql import SparkSession
my_spark = SparkSession \
    .builder \
    .appName("MyApp") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/my-db.my-coll") \
    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/my-db.my-coll") \
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.12:2.4.0") \
    .getOrCreate()
df = my_spark.read.format("mongo").load()
df.printSchema()
But I get the following error:
An error occurred while calling o42.load.
: java.lang.NoClassDefFoundError: org/bson/codecs/JsonObjectCodecProvider
My Spark version is 3.1.1.
I have been following these resources:
https://docs.mongodb.com/spark-connector/current/python-api/
https://www.mongodb.com/blog/post/getting-started-with-mongodb-pyspark-and-jupyter-notebook