mongodb-spark - Cannot cast STRING into a NullType (value: BsonString{value='4492148'})

fdbelqdn · asked 2021-05-27 · in Spark

I am trying to pull data from MongoDB into a DataFrame using Spark with Java.
Here is the sample code:

SparkConf conf = new SparkConf()
        .setAppName("MongoSparkConnectorTour")
        .setMaster("local")
        .set("spark.app.id", "MongoSparkConnectorTour")
        .set("spark.mongodb.input.uri", uri)
        .set("sampleSize", args[2])
        .set("spark.mongodb.output.uri", uri)
        .set("spark.mongodb.input.partitioner", "MongoPaginateByCountPartitioner")
        .set("spark.mongodb.input.partitionerOptions.numberOfPartitions", "64");

JavaSparkContext jsc = new JavaSparkContext(conf);

// Infer the schema by sampling the collection and load it into a DataFrame.
DataFrame df = MongoSpark.load(jsc).toDF();

System.out.println("DF Count - " + df.count());
df.printSchema();

There are two collections. From one of them I can fetch data without any problem, but for the other I get the following error:

20/07/15 14:17:31 ERROR Executor: Exception in task 1.0 in stage 3.0 (TID 4)
com.mongodb.spark.exceptions.MongoTypeConversionException: Cannot cast STRING into a NullType (value: BsonString{value='4492148'})
        at com.mongodb.spark.sql.MapFunctions$.com$mongodb$spark$sql$MapFunctions$$convertToDataType(MapFunctions.scala:80)
        at com.mongodb.spark.sql.MapFunctions$$anonfun$3.apply(MapFunctions.scala:38)
        at com.mongodb.spark.sql.MapFunctions$$anonfun$3.apply(MapFunctions.scala:36)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
        at com.mongodb.spark.sql.MapFunctions$.documentToRow(MapFunctions.scala:36)
        at com.mongodb.spark.sql.MapFunctions$.castToStructType(MapFunctions.scala:109)
        at com.mongodb.spark.sql.MapFunctions$.com$mongodb$spark$sql$MapFunctions$$convertToDataType(MapFunctions.scala:75)
        at com.mongodb.spark.sql.MapFunctions$$anonfun$3.apply(MapFunctions.scala:38)
        at com.mongodb.spark.sql.MapFunctions$$anonfun$3.apply(MapFunctions.scala:36)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
        at com.mongodb.spark.sql.MapFunctions$.documentToRow(MapFunctions.scala:36)
        at com.mongodb.spark.sql.MapFunctions$.castToStructType(MapFunctions.scala:109)
        at com.mongodb.spark.sql.MapFunctions$.com$mongodb$spark$sql$MapFunctions$$convertToDataType(MapFunctions.scala:75)
        at com.mongodb.spark.sql.MapFunctions$$anonfun$3.apply(MapFunctions.scala:38)

From searching Google, the only solution I found is to increase the sample size, but that still does not work.
The Stack Overflow posts about this cast failure also suggest increasing the sample size.
The second collection is somewhat larger, and I tried a larger sample size, but it still fails.
Any other suggestions or ideas for solving this would be helpful.
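
For reference, a minimal sketch of how the sample size is usually passed to the connector, assuming the fully-qualified spark.mongodb.input.sampleSize configuration key (the snippet above sets a bare "sampleSize" key, which the connector may not pick up) and a placeholder value of 100000; the ReadConfig variant overrides it for a single read:

import java.util.HashMap;
import java.util.Map;

import com.mongodb.spark.MongoSpark;
import com.mongodb.spark.config.ReadConfig;

// Option 1: fully-qualified key on the SparkConf (instead of the bare "sampleSize" above).
conf.set("spark.mongodb.input.sampleSize", "100000");

// Option 2: override the sample size for this read only. Inside ReadConfig options the
// key is plain "sampleSize", without the spark.mongodb.input. prefix.
Map<String, String> overrides = new HashMap<>();
overrides.put("sampleSize", "100000");
ReadConfig readConfig = ReadConfig.create(jsc).withOptions(overrides);

DataFrame df = MongoSpark.load(jsc, readConfig).toDF();

This only changes how many documents are sampled when the connector infers the schema; if the problematic field is null in every sampled document but holds a string elsewhere in the collection, a larger sample may still miss it.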
