pyspark: How do I convert a JSON string into a DataFrame?

62o28rlo · posted 2024-01-06 in Spark

I am trying to convert a JSON string into a DataFrame, but I can't get it to work.

import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .appName("MyApp") \
    .getOrCreate()
sc = spark.sparkContext
some_json_string = """
[
  {"id":1, "name":"test1"},
  {"id":2, "name":"test2"}
  {"id":3, "name":"test3"}
]
"""
df = spark.read.option("multiLine", "true").json(sc.parallelize(some_json_string))
df.printSchema()
df.show()

The output I get:

root
 |-- _corrupt_record: string (nullable = true)

+---------------+
|_corrupt_record|
+---------------+
|              [|
|              {|
|              "|
|              i|
|              d|
|              "|
|              :|
|              1|
|              ,|
|              "|
|              n|
|              a|
|              m|
|              e|
|              "|
|              :|
|              "|
|              t|
|              e|
|              s|
+---------------+
only showing top 20 rows


How can I convert this into a PySpark DataFrame?
I have also tried:



weylhg0b

Based on this question, you have to pass the JSON string as a list, like this: df = spark.read.json(sc.parallelize([newJson]))
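The reason your original code produces one `_corrupt_record` per character is that `sc.parallelize` treats a bare Python string as a sequence, so every character becomes its own RDD element. A minimal sketch of the difference (assuming an already-created SparkSession, values only illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("MyApp").getOrCreate()
sc = spark.sparkContext

# A bare string is iterated character by character:
print(sc.parallelize('{"id": 1}').take(5))    # ['{', '"', 'i', 'd', '"']

# Wrapping it in a list keeps the whole string as one element:
print(sc.parallelize(['{"id": 1}']).take(5))  # ['{"id": 1}']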
Your new code should be:

import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .appName("MyApp") \
    .getOrCreate()
sc = spark.sparkContext
# Note: a comma is also needed after the second record, otherwise the JSON
# itself is invalid and still ends up in _corrupt_record.
some_json_string = """
[
  {"id":1, "name":"test1"},
  {"id":2, "name":"test2"},
  {"id":3, "name":"test3"}
]
"""
# Wrap the string in a list so it stays a single RDD element.
df = spark.read.option("multiLine", "true").json(sc.parallelize([some_json_string]))
df.printSchema()
df.show()

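As an aside (not part of the accepted fix), if the JSON already lives in a Python string, you could also skip the RDD entirely and parse it with the standard json module before handing the result to spark.createDataFrame. A rough sketch under that assumption:

import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("MyApp").getOrCreate()

some_json_string = '[{"id":1, "name":"test1"}, {"id":2, "name":"test2"}, {"id":3, "name":"test3"}]'

# json.loads turns the string into a list of dicts, which
# spark.createDataFrame can consume directly.
records = json.loads(some_json_string)
df = spark.createDataFrame(records)
df.printSchema()
df.show()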

