How do I get values from a nested JSON array using Spark?

Posted by sauutmhj on 2021-07-13 in Spark

I have this JSON:

    val myJson = """{
      "record": {
        "recordId": 100,
        "name": "xyz",
        "version": "1.1",
        "input": [
          {
            "format": "Database",
            "type": "Oracle",
            "connectionStringId": "212",
            "connectionString": "ksfksfklsdflk",
            "schemaName": "schema1",
            "databaseName": "db1",
            "tables": [
              { "table_name": "one" },
              { "table_name": "two" }
            ]
          }
        ]
      }
    }"""

I load this JSON into a DataFrame with the following code:

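    import sparkSession.implicits._  // needed for .toDS
    val df = sparkSession.read.json(Seq(myJson).toDS)  // read.json(String) would treat the value as a file path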

I want the values of schemaName and databaseName. How can I get them?

    val schemaName = df.select("record.input.schemaName") // not working

Can someone please help me?

ryoqjall #1

record.input is an array column, so you need to explode it first and then select the fields you want:

    import org.apache.spark.sql.functions.{col, explode}

    // explode produces one row per element of the record.input array
    df.select(explode(col("record.input")).as("inputs"))
      .select("inputs.schemaName", "inputs.databaseName")
      .show
    //+----------+------------+
    //|schemaName|databaseName|
    //+----------+------------+
    //|   schema1|         db1|
    //+----------+------------+
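
If you also need the values from the inner tables array, the same idea can be applied a second time. The following is only a sketch under a few assumptions: the JSON is held in an in-memory string (written compactly here), Spark is 2.2 or later so that read.json accepts a Dataset[String], and the names NestedJsonExample, flattenTables and tableName are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object NestedJsonExample {
  // Compact copy of the JSON from the question (assumption: available as a string)
  val myJson: String =
    """{"record":{"recordId":100,"name":"xyz","version":"1.1","input":[{"format":"Database","type":"Oracle","connectionStringId":"212","connectionString":"ksfksfklsdflk","schemaName":"schema1","databaseName":"db1","tables":[{"table_name":"one"},{"table_name":"two"}]}]}}"""

  // Hypothetical helper: one output row per (input entry, table) pair
  def flattenTables(df: DataFrame): DataFrame =
    df.select(explode(col("record.input")).as("inputs")) // one row per input entry
      .select(
        col("inputs.schemaName"),
        col("inputs.databaseName"),
        explode(col("inputs.tables")).as("t")             // one row per table
      )
      .select(col("schemaName"), col("databaseName"), col("t.table_name").as("tableName"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nested-json-example")
      .getOrCreate()
    import spark.implicits._

    // Parse the string as a one-element Dataset[String] rather than as a file path
    val df = spark.read.json(Seq(myJson).toDS())

    // Without explode, this path resolves to an array column, not a plain string
    df.select("record.input.schemaName").printSchema()

    flattenTables(df).show()
    spark.stop()
  }
}
```

With the sample data this should give two rows, one for table one and one for table two, each carrying schema1 and db1. Only one explode is allowed per select, which is why the two arrays are exploded in separate steps.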
