How to create an external Hive table from Parquet files that contain complex struct data types

bcs8qyzn  posted on 2021-06-26  in Hive

I have a set of Parquet files containing the data for a table named people. The data includes complex types such as structs and arrays. The schema of the Parquet data is shown below:

 |-- distinct_id: string (nullable = true)
 |-- android_app_version: string (nullable = true)
 |-- android_app_version_code: string (nullable = true)
 |-- android_brand: string (nullable = true)
 |-- android_devices: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- android_lib_version: string (nullable = true)
 |-- android_manufacturer: string (nullable = true)
 |-- android_os: string (nullable = true)
 |-- android_os_version: string (nullable = true)
 |-- android_push_error: string (nullable = true)
 |-- browser: string (nullable = true)
 |-- browser_version: double (nullable = true)
 |-- campaigns: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- country_code: string (nullable = true)
 |-- deliveries: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- initial_referrer: string (nullable = true)
 |-- initial_referring_domain: string (nullable = true)
 |-- ios_app_release: string (nullable = true)
 |-- ios_app_version: string (nullable = true)
 |-- ios_device_model: string (nullable = true)
 |-- ios_devices: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- ios_lib_version: string (nullable = true)
 |-- ios_version: string (nullable = true)
 |-- last_seen: string (nullable = true)
 |-- notifications: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- $time: string (nullable = true)
 |    |    |-- campaign_id: long (nullable = true)
 |    |    |-- message_id: long (nullable = true)
 |    |    |-- message_subtype: string (nullable = true)
 |    |    |-- message_type: string (nullable = true)
 |    |    |-- time: string (nullable = true)
 |    |    |-- type: string (nullable = true)
 |-- os: string (nullable = true)
 |-- predict_grade: string (nullable = true)
 |-- region: string (nullable = true)
 |-- swift_lib_version: string (nullable = true)
 |-- timezone: string (nullable = true)
 |-- area: string (nullable = true)
 |-- country: string (nullable = true)
 |-- dob: string (nullable = true)
 |-- date: string (nullable = true)
 |-- default_languages: string (nullable = true)
 |-- email: string (nullable = true)
 |-- first_app_launch: string (nullable = true)
 |-- first_app_launch_date: string (nullable = true)
 |-- first_login: boolean (nullable = true)
 |-- gaid: string (nullable = true)
 |-- lr_age: string (nullable = true)
 |-- lr_birthdate: string (nullable = true)
 |-- lr_country: string (nullable = true)
 |-- lr_gender: string (nullable = true)
 |-- language: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- languages: string (nullable = true)
 |-- languages_disabled: string (nullable = true)
 |-- languages_selected: string (nullable = true)
 |-- launched: string (nullable = true)
 |-- location: string (nullable = true)
 |-- media_id: string (nullable = true)
 |-- no_of_logins: long (nullable = true)
 |-- pop-strata: string (nullable = true)
 |-- price: string (nullable = true)
 |-- random_number: long (nullable = true)
 |-- second_name: string (nullable = true)
 |-- state: string (nullable = true)
 |-- state_as_per_barc: string (nullable = true)
 |-- total_app_opens: long (nullable = true)
 |-- total_app_sessions: string (nullable = true)
 |-- total_sessions: string (nullable = true)
 |-- town: string (nullable = true)
 |-- user_type: string (nullable = true)
 |-- userid: string (nullable = true)
 |-- appversion: string (nullable = true)
 |-- birthdate: string (nullable = true)
 |-- campaign: string (nullable = true)
 |-- city: string (nullable = true)
 |-- media_source: string (nullable = true)
 |-- last_name: string (nullable = true)
 |-- first_name: string (nullable = true)
 |-- ios_ifa: string (nullable = true)
 |-- android_model: string (nullable = true)
 |-- age: string (nullable = true)
 |-- uid: string (nullable = true)
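
For the external-table goal in the title, one possible starting point is to translate a printed schema like the one above into Hive column types mechanically rather than by hand. The sketch below uses only the Python standard library. The function names, the table name argument, and the `wasbs://` location placeholder are all hypothetical and not from the original post, and the type map covers only the types that appear in this schema. It also assumes the Hive version in use accepts backtick-quoted names such as `$time` and `pop-strata`, which is worth checking before relying on the output.

```python
import re

# Spark type names as printed in the schema -> Hive type names.
# Only the types that occur in this schema (plus integer) are covered.
PRIMITIVES = {
    "string": "STRING",
    "long": "BIGINT",
    "integer": "INT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
}

# Matches lines such as " |    |-- element: long (containsNull = true)".
LINE = re.compile(r"^\s*((?:\|\s*)+)-- (.+?): (\w+)")


def parse_lines(text):
    """Turn schema text into a list of (depth, name, type) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            rows.append((m.group(1).count("|"), m.group(2), m.group(3)))
    return rows


def quote(name):
    """Backtick-quote an identifier (assumed to be accepted by Hive)."""
    return "`" + name + "`"


def hive_type(rows, i):
    """Return (hive_type, next_index) for the field at rows[i]."""
    depth, _, kind = rows[i]
    i += 1
    if kind == "array":
        # The next row is the synthetic 'element' entry describing items.
        elem, i = hive_type(rows, i)
        return f"ARRAY<{elem}>", i
    if kind == "struct":
        fields = []
        while i < len(rows) and rows[i][0] > depth:
            name = rows[i][1]
            ftype, i = hive_type(rows, i)
            fields.append(f"{quote(name)}:{ftype}")
        return "STRUCT<" + ",".join(fields) + ">", i
    return PRIMITIVES[kind], i


def external_table_ddl(schema_text, table, location):
    """Build a CREATE EXTERNAL TABLE statement for Parquet data."""
    rows = parse_lines(schema_text)
    cols, i = [], 0
    while i < len(rows):
        name = rows[i][1]
        ctype, i = hive_type(rows, i)
        cols.append(f"  {quote(name)} {ctype}")
    return (
        f"CREATE EXTERNAL TABLE {table} (\n"
        + ",\n".join(cols)
        + f"\n)\nSTORED AS PARQUET\nLOCATION '{location}'"
    )


# A shortened excerpt of the schema, for illustration only.
SAMPLE = """
 |-- distinct_id: string (nullable = true)
 |-- campaigns: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- notifications: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- $time: string (nullable = true)
 |    |    |-- campaign_id: long (nullable = true)
 |-- pop-strata: string (nullable = true)
 |-- first_login: boolean (nullable = true)
"""

# Placeholder location; substitute the real container and account.
LOCATION = "wasbs://<container>@<account>.blob.core.windows.net/people/"

print(external_table_ddl(SAMPLE, "people", LOCATION))
```

Generating the DDL from the schema text keeps the declared names and nesting aligned with what is in the files, which is easy to get wrong by hand in a table with this many columns.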

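What I ultimately want is a Hive external table that points at the data in the Parquet files. One option is to flatten the data, or use SQL explode to break the structs out into individual columns, but when I tried this, every column that was originally a struct type came back null. The Parquet files are stored in Azure Blob Storage.
I also tried loading the Parquet files into a Spark SQL DataFrame, but the columns with complex data types come back as null there too.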
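
The flattening approach mentioned above can be sketched as a query that explodes the `notifications` array and selects each struct field as its own column. The helper below only builds the query text with the Python standard library; it does not run it. The function name, the `exploded` and `item` aliases, and the output column naming scheme are assumptions made for illustration. It uses `LATERAL VIEW OUTER explode(...)`, because a plain `explode` drops rows whose array is null or empty. It does not by itself explain why the struct columns were read as null.

```python
def quote(name):
    """Backtick-quote an identifier such as `$time`."""
    return "`" + name + "`"


def flatten_query(table, key_cols, array_col, struct_fields, outer=True):
    """Build a query that turns one array<struct> column into flat columns."""
    select = [quote(c) for c in key_cols]
    for field in struct_fields:
        # '$time' and 'time' both exist, so '$' is mapped to 'raw_'
        # to keep the output column names distinct.
        out_name = array_col + "_" + field.replace("$", "raw_")
        select.append(f"item.{quote(field)} AS {quote(out_name)}")
    view = "LATERAL VIEW OUTER" if outer else "LATERAL VIEW"
    return (
        "SELECT\n  "
        + ",\n  ".join(select)
        + f"\nFROM {table}\n"
        + f"{view} explode({quote(array_col)}) exploded AS item"
    )


# Field names taken from the notifications struct in the schema.
QUERY = flatten_query(
    "people",
    ["distinct_id"],
    "notifications",
    ["$time", "campaign_id", "message_id", "message_subtype",
     "message_type", "time", "type"],
)
print(QUERY)
```

The resulting text could be passed to Spark SQL or Hive once the table is readable. If the struct fields are still null before flattening, the flattened columns will be null as well, so it is worth checking the unflattened column first.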
