Can Spark JSON schema metadata be mapped to Hive?

aiazj4mn posted on 2021-05-27 in Spark

When working with Apache Spark, we can easily generate a JSON file that describes the structure of a DataFrame. That structure looks like this:

```json
{
  "type": "struct",
  "fields": [
    {
      "name": "employee_name",
      "type": "string",
      "nullable": true,
      "metadata": {
        "comment": "employee name",
        "system_name": "hr system",
        "business_key": true,
        "private_info": true
      }
    },
    {
      "name": "employee_job",
      "type": "string",
      "nullable": true,
      "metadata": {
        "comment": "employee job description",
        "system_name": "sap",
        "business_key": false,
        "private_info": false
      }
    }
  ]
}
```

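When this information is stored in Hive, or when a DataFrame is loaded from Hive, Spark maps the column comment in the Hive metadata to the "comment" attribute of the field metadata. But how can the DataFrame definition in the JSON be mapped onto a Hive table? Is it possible to store additional tags on a column, such as the business_key or private_info flags?
Thanks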

Answer #1, by q7solyqu

Yes, the extra metadata can be stored. Create a Spark-compatible Hive table and put the required metadata in its `TBLPROPERTIES`, as shown below.
Hive table:

```sql
CREATE TABLE `employee_details`(
  `employee_name` string COMMENT 'employee name',
  `employee_job` string COMMENT 'employee job description')
STORED AS ORC
TBLPROPERTIES (
  'spark.sql.sources.provider'='orc',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[{\"name\":\"employee_name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"comment\":\"employee name\",\"business_key\":true,\"system_name\":\"hr system\",\"private_info\":true}},{\"name\":\"employee_job\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"comment\":\"employee job description\",\"business_key\":false,\"system_name\":\"sap\",\"private_info\":false}}]}'
)
```

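Hand-escaping the schema JSON is error-prone, and when Spark itself writes these properties it splits schemas longer than `spark.sql.sources.schemaStringLengthThreshold` (4000 characters by default) across several `spark.sql.sources.schema.part.N` entries. The following is a minimal pure-Scala sketch, not Spark's own code, that builds such a `TBLPROPERTIES` clause from a schema JSON string. The names `SchemaProps`, `schemaProperties` and `tblPropertiesClause` are hypothetical, and the sketch assumes HiveQL's backslash escaping inside single-quoted literals (so `"` needs no escaping, although the `\"` form above also works).

```scala
object SchemaProps {
  // Escape a value for a single-quoted HiveQL literal (assumption: backslash escaping).
  // Backslashes are doubled first so the quote escapes added next are not doubled.
  def escape(s: String): String =
    s.replace("\\", "\\\\").replace("'", "\\'")

  // Split the schema JSON into chunks of at most `threshold` characters and
  // emit numParts plus one part.N property per chunk.
  def schemaProperties(schemaJson: String, threshold: Int = 4000): List[(String, String)] = {
    require(threshold > 0, "threshold must be positive")
    val parts = schemaJson.grouped(threshold).toList
    ("spark.sql.sources.schema.numParts" -> parts.size.toString) ::
      parts.zipWithIndex.map { case (p, i) => s"spark.sql.sources.schema.part.$i" -> p }
  }

  // Render the provider and schema properties as a TBLPROPERTIES clause.
  def tblPropertiesClause(provider: String, schemaJson: String, threshold: Int = 4000): String = {
    val props = ("spark.sql.sources.provider" -> provider) :: schemaProperties(schemaJson, threshold)
    props
      .map { case (k, v) => s"  '$k'='${escape(v)}'" }
      .mkString("TBLPROPERTIES (\n", ",\n", "\n)")
  }
}
```

In a Spark shell, the input would typically be `df.schema.json` of a DataFrame whose fields already carry the desired metadata; the resulting clause can then be appended to a `CREATE TABLE` statement like the one above.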
Accessing the table from Spark:

```scala
scala> val df = spark.table("hivedb.employee_details")
df: org.apache.spark.sql.DataFrame = [employee_name: string, employee_job: string]

scala> df.schema.prettyJson
res12: String =
{
  "type" : "struct",
  "fields" : [ {
    "name" : "employee_name",
    "type" : "string",
    "nullable" : true,
    "metadata" : {
      "comment" : "employee name",
      "business_key" : true,
      "system_name" : "hr system",
      "private_info" : true
    }
  }, {
    "name" : "employee_job",
    "type" : "string",
    "nullable" : true,
    "metadata" : {
      "comment" : "employee job description",
      "business_key" : false,
      "system_name" : "sap",
      "private_info" : false
    }
  } ]
}
```
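The extra keys come back because Spark rebuilds the table schema from the `spark.sql.sources.schema.*` properties when it loads the table. A rough pure-Scala sketch of the reassembly step follows; `SchemaFromProps` and its members are hypothetical names, and this is not Spark's actual implementation (which additionally parses the result into a `StructType`).

```scala
object SchemaFromProps {
  val NumPartsKey = "spark.sql.sources.schema.numParts"
  def partKey(i: Int): String = s"spark.sql.sources.schema.part.$i"

  // Concatenate part.0 .. part.(numParts - 1) in order.
  // Returns None when the table has no stored schema; fails if a part is missing.
  def schemaJson(props: Map[String, String]): Option[String] =
    props.get(NumPartsKey).map { n =>
      (0 until n.toInt).map { i =>
        props.getOrElse(partKey(i), throw new IllegalStateException(s"missing ${partKey(i)}"))
      }.mkString
    }
}
```

In a Spark shell, the reassembled string could be turned into a `StructType` with `DataType.fromJson`, after which each field's `metadata` exposes the extra keys, for example `field.metadata.getBoolean("private_info")`.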