How to import complex JSON data into Hive

x3naxklr  posted on 2021-06-24 in Hive

As input, I have a JSON file that I want to import into Hive:

    [
      {
        "code": "ACPBC3P",
        "libelle": "Bon de commande Prime de satisfaction ACP",
        "libelleCourt": "Bon de commande Prime de satisfaction ACP",
        "libelleLong": "Bon de commande Prime de satisfaction ACP",
        "dureeStockage": 24,
        "dureeArchivage": 96,
        "dureeEpuration": 120,
        "dureeStockageReelle": 24,
        "dureeArchivageReelle": 96,
        "dureeEpurationReelle": 120,
        "typologie": {
          "code": "ACP",
          "libelle": "ACP - Activ'projet"
        },
        "sousTypologie": {
          "code": "ACPBC3P",
          "libelle": "BC3P - Bon de commande Prime de satisfaction"
        }
      },
      {
        "code": "ACPC1",
        "libelle": "C1 - Demande d'avoir",
        "libelleCourt": "C1 - Demande d'avoir",
        "libelleLong": "C1 - Demande d'avoir",
        "dureeStockage": 36,
        "dureeArchivage": 84,
        "dureeEpuration": 120,
        "dureeStockageReelle": 36,
        "dureeArchivageReelle": 84,
        "dureeEpurationReelle": 120,
        "typologie": {
          "code": "ACP",
          "libelle": "ACP - Activ'projet"
        },
        "sousTypologie": {
          "code": "ACPC1",
          "libelle": "C1 - Demande d'avoir"
        }
      },
      {
        "code": "ACPC2",
        "libelle": "C2 - Relance fournisseur",
        "libelleCourt": "C2 - Relance fournisseur",
        "libelleLong": "C2 - Relance fournisseur",
        "dureeStockage": 36,
        "dureeArchivage": 84,
        "dureeEpuration": 120,
        "dureeStockageReelle": 36,
        "dureeArchivageReelle": 84,
        "dureeEpurationReelle": 120,
        "typologie": {
          "code": "ACP",
          "libelle": "ACP - Activ'projet"
        },

I tried to capture this information with the following complex type:

    ARRAY<STRUCT<`code`: STRING, `libelle`: STRING, `libelleCourt`: STRING, `libelleLong`: STRING, `dureeStockage`: INT, `dureeArchivage` INT, `dureeEpuration`: INT, `dureeStockageReelle`: INT, `dureeArchivageReelle`: INT, `dureeEpurationReelle`: INT, `typologie`: STRUCT<`code` STRING, `libelle` STRING>, `sousTypologie`: STRUCT<`code`: STRING, `libelle`: STRING>, `modeCapture`: STRUCT<`code`: STRING, `libelle`: STRING>, `master`: STRING, `codeActivite`: STRING>>

Unfortunately, it does not work.
8cdiaqws  #1

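You did not say what error you are getting. In general, there are two things to keep in mind when using the JSON SerDe:
org.apache.hadoop.hive.serde2.JsonSerDe does not support JSON data that starts with a square bracket "[".
JsonSerDe is based on the text SerDe, so every newline is treated as the start of a new record.
Valid format: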

    {"world_rank": "1","country": "China","population": "1388232694","World": "0.185"},
    {"world_rank": "2","country": "India","population": "1342512706","World": "0.179"},
    {"world_rank": "3","country": "U.S.","population": "326474013","World": "0.043"},
    {"world_rank": "4","country": "Indonesia","population": "263510146","World": "0.035"}

Invalid format 1 (the records are wrapped in a top-level array):

    [
    {"world_rank": "1","country": "China","population": "1388232694","World": "0.185"},
    {"world_rank": "2","country": "India","population": "1342512706","World": "0.179"},
    {"world_rank": "3","country": "U.S.","population": "326474013","World": "0.043"},
    {"world_rank": "4","country": "Indonesia","population": "263510146","World": "0.035"}
    ]

Invalid format 2 (each record is spread over several lines):

    {
      "world_rank": "1",
      "country": "China",
      "population": "1388232694",
      "World": "0.185"
    },
    {
      "world_rank": "2",
      "country": "India",
      "population": "1342512706",
      "World": "0.179"
    },
    {
      "world_rank": "3",
      "country": "U.S.",
      "population": "326474013",
      "World": "0.043"
    },
    {
      "world_rank": "4",
      "country": "Indonesia",
      "population": "263510146",
      "World": "0.035"
    }
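The two rules above can be checked before a file is loaded. The following is a minimal sketch, not part of the original answer: `check_json_lines` is a hypothetical helper that reports every line which is not a complete JSON object on its own. It tolerates a trailing comma after each object only because the valid example above shows one; whether a given Hive version accepts that comma is an assumption worth verifying.

```python
import json


def check_json_lines(text):
    """Return (line_number, problem) pairs for lines that are not one JSON object."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Assumption: a trailing comma after the object is tolerated,
        # mirroring the "valid format" example in the answer.
        candidate = line.strip().rstrip(",")
        if not candidate:
            continue  # ignore blank lines
        try:
            value = json.loads(candidate)
        except json.JSONDecodeError as exc:
            problems.append((number, f"not valid JSON on its own: {exc.msg}"))
            continue
        if not isinstance(value, dict):
            problems.append((number, "valid JSON, but not an object"))
    return problems
```

Run against the two invalid formats, this flags the lone `[` and `]` lines in the first case and every line in the second, since no single line there is a complete object.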

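Before loading the input data into the Hive table, preprocess it into the following format, with one complete JSON object per line: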

    {"code":"ACPBC3P","libelle":"Bon de commande Prime de satisfaction ACP","libelleCourt":"Bon de commande Prime de satisfaction ACP","libelleLong":"Bon de commande Prime de satisfaction ACP","dureeStockage":24,"dureeArchivage":96,"dureeEpuration":120,"dureeStockageReelle":24,"dureeArchivageReelle":96,"dureeEpurationReelle":120,"typologie":{"code":"ACP","libelle":"ACP - Activ'projet"},"sousTypologie":{"code":"ACPBC3P","libelle":"BC3P - Bon de commande Prime de satisfaction"}}
    {"code":"ACPC1","libelle":"C1 - Demande d'avoir","libelleCourt":"C1 - Demande d'avoir","libelleLong":"C1 - Demande d'avoir","dureeStockage":36,"dureeArchivage":84,"dureeEpuration":120,"dureeStockageReelle":36,"dureeArchivageReelle":84,"dureeEpurationReelle":120,"typologie":{"code":"ACP","libelle":"ACP - Activ'projet"},"sousTypologie":{"code":"ACPC1","libelle":"C1 - Demande d'avoir"}}
    {"code":"ACPC2","libelle":"C2 - Relance fournisseur","libelleCourt":"C2 - Relance fournisseur","libelleLong":"C2 - Relance fournisseur","dureeStockage":36,"dureeArchivage":84,"dureeEpuration":120,"dureeStockageReelle":36,"dureeArchivageReelle":84,"dureeEpurationReelle":120,"typologie":{"code":"ACP","libelle":"ACP - Activ'projet"}}
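One possible way to do this preprocessing is a short Python script using only the standard `json` module. This is a sketch under assumptions rather than the answerer's method: it assumes the real input file is a complete JSON array small enough to fit in memory, and the names `array_to_json_lines` and `convert_file` are made up for illustration.

```python
import json


def array_to_json_lines(array_text):
    """Convert a JSON array of objects into one compact JSON object per line."""
    records = json.loads(array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    lines = []
    for record in records:
        # separators=(",", ":") drops the whitespace between tokens;
        # ensure_ascii=False keeps accented characters readable.
        # json.dumps escapes newlines inside strings, so each record stays on one line.
        lines.append(json.dumps(record, separators=(",", ":"), ensure_ascii=False))
    return "\n".join(lines) + "\n"


def convert_file(source_path, target_path):
    """Read a JSON array file and write the line-delimited version (paths are hypothetical)."""
    with open(source_path, encoding="utf-8") as source:
        converted = array_to_json_lines(source.read())
    with open(target_path, "w", encoding="utf-8") as target:
        target.write(converted)
```

The output has no surrounding brackets and no commas between records, so each line can be read as an independent row.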

DDL:

    CREATE TABLE data (
      code STRING,
      libelle STRING,
      libelleCourt STRING,
      libelleLong STRING,
      dureeStockage INT,
      dureeArchivage INT,
      dureeEpuration INT,
      dureeStockageReelle INT,
      dureeArchivageReelle INT,
      dureeEpurationReelle INT,
      typologie struct<code: STRING, libelle: STRING>,
      sousTypologie struct<code: STRING, libelle: STRING>
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
    STORED AS TEXTFILE;
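For wider records, writing the column list by hand gets tedious. As a rough illustration only, the column definitions can be derived from one sample line. The helpers `hive_type` and `columns_from_record` are hypothetical, and the type mapping is a simplifying assumption: every integer becomes INT (large values would need BIGINT), and null values fall back to STRING.

```python
import json


def hive_type(value):
    """Map a decoded JSON value to a Hive type string (simplified, assumed mapping)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, dict):
        fields = ", ".join(f"{name}: {hive_type(item)}" for name, item in value.items())
        return f"struct<{fields}>"
    if isinstance(value, list):
        # Assumes a homogeneous list; an empty list falls back to array<STRING>
        return f"array<{hive_type(value[0])}>" if value else "array<STRING>"
    return "STRING"  # strings, and null as a fallback


def columns_from_record(line):
    """Return 'name TYPE' column definitions for one line-delimited JSON record."""
    record = json.loads(line)
    return [f"{name} {hive_type(value)}" for name, value in record.items()]
```

Joining the returned list with commas gives the body of a `CREATE TABLE` statement in the same style as the DDL above. The result should still be reviewed by hand, since one sample record may not contain every field.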

Queries to select data:

    select soustypologie.code from data;
    select typologie.libelle from data;