Hive用结构数组爆炸

eni9jsuy  于 2021-06-26  发布在  Hive
关注(0)|答案(2)|浏览(394)

我正在研究如何在Hive中爆炸一个复杂的类型。我有下面的avro文件,我想用于我的测试,并在上面构建了一个配置单元外部表。
这是我的测试数据。

  1. {"order_id":123456,"customer_id":987654,"total":305,"order_details":[{"quantity":5,"total":55,"product_detail":{"product_id":1000,"product_name":"Hugo Boss XY","product_description": {"string": "Hugo Xy Men 100 ml"}, "product_status": "AVAILABLE", "product_category":["fragrance","perfume"],"price":10.35,"product_hash":"XY123"}},{"quantity":5,"total":250,"product_detail":{"product_id":2000,"product_name":"Cherokee Polo T Shirt","product_description": {"string": "Cherokee Medium Blue Polo T Shirt"}, "product_status": "AVAILABLE", "product_category":["T-shirts","V-Neck","Cotton", "Medium"],"price":50.00,"product_hash":"XY789"}}]}
  2. {"order_id":789012,"customer_id":4567324,"total":220,"order_details":[{"quantity":10,"total":120,"product_detail":{"product_id":1001,"product_name":"Hugo Men Red","product_description": {"string": "Hugo Men Red 150 ml"}, "product_status": "ONLY_FEW_LEFT", "product_category":["fragrance","perfume"],"price":12.99,"product_hash":"XY456"}},{"quantity":10,"total":100,"product_detail":{"product_id":2001,"product_name":"Ruggers Smart","product_description": {"string": "Ruggers Smart White Small Polo T Shirt"}, "product_status": "ONLY_FEW_LEFT", "product_category":["T-shirts","Round-Neck","Woolen", "Small"],"price":9.99,"product_hash":"XY987"}}]}

avro模式

  1. {
  2. "namespace":"com.treselle.db.model",
  3. "type":"record",
  4. "doc":"This Schema describes about Order",
  5. "name":"Order",
  6. "fields":[
  7. {"name":"order_id","type": "long"},
  8. {"name":"customer_id","type": "long"},
  9. {"name":"total","type": "float"},
  10. {"name":"order_details","type":{
  11. "type":"array",
  12. "items": {
  13. "namespace":"com.treselle.db.model",
  14. "name":"OrderDetail",
  15. "type":"record",
  16. "fields": [
  17. {"name":"quantity","type": "int"},
  18. {"name":"total","type": "float"},
  19. {"name":"product_detail","type":{
  20. "namespace":"com.treselle.db.model",
  21. "type":"record",
  22. "name":"Product",
  23. "fields":[
  24. {"name":"product_id","type": "long"},
  25. {"name":"product_name","type": "string","doc":"This is the name of the product"},
  26. {"name":"product_description","type": ["string", "null"], "default": ""},
  27. {"name":"product_status","type": {"name":"product_status", "type": "enum", "symbols": ["AVAILABLE", "OUT_OF_STOCK", "ONLY_FEW_LEFT"]}, "default":"AVAILABLE"},
  28. {"name":"product_category","type":{"type": "array", "items": "string"}, "doc": "This contains array of categories"},
  29. {"name":"price","type": "float"},
  30. {"name": "product_hash", "type": {"type": "fixed", "name": "product_hash", "size": 5}}
  31. ]
  32. }
  33. }
  34. ]
  35. }
  36. }
  37. }
  38. ]
  39. }

我的Hive

  1. CREATE EXTERNAL TABLE orders (
  2. order_id bigint,
  3. customer_id bigint,
  4. total float,
  5. order_items array<
  6. struct<
  7. quantity:int,
  8. total:float,
  9. product_detail:struct<
  10. product_id:bigint,
  11. product_name:string,
  12. product_description:string,
  13. product_status:string,
  14. product_caretogy:array<string>,
  15. price:float,
  16. product_hash:binary
  17. >
  18. >
  19. >
  20. )
  21. STORED AS AVRO
  22. LOCATION '/user/hive/test/orders';

查询

  1. SELECT order_id, customer_id FROM orders;

这可以正常工作,并按预期返回两行的结果。
但当我尝试使用横向视图爆炸时,我遇到了问题。

  1. SELECT
  2. order_id,
  3. customer_id,
  4. ord_dets.quantity as line_qty,
  5. ord_dets.total as line_total
  6. FROM
  7. orders
  8. LATERAL VIEW explode(order_items) exploded_table as ord_dets;

此查询运行正常,但不会产生任何结果。
这里有什么问题吗?

pod7payv

pod7payv1#

原因是您在模式中定义了 order_items 但是在数据和avro模式中,这个字段被称为 order_details . Hive寻找 order_items 并认为它是一个不存在的字段,默认为null。

bq3bfh9z

bq3bfh9z2#

谢谢你的指点。
当我更正错误时,我在查询时出错了。。。确定失败,出现异常java.io.ioexception:org.apache.avro.avrotypeexception:找到com.treselle.db.model.order\u详细信息,应为union
在进一步分析之后,我发现avro文件中的enum类型和fixed类型都导致了“expecting union”错误。删除这些列之后,我能够成功地查询配置单元表。

相关问题