How to parse strings (from different tables) in Hive (Hadoop) and load them into a different table

f87krz0w asked on 2021-06-04 in Hadoop

I have this table as input:

    Table Name: Deals
    Columns: Doc_id (BIGINT), Nv_Pairs_Feed (STRING), Nv_Pairs_Category (STRING)
    For example:
    Doc_id: 4997143658422483637
    Nv_Pairs_Feed: "TYPE:Wiper Blade;CONDITION:New;CATEGORY:Auto Parts and Accessories;STOCK_AVAILABILITY:Y;ORIGINAL_PRICE:0.00"
    Nv_Pairs_Category: "Condition:New;Store:PartsGeek.com;"

I am trying to parse the fields `Nv_Pairs_Feed` and `Nv_Pairs_Category` and extract their name:value pairs (pairs are separated by ';', and each name and value are separated by ':'). My goal is to insert each name:value pair as a row into this table:

    Doc_id | Name | Value | Source_Field

Example of the desired result:

    4997143658422483637 | Condition | New | Nv_Pairs_Category
    4997143658422483637 | Store | PartsGeek.com | Nv_Pairs_Category
    4997143658422483637 | TYPE | Wiper Blade | Nv_Pairs_Feed
    4997143658422483637 | CONDITION | New | Nv_Pairs_Feed
    4997143658422483637 | CATEGORY | Auto Parts and Accessories | Nv_Pairs_Feed
    4997143658422483637 | STOCK_AVAILABILITY | Y | Nv_Pairs_Feed
    4997143658422483637 | ORIGINAL_PRICE | 0.00 | Nv_Pairs_Feed
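Outside of Hive, the intended transformation is straightforward to sketch. The following Python snippet (a hypothetical `parse_pairs` helper, shown only to pin down the expected semantics) splits each field on ';' and each surviving pair on the first ':':

```python
def parse_pairs(doc_id, text, source_field):
    """Split a 'name:value;name:value;...' string into output rows."""
    rows = []
    for pair in text.split(";"):
        if not pair:  # skip the empty piece left by a trailing ';'
            continue
        name, _, value = pair.partition(":")
        rows.append((doc_id, name, value, source_field))
    return rows

rows = parse_pairs(4997143658422483637,
                   "Condition:New;Store:PartsGeek.com;",
                   "Nv_Pairs_Category")
# rows == [(4997143658422483637, 'Condition', 'New', 'Nv_Pairs_Category'),
#          (4997143658422483637, 'Store', 'PartsGeek.com', 'Nv_Pairs_Category')]
```

Note the trailing ';' in `Nv_Pairs_Category`: whatever Hive-side solution is used has to tolerate the resulting empty pair.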

zsbz8rwp1#

You can use the standard Hive UDF `str_to_map` to convert the strings into maps, and then the Brickhouse UDFs (http://github.com/klout/brickhouse) `map_key_values`, `combine`, and `numeric_range` to explode those maps. I.e., something like this:

    create view deals_map_view as
    select doc_id,
           map_key_values(
             combine( str_to_map( nv_pairs_feed, ';', ':'),
                      str_to_map( nv_pairs_category, ';', ':'))) as deals_map_key_values
    from deals;

    select
      doc_id,
      array_index( deals_map_key_values, i ).key as name,
      array_index( deals_map_key_values, i ).value as value
    from deals_map_view
    lateral view numeric_range( size( deals_map_key_values) ) i1 as i;

You could simplify this with an `explode_map` UDF.
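To make the semantics of that last suggestion concrete: an `explode_map`-style UDTF emits one output row per map entry. A minimal Python analogue (illustrative only, not Hive code):

```python
def explode_map(doc_id, m):
    """Yield one (doc_id, key, value) row per entry of map m."""
    for key, value in m.items():
        yield (doc_id, key, value)

m = {"Condition": "New", "Store": "PartsGeek.com"}
rows = list(explode_map(4997143658422483637, m))
# rows == [(4997143658422483637, 'Condition', 'New'),
#          (4997143658422483637, 'Store', 'PartsGeek.com')]
```

This collapses the `map_key_values` / `numeric_range` / `array_index` combination above into a single exploding step.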
