hive:分区表中的Map联接

u91tlkcl  于 2021-06-26  发布在  Hive
关注(0)|答案(1)|浏览(580)

考虑到hive中具有事实表和维度表的典型数据仓库场景,假设事实表通过分区拆分到多个数据节点上。在将事实表(已分区)与维度(未分区)连接时,使用Map连接似乎是合乎逻辑的,因为维度表的大小很小&它们要存储在内存中,以便在所有节点上高效地与事实数据连接。
但是,很少有联机资源建议对分区表执行Map联接,两个表上的分区键应该与联接键相同。
所以,这就是我要寻找答案的问题:
分区表(事实)能否与非分区表(维度)Map联接?

i7uq4tfw

i7uq4tfw1#

答案是-是的
Map联接运算符
演示

  1. create table fact (rec_id int,dim_id int) partitioned by (dt date);
  2. create table dim (dim_id int,descr string);
  1. explain
  2. select *
  3. from fact f join dim d
  4. on d.dim_id = f.dim_id
  1. STAGE DEPENDENCIES:
  2. Stage-4 is a root stage
  3. Stage-3 depends on stages: Stage-4
  4. Stage-0 depends on stages: Stage-3
  5. STAGE PLANS:
  6. Stage: Stage-4
  7. Map Reduce Local Work
  8. Alias -> Map Local Tables:
  9. d
  10. Fetch Operator
  11. limit: -1
  12. Alias -> Map Local Operator Tree:
  13. d
  14. TableScan
  15. alias: d
  16. Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
  17. Filter Operator
  18. predicate: dim_id is not null (type: boolean)
  19. Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
  20. HashTable Sink Operator
  21. keys:
  22. 0 dim_id (type: int)
  23. 1 dim_id (type: int)
  24. Stage: Stage-3
  25. Map Reduce
  26. Map Operator Tree:
  27. TableScan
  28. alias: f
  29. Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
  30. Filter Operator
  31. predicate: dim_id is not null (type: boolean)
  32. Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
  33. Map Join Operator
  34. condition map:
  35. Inner Join 0 to 1
  36. keys:
  37. 0 dim_id (type: int)
  38. 1 dim_id (type: int)
  39. outputColumnNames: _col0, _col1, _col2, _col6, _col7
  40. Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
  41. Select Operator
  42. expressions: _col0 (type: int), _col1 (type: int), _col2 (type: date), _col6 (type: int), _col7 (type: string)
  43. outputColumnNames: _col0, _col1, _col2, _col3, _col4
  44. Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
  45. File Output Operator
  46. compressed: false
  47. Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
  48. table:
  49. input format: org.apache.hadoop.mapred.SequenceFileInputFormat
  50. output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
  51. serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  52. Local Work:
  53. Map Reduce Local Work
  54. Stage: Stage-0
  55. Fetch Operator
  56. limit: -1
  57. Processor Tree:
  58. ListSink
展开查看全部

相关问题