reduce步骤似乎停留在99%

mfuanj7w  于 2021-06-03  发布在  Hadoop
关注(0)|答案(1)|浏览(608)

我正在尝试运行一个连接到其他5个表的配置单元查询。其中一个表非常大(有150亿条记录),但由于join子句的原因,我实际上只从该表中寻找800万条记录。
我一直在cloudera的日志控制台中看到这一点。。。

  1. INFO : 2015-09-10 09:51:43,209 Stage-1 map = 100%, reduce = 99%, Cumulative CPU 437512.26 sec

我读过关于table歪斜,然后一个减速机是一个瓶颈,但我不知道如何检查一个表歪斜,如果是这样的情况。这就是问题所在吗?
编辑:
这是查询中的解释计划,表c是大表。。。。

  1. STAGE DEPENDENCIES:
  2. Stage-8 is a root stage , consists of Stage-1
  3. Stage-1
  4. Stage-9 depends on stages: Stage-1
  5. Stage-6 depends on stages: Stage-9
  6. Stage-0 depends on stages: Stage-6
  7. STAGE PLANS:
  8. Stage: Stage-8
  9. Conditional Operator
  10. Stage: Stage-1
  11. Map Reduce
  12. Map Operator Tree:
  13. TableScan
  14. alias: a
  15. Statistics: Num rows: 8939332 Data size: 53635992 Basic stats: COMPLETE Column stats: NONE
  16. Reduce Output Operator
  17. key expressions: event_type_id (type: int)
  18. sort order: +
  19. Map-reduce partition columns: event_type_id (type: int)
  20. Statistics: Num rows: 8939332 Data size: 53635992 Basic stats: COMPLETE Column stats: NONE
  21. value expressions: marketing_contact_id (type: int), ent_cust_id (type: string), campaign_master_id (type: string), event_id (type: string), event_timestamp (type: string)
  22. TableScan
  23. alias: b
  24. Statistics: Num rows: 6 Data size: 2750 Basic stats: COMPLETE Column stats: NONE
  25. Reduce Output Operator
  26. key expressions: event_type_id (type: int)
  27. sort order: +
  28. Map-reduce partition columns: event_type_id (type: int)
  29. Statistics: Num rows: 6 Data size: 2750 Basic stats: COMPLETE Column stats: NONE
  30. value expressions: event_type (type: string), event_type_category (type: string), event_type_subcategory (type: string), description (type: string)
  31. TableScan
  32. alias: c
  33. filterExpr: (loaddate > 20150401) (type: boolean)
  34. Statistics: Num rows: 4415906479 Data size: 185130335983 Basic stats: COMPLETE Column stats: NONE
  35. Reduce Output Operator
  36. key expressions: event_type_id (type: int)
  37. sort order: +
  38. Map-reduce partition columns: event_type_id (type: int)
  39. Statistics: Num rows: 4415906479 Data size: 185130335983 Basic stats: COMPLETE Column stats: NONE
  40. value expressions: display_event_id (type: string), category (type: string), time (type: string), user_id (type: string), ip (type: string), buy_id (type: double), ad_id (type: string), creative_id (type: string), creative_version (type: double), creative_size_id (type: string), site_id (type: string), page_id (type: string), keyword (type: string), country_id (type: double), state_province (type: string), browser_id (type: double), browser_version (type: double), os_id (type: double), local_user_id (type: string), sv1 (type: string), browser_type (type: string), country (type: string), os_type (type: string), state_province_name (type: string)
  41. Reduce Operator Tree:
  42. Join Operator
  43. condition map:
  44. Left Outer Join0 to 1
  45. Inner Join 0 to 2
  46. keys:
  47. 0 event_type_id (type: int)
  48. 1 event_type_id (type: int)
  49. 2 event_type_id (type: int)
  50. outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col11, _col12, _col13, _col14, _col22, _col23, _col24, _col25, _col26, _col27, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col41, _col42, _col43, _col48, _col58, _col59, _col60, _col61, _col62
  51. Statistics: Num rows: 9714994464 Data size: 407286747990 Basic stats: COMPLETE Column stats: NONE
  52. File Output Operator
  53. compressed: false
  54. table:
  55. input format: org.apache.hadoop.mapred.SequenceFileInputFormat
  56. output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
  57. serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
  58. Stage: Stage-9
  59. Map Reduce Local Work
  60. Alias -> Map Local Tables:
  61. e
  62. Fetch Operator
  63. limit: -1
  64. Alias -> Map Local Operator Tree:
  65. e
  66. TableScan
  67. alias: e
  68. Statistics: Num rows: 7080 Data size: 3695175 Basic stats: PARTIAL Column stats: NONE
  69. HashTable Sink Operator
  70. keys:
  71. 0 _col30 (type: string)
  72. 1 ad_id (type: string)
  73. Stage: Stage-6
  74. Map Reduce
  75. Map Operator Tree:
  76. TableScan
  77. Map Join Operator
  78. condition map:
  79. Left Outer Join0 to 1
  80. keys:
  81. 0 _col30 (type: string)
  82. 1 ad_id (type: string)
  83. outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col11, _col12, _col13, _col14, _col22, _col23, _col24, _col25, _col26, _col27, _col29, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col41, _col42, _col43, _col48, _col58, _col59, _col60, _col61, _col62, _col67, _col68, _col69, _col70, _col72, _col73, _col74
  84. Statistics: Num rows: 10686494142 Data size: 448015432499 Basic stats: COMPLETE Column stats: NONE
  85. Select Operator
  86. expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: int), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col41 (type: double), _col59 (type: string), _col42 (type: double), _col29 (type: double), _col23 (type: string), _col60 (type: string), _col37 (type: double), _col31 (type: string), _col33 (type: string), _col32 (type: double), _col22 (type: string), _col24 (type: int), _col27 (type: string), _col36 (type: string), _col48 (type: string), _col43 (type: double), _col61 (type: string), _col35 (type: string), _col34 (type: string), _col38 (type: string), _col62 (type: string), _col58 (type: string), _col25 (type: string), _col26 (type: string), _col67 (type: int), _col68 (type: int), _col69 (type: string), _col70 (type: string), _col72 (type: string), _col73 (type: string), _col74 (type: string)
  87. outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col39, _col40
  88. Statistics: Num rows: 10686494142 Data size: 448015432499 Basic stats: COMPLETE Column stats: NONE
  89. File Output Operator
  90. compressed: false
  91. Statistics: Num rows: 10686494142 Data size: 448015432499 Basic stats: COMPLETE Column stats: NONE
  92. table:
  93. input format: org.apache.hadoop.mapred.TextInputFormat
  94. output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  95. serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  96. Local Work:
  97. Map Reduce Local Work
  98. Stage: Stage-0
  99. Fetch Operator
  100. limit: -1
  101. Processor Tree:
  102. ListSink
y0u0uwnf

y0u0uwnf1#

如果有倾斜表,请尝试使用以下属性:hive.optimize.skewjoin=true
另外,你能粘贴的工作日志,这将使问题更清楚

相关问题