I'm trying to run a Hive query that joins 5 other tables. One of those tables is very large (15 billion records), but because of the join clause I actually only need about 8 million records from it.
I keep seeing this in Cloudera's log console...
INFO : 2015-09-10 09:51:43,209 Stage-1 map = 100%, reduce = 99%, Cumulative CPU 437512.26 sec
I've read about table skew, where one reducer ends up being a bottleneck, but I don't know how to check whether a table is skewed, if that's what's happening here. Is that the problem?
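One way to check for skew is to look at the row count per join key in the big table. This is a sketch: the alias c, the join column event_type_id, and the partition filter are taken from the explain plan below; adjust them to the real table name.

```sql
-- Count rows per join key in the large table; if a handful of keys
-- hold most of the ~15B rows, the join is skewed on that column and
-- one reducer receives almost all the work.
SELECT event_type_id, COUNT(*) AS cnt
FROM c
WHERE loaddate > 20150401   -- same filter the query already applies
GROUP BY event_type_id
ORDER BY cnt DESC
LIMIT 20;
```

A heavily lopsided distribution in this output (one key with billions of rows, the rest small) is the classic signature of the stuck-at-99% reducer.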
Edit:
Here is the explain plan for the query; table c is the large table...
STAGE DEPENDENCIES:
  Stage-8 is a root stage , consists of Stage-1
  Stage-1
  Stage-9 depends on stages: Stage-1
  Stage-6 depends on stages: Stage-9
  Stage-0 depends on stages: Stage-6

STAGE PLANS:
  Stage: Stage-8
    Conditional Operator

  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: a
            Statistics: Num rows: 8939332 Data size: 53635992 Basic stats: COMPLETE Column stats: NONE
            Reduce Output Operator
              key expressions: event_type_id (type: int)
              sort order: +
              Map-reduce partition columns: event_type_id (type: int)
              Statistics: Num rows: 8939332 Data size: 53635992 Basic stats: COMPLETE Column stats: NONE
              value expressions: marketing_contact_id (type: int), ent_cust_id (type: string), campaign_master_id (type: string), event_id (type: string), event_timestamp (type: string)
          TableScan
            alias: b
            Statistics: Num rows: 6 Data size: 2750 Basic stats: COMPLETE Column stats: NONE
            Reduce Output Operator
              key expressions: event_type_id (type: int)
              sort order: +
              Map-reduce partition columns: event_type_id (type: int)
              Statistics: Num rows: 6 Data size: 2750 Basic stats: COMPLETE Column stats: NONE
              value expressions: event_type (type: string), event_type_category (type: string), event_type_subcategory (type: string), description (type: string)
          TableScan
            alias: c
            filterExpr: (loaddate > 20150401) (type: boolean)
            Statistics: Num rows: 4415906479 Data size: 185130335983 Basic stats: COMPLETE Column stats: NONE
            Reduce Output Operator
              key expressions: event_type_id (type: int)
              sort order: +
              Map-reduce partition columns: event_type_id (type: int)
              Statistics: Num rows: 4415906479 Data size: 185130335983 Basic stats: COMPLETE Column stats: NONE
              value expressions: display_event_id (type: string), category (type: string), time (type: string), user_id (type: string), ip (type: string), buy_id (type: double), ad_id (type: string), creative_id (type: string), creative_version (type: double), creative_size_id (type: string), site_id (type: string), page_id (type: string), keyword (type: string), country_id (type: double), state_province (type: string), browser_id (type: double), browser_version (type: double), os_id (type: double), local_user_id (type: string), sv1 (type: string), browser_type (type: string), country (type: string), os_type (type: string), state_province_name (type: string)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Left Outer Join0 to 1
               Inner Join 0 to 2
          keys:
            0 event_type_id (type: int)
            1 event_type_id (type: int)
            2 event_type_id (type: int)
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col11, _col12, _col13, _col14, _col22, _col23, _col24, _col25, _col26, _col27, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col41, _col42, _col43, _col48, _col58, _col59, _col60, _col61, _col62
          Statistics: Num rows: 9714994464 Data size: 407286747990 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-9
    Map Reduce Local Work
      Alias -> Map Local Tables:
        e
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        e
          TableScan
            alias: e
            Statistics: Num rows: 7080 Data size: 3695175 Basic stats: PARTIAL Column stats: NONE
            HashTable Sink Operator
              keys:
                0 _col30 (type: string)
                1 ad_id (type: string)

  Stage: Stage-6
    Map Reduce
      Map Operator Tree:
          TableScan
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
              keys:
                0 _col30 (type: string)
                1 ad_id (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col11, _col12, _col13, _col14, _col22, _col23, _col24, _col25, _col26, _col27, _col29, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col41, _col42, _col43, _col48, _col58, _col59, _col60, _col61, _col62, _col67, _col68, _col69, _col70, _col72, _col73, _col74
              Statistics: Num rows: 10686494142 Data size: 448015432499 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: int), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col41 (type: double), _col59 (type: string), _col42 (type: double), _col29 (type: double), _col23 (type: string), _col60 (type: string), _col37 (type: double), _col31 (type: string), _col33 (type: string), _col32 (type: double), _col22 (type: string), _col24 (type: int), _col27 (type: string), _col36 (type: string), _col48 (type: string), _col43 (type: double), _col61 (type: string), _col35 (type: string), _col34 (type: string), _col38 (type: string), _col62 (type: string), _col58 (type: string), _col25 (type: string), _col26 (type: string), _col67 (type: int), _col68 (type: int), _col69 (type: string), _col70 (type: string), _col72 (type: string), _col73 (type: string), _col74 (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col39, _col40
                Statistics: Num rows: 10686494142 Data size: 448015432499 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 10686494142 Data size: 448015432499 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
1 Answer
If you have a skewed table, try the following property: hive.optimize.skewjoin=true
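A minimal sketch of enabling it for the session; the threshold value is illustrative (it is Hive's documented default), not part of the original answer:

```sql
-- Enable Hive's runtime skew-join optimization: rows for skewed keys
-- are written aside and joined in a follow-up map join instead of
-- all landing on one reducer.
SET hive.optimize.skewjoin=true;
-- A join key with more rows than this is treated as skewed
-- (default 100000); tune to your data.
SET hive.skewjoin.key=100000;
```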
Also, could you paste the job logs? That would make the problem clearer.