在配置单元查询执行计划中执行相同操作的不同阶段

Hiveversion:1.1.0-cdh5.15.2，我最近开始学习hive源代码及其工作原理。下面是我遇到的问题

explain insert into testv1 select * from test_textfile  where val >200;

上面是一个简单的查询，下面是执行计划

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
  Stage-4
  Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
  Stage-2 depends on stages: Stage-0
  Stage-3
  Stage-5
  Stage-6 depends on stages: Stage-5
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test_textfile
            Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (val > 200) (type: boolean)
              Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: UDFToString(val) (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: true
                  Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
                      output format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
                      serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
                      name: test.testv1
  Stage: Stage-7
    Conditional Operator
  Stage: Stage-4
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://xlclusterns1/tmp/hive-stagingdir/staging_hive_2021-04-14_15-14-30_205_4974356220876798617-1/-ext-10000
  Stage: Stage-0
    Move Operator
      tables:
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
              output format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
              serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
              name: test.testv1
  Stage: Stage-2
    Stats-Aggr Operator
  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            File Output Operator
              compressed: true
              table:
                  input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
                  output format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
                  serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
                  name: test.testv1
  Stage: Stage-5
    Map Reduce
      Map Operator Tree:
          TableScan
            File Output Operator
              compressed: true
              table:
                  input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
                  output format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
                  serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
                  name: test.testv1
  Stage: Stage-6
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://xlclusterns1/tmp/hive-stagingdir/staging_hive_2021-04-14_15-14-30_205_4974356220876798617-1/-ext-10000

问题是我无法解释为什么第三阶段和第五阶段做同样的事情，有人知道这个问题吗？

在配置单元查询执行计划中执行相同操作的不同阶段

暂无答案！

相关问题

热门标签

最新问答