Hive version: 1.1.0-cdh5.15.2. I recently started studying the Hive source code and how it works. Here is the problem I ran into:
explain insert into testv1 select * from test_textfile where val >200;
The statement above is a simple query. Its execution plan is shown below:
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
  Stage-4
  Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
  Stage-2 depends on stages: Stage-0
  Stage-3
  Stage-5
  Stage-6 depends on stages: Stage-5
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test_textfile
            Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (val > 200) (type: boolean)
              Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: UDFToString(val) (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: true
                  Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
                      output format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
                      serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
                      name: test.testv1

  Stage: Stage-7
    Conditional Operator

  Stage: Stage-4
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://xlclusterns1/tmp/hive-stagingdir/staging_hive_2021-04-14_15-14-30_205_4974356220876798617-1/-ext-10000

  Stage: Stage-0
    Move Operator
      tables:
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
              output format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
              serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
              name: test.testv1

  Stage: Stage-2
    Stats-Aggr Operator

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            File Output Operator
              compressed: true
              table:
                  input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
                  output format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
                  serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
                  name: test.testv1

  Stage: Stage-5
    Map Reduce
      Map Operator Tree:
          TableScan
            File Output Operator
              compressed: true
              table:
                  input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
                  output format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
                  serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
                  name: test.testv1

  Stage: Stage-6
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://xlclusterns1/tmp/hive-stagingdir/staging_hive_2021-04-14_15-14-30_205_4974356220876798617-1/-ext-10000
My question: I can't explain why Stage-3 and Stage-5 do the same thing. Does anyone know why?
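One experiment that might narrow this down (a guess to test, not a confirmed answer): Stage-7 is a Conditional Operator that picks one of Stage-4, Stage-3, or Stage-5, which resembles the small-file merge step Hive can append to a map-only job. That step is controlled by the `hive.merge.mapfiles` property. Whether that property is what produces these stages in this particular plan is an assumption. The sketch below only checks the setting and re-runs the same EXPLAIN with it turned off, so the two plans can be compared.

```sql
-- Print the current value of the merge setting (assumed to be relevant here).
set hive.merge.mapfiles;

-- Turn off small-file merging for map-only jobs in this session only.
set hive.merge.mapfiles=false;

-- Re-run the same EXPLAIN and compare the stage list with the plan above.
explain insert into testv1 select * from test_textfile where val > 200;
```

If the Conditional Operator and the duplicate-looking Map Reduce stages disappear from the second plan, that would suggest Stage-3 and Stage-5 are alternative merge paths chosen at run time rather than two jobs that both execute.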