配置单元存储桶上的最大函数

ffx8fchx  于 2021-06-26  发布在  Hive
关注(0)|答案(2)|浏览(661)

我有一个表结构在Hive如下-

create table if not exists cdp_compl_status
(
EmpNo INT,
RoleCapability STRING,
EmpPUCode STRING,
SBUCode STRING,
CertificationCode STRING,
CertificationTitle STRING,
Competency STRING,
Certification_Type STRING,
Certification_Group STRING,
Contact_Based_Program_Y_N STRING,
ExamDate DATE,
Onsite_Offshore STRING,
AttendedStatus STRING,
Marks INT,
Result STRING,
Status STRING,
txtPlanCategory STRING,
SkillID1 INT,
Complexity STRING
)
CLUSTERED BY (Marks) INTO 5 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
TBLPROPERTIES('created on' = '12 Aug');

现在,我想从表中的每个bucket中查询max(marks)。如果我这样做了-

SELECT MAX(MARKS) from cdp_compl_status;

它显示了整个表的最大分数。有什么办法,我能查出来吗 MAX(MARKS) 从每个桶里?

jk9hmnmh

jk9hmnmh1#

你把table分成5个桶。。。根据%函数将数据拆分为存储桶,例如: marks%5==0 进入第一个铲斗 marks%5==1 进入第二个铲斗 marks%5==2 进入第三个铲斗 marks%5==3 进入第四个铲斗 marks%5==4 进入第5个铲斗
所以你需要写5个这样的查询: Select max(marks) from cdp_compl_status where marks%5=0; --第一桶最大值
我想这应该可以。

fquxozlt

fquxozlt2#

使用表示例:

select max(marks),min(marks),avg(marks) from cert_comp_status_buck
tablesample(bucket 1 out of 5 on marks);

select max(marks),min(marks),avg(marks) from cert_comp_status_buck
tablesample(bucket 2 out of 5 on marks);

select max(marks),min(marks),avg(marks) from cert_comp_status_buck
tablesample(bucket 3 out of 5 on marks);

select max(marks),min(marks),avg(marks) from cert_comp_status_buck
tablesample(bucket 4 out of 5 on marks);

select max(marks),min(marks),avg(marks) from cert_comp_status_buck
tablesample(bucket 5 out of 5 on marks);

相关问题