MAX function on a Hive bucket

Posted by ffx8fchx on 2021-06-26 in Hive


I have a table in Hive with the following structure:

create table if not exists cdp_compl_status
(
EmpNo INT,
RoleCapability STRING,
EmpPUCode STRING,
SBUCode STRING,
CertificationCode STRING,
CertificationTitle STRING,
Competency STRING,
Certification_Type STRING,
Certification_Group STRING,
Contact_Based_Program_Y_N STRING,
ExamDate DATE,
Onsite_Offshore STRING,
AttendedStatus STRING,
Marks INT,
Result STRING,
Status STRING,
txtPlanCategory STRING,
SkillID1 INT,
Complexity STRING
)
CLUSTERED BY (Marks) INTO 5 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
TBLPROPERTIES('created on' = '12 Aug');

Now I want to query MAX(Marks) for each bucket in the table. If I run:

SELECT MAX(MARKS) from cdp_compl_status;

it returns the maximum marks for the whole table. Is there a way to get MAX(Marks) for each bucket separately?

Hive hiveql

Source: https://stackoverflow.com/questions/52018855/max-function-on-hive-bucket

2 Answers


jk9hmnmh1#

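You clustered the table into 5 buckets. Hive assigns each row to a bucket by hashing the clustering column and taking the result modulo the number of buckets. With Hive's original bucketing hash, a non-negative INT column such as Marks hashes to its own value, so the data is split like this:

Marks % 5 == 0 goes into the first bucket
Marks % 5 == 1 goes into the second bucket
Marks % 5 == 2 goes into the third bucket
Marks % 5 == 3 goes into the fourth bucket
Marks % 5 == 4 goes into the fifth bucket

So you need to write 5 queries like this one, changing the remainder each time:

Select max(marks) from cdp_compl_status where marks%5=0; -- maximum for the first bucket

I think this should work.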
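The five per-remainder queries can probably be folded into a single query by grouping on the remainder. This is only a sketch: it assumes Marks is never negative and that the table uses the value-modulo bucketing described above (newer Hive versions may use a different hash, in which case the remainder would not match the physical bucket). The aliases bucket_idx and max_marks are illustrative names, not part of the original question.

```sql
-- Assumption: the bucket index equals Marks modulo 5 (non-negative INT, original hash).
-- Rows with NULL Marks are skipped so that bucket_idx is always 0..4.
SELECT pmod(Marks, 5) AS bucket_idx,   -- 0 = first bucket, 4 = fifth bucket
       MAX(Marks)     AS max_marks
FROM cdp_compl_status
WHERE Marks IS NOT NULL
GROUP BY pmod(Marks, 5);
```

A remainder that has no rows simply does not appear in the result.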

Answered on 2021-06-26

fquxozlt2#

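Use TABLESAMPLE (the queries below run against a table named cert_comp_status_buck; substitute your own table name):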

select max(marks),min(marks),avg(marks) from cert_comp_status_buck
tablesample(bucket 1 out of 5 on marks);

select max(marks),min(marks),avg(marks) from cert_comp_status_buck
tablesample(bucket 2 out of 5 on marks);

select max(marks),min(marks),avg(marks) from cert_comp_status_buck
tablesample(bucket 3 out of 5 on marks);

select max(marks),min(marks),avg(marks) from cert_comp_status_buck
tablesample(bucket 4 out of 5 on marks);

select max(marks),min(marks),avg(marks) from cert_comp_status_buck
tablesample(bucket 5 out of 5 on marks);
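If the goal is the maximum per physical bucket file rather than per sampled bucket, one alternative is to group on Hive's INPUT__FILE__NAME virtual column. This is a sketch rather than a tested solution: it assumes the table was populated through an INSERT with bucketing applied, so that each bucket is stored as exactly one file. The aliases bucket_file and max_marks are illustrative.

```sql
-- INPUT__FILE__NAME is a Hive virtual column holding the path of the file a row was read from.
-- Assumption: one file per bucket, so grouping by file is grouping by bucket.
SELECT INPUT__FILE__NAME AS bucket_file,
       MAX(Marks)        AS max_marks
FROM cdp_compl_status
GROUP BY INPUT__FILE__NAME;
```

This approach does not depend on which hash function Hive used to place the rows. Buckets with no rows will not show up in the output.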

Answered on 2021-06-26
