sql—如何根据优先级实现不同的记录

mqkwyuun  于 2021-06-26  发布在  Hive
关注(0)|答案(3)|浏览(625)

我有一张表,上面有如下数据,

id  code    data1   data2   country
1   1      A           NULL      IND
1   1      B            B        NZ
1   1                            CA
1   1      C           Z         WI
1   1      D           S         UK
2   2      NULL        NULL       IND
2   2       S          NULL       NZ
2   2       NULL        K         CA
2   2       T           T          WI
2   2       R           K          UK
3   3       NULL        A          WI
3   3       NULL        a          UK

将根据“国家/地区优先级”字段填充记录。优先考虑ind、nz、ca、wi、uk
如果data1中有任何空值,data2字段数据将从下一个优先级记录填充。
所以,我的预期结果是:目标表:

id  code    data1   data2   country
1   1          A       B    IND
2   2          S       K    IND
3   3          NULL    A    WI

有谁能帮我查询到上面的结果集吗。
为了更好地理解查询,我又添加了几行。

ki1q1bka

ki1q1bka1#

Hive有 first_value() 函数,可用于此目的:

select distinct id, code,
       first_value(data1) over (partition by id, code
                                order by (case when data1 is not null then 1 else 2 end),
                                         (case country when 'IND' then 1 when 'NZ' then 2 when 'CA' then 3 when 'WI' then 4 when 'UK' then 5 else 6 end)
                               ) as data1,
       first_value(data2) over (partition by id, code
                                order by (case when data2 is not null then 1 else 2 end),
                                         (case country when 'IND' then 1 when 'NZ' then 2 when 'CA' then 3 when 'WI' then 4 when 'UK' then 5 else 6 end)
                               ) as data2,
       first_value(country) over (partition by id, code
                                order by (case when data1 is not null then 1 else 2 end),
                                         (case country when 'IND' then 1 when 'NZ' then 2 when 'CA' then 3 when 'WI' then 4 when 'UK' then 5 else 6 end)
                               ) as country
from t;

我不太喜欢 select distinct 具有窗口功能。在这种情况下,这似乎是最简单的解决方案。

eqfvzcg8

eqfvzcg82#

另一种方法是基于最小结构。
对于我使用函数的顺序 field ( field(country,'IND','NZ','CA','WI','UK') ).
因为它不见了,所以我把它添加到了文档中。https://cwiki.apache.org/confluence/display/hive/languagemanual+udf

select      id
           ,min (code)                                                                                                           as code
           ,min (case when coalesce(trim(data1),'') <> '' then struct(field(country,'IND','NZ','CA','WI','UK'),data1) end).col2  as data1
           ,min (case when coalesce(trim(data2),'') <> '' then struct(field(country,'IND','NZ','CA','WI','UK'),data2) end).col2  as data2
           ,min (struct(field(country,'IND','NZ','CA','WI','UK'),country)).col2                                                  as country

from        mytable

group by    id

order by    id
;

演示

create table mytable 
(
    id      int
   ,code    int
   ,data1   string
   ,data2   string
   ,country string
);

insert into mytable values

    (1 ,1 ,'A'  ,NULL ,'IND')
   ,(1 ,1 ,'B'  ,'B'  ,'NZ' )
   ,(1 ,1 ,''   ,''   ,'CA' )
   ,(1 ,1 ,'C'  ,'Z'  ,'WI' )
   ,(1 ,1 ,'D'  ,'S'  ,'UK' )
   ,(2 ,2 ,NULL ,NULL ,'IND')
   ,(2 ,2 ,'S'  ,NULL ,'NZ' )
   ,(2 ,2 ,NULL ,'K'  ,'CA' )
   ,(2 ,2 ,'T'  ,'T'  ,'WI' )
   ,(2 ,2 ,'R'  ,'K'  ,'UK' )
   ,(3 ,3 ,NULL ,'A'  ,'WI' )
   ,(3 ,3 ,NULL ,'a'  ,'UK' )
;
select      id
           ,min (code)                                                                                                           as code
           ,min (case when coalesce(trim(data1),'') <> '' then struct(field(country,'IND','NZ','CA','WI','UK'),data1) end).col2  as data1
           ,min (case when coalesce(trim(data2),'') <> '' then struct(field(country,'IND','NZ','CA','WI','UK'),data2) end).col2  as data2
           ,min (struct(field(country,'IND','NZ','CA','WI','UK'),country)).col2                                                  as country

from        mytable

group by    id

order by    id
;
+----+------+-------+-------+---------+
| id | code | data1 | data2 | country |
+----+------+-------+-------+---------+
|  1 |    1 | A     | B     | IND     |
|  2 |    2 | S     | K     | IND     |
|  3 |    3 | NULL  | A     | WI      |
+----+------+-------+-------+---------+
w6lpcovy

w6lpcovy3#

用例获取优先级并在其上使用first\u值。

select id, max(code), max(data1), max(data2), max(country)
from (
    select
        id,
        code,
        first_value(data1) over (partition by id 
            order by case when data1 is null or data1 = '' then 1 else 0 end * 10 + priority) data1,
        first_value(data2) over (partition by id 
            order by case when data2 is null or data2 = '' then 1 else 0 end * 10 + priority) data2,
        first_value(country) over (partition by id 
            order by case when country is null or country = '' then 1 else 0 end * 10 + priority) country
    from (
        select
            t.*,
            case country
                when 'IND' then 1
                when 'NZ' then 2
                when 'CA' then 3
                when 'WI' then 4
                when 'UK' then 5
            end priority
        from your_table t
    ) t
) t group by id;

生产:

ID  MAX(CODE)   MAX(DATA1)  MAX(DATA2)  MAX(COUNTRY)
1   1           A           B           IND
2   2           S           K           IND
3   3           NULL        A           WI

编辑:

您也可以使用 FIELD 函数(在hive、mysql中提供)生成@dudu在下面的注解中建议的优先级:

field(country,'IND','NZ','CA','WI','UK')

请参见:
https://cwiki.apache.org/confluence/display/hive/languagemanual+udf

相关问题