Combining data from a table in Hive

wlsrxk51  posted on 2021-06-27  in  Hive

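I need to combine the data from a Hive table into a single row. The goal is to capture the data/values other than 'N', i.e. every value except 'N' should be collected for each 'col1' value.
Table 1: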

col1 col2 col3 col4 col5 col6
-----------------------------
GHY   BG  Q    N    N    N
GHY   BG  N    T    N    N
GHY   BG  N    N    A    N
GHY   BG  N    N    N    Z

I tried the following query:

Select col1, col2,array(
max(CASE WHEN col3 == 'Q' THEN 'Q' ELSE 'None' END),
max(CASE WHEN col4 == 'T' THEN 'T' ELSE 'None' END),
max(CASE WHEN col5 == 'A' THEN 'A' ELSE 'None' END),
max(CASE WHEN col6 == 'Z' THEN 'Z' ELSE 'None' END))
FROM table1 GROUP BY col1,col2;

I got the following.
Actual output:

GHY BG ['None','None','A','None']

Expected output:

GHY BG ['Q','T','A','Z']

I can't see where the mistake is :(
Update_1:
After removing "max" from the query:

FAILED: SemanticException [Error 10025]: Line 2:11 Expression not in GROUP BY key 'Q'

Update_2:

select col1,col2,collect_set(col)
from (select col1,col2,t.col
      from tbl 
      lateral view explode(array(col3,col4,col5,col6)) t as col
      where t.col <> 'N'
     ) t

Error:

FAILED: SemanticException [Error 10025]: Line 1:7 Expression not in GROUP BY key 'col1'

toiithl6  1#

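Use explode to get one row per column value for each combination of col1 and col2, then aggregate with collect_set. The query in Update_2 is only missing its GROUP BY clause.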

select col1,col2,collect_set(col)
from (select col1,col2,t.col
      from tbl 
      lateral view explode(array(col3,col4,col5,col6)) t as col
      where t.col <> 'N'
     ) t
group by col1,col2
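
To try this on the sample rows from the question without creating a table, the data can be inlined with stack(). This is only a sketch: the names sample_rows, vals and x are made up for illustration. Note that collect_set removes duplicates and, as far as I know, does not guarantee element order; collect_list keeps duplicates if you need them.

```sql
-- Hypothetical inline test data; sample_rows is an illustrative name.
with sample_rows as (
  select stack(4,
    'GHY','BG','Q','N','N','N',
    'GHY','BG','N','T','N','N',
    'GHY','BG','N','N','A','N',
    'GHY','BG','N','N','N','Z') as (col1, col2, col3, col4, col5, col6)
)
select col1, col2, collect_set(col) as vals
from (
  -- explode turns the four value columns into one row per value
  select col1, col2, t.col
  from sample_rows
  lateral view explode(array(col3, col4, col5, col6)) t as col
  where t.col <> 'N'   -- drop the placeholder values
) x
group by col1, col2;   -- one array per (col1, col2) pair
```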

uqzxnwby  2#

This query produces the expected result:

with Table1 as --your test data
(
 select stack(4,
   'GHY','BG','Q','N','N','N',
   'GHY','BG','N','T','N','N',
   'GHY','BG','N','N','A','N',
   'GHY','BG','N','N','N','Z') as (col1, col2, col3, col4, col5, col6)
)

select col1, col2,array(
       nvl(max(CASE WHEN col3 = 'Q' THEN 'Q' END),'None'),
       nvl(max(CASE WHEN col4 = 'T' THEN 'T' END),'None'),
       nvl(max(CASE WHEN col5 = 'A' THEN 'A' END),'None'), 
       nvl(max(CASE WHEN col6 = 'Z' THEN 'Z' END),'None'))
from Table1
group by col1, col2;

Result:

GHY BG  ["Q","T","A","Z"]
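
Why this helps, as a sketch: a CASE without ELSE returns NULL for non-matching rows, max() skips NULLs, and nvl() supplies 'None' only when no row in the group matched. With ELSE 'None' inside the CASE, the string 'None' takes part in every max() comparison and competes with the real values. The variant below adds a hypothetical second group ('XYZ','CD') that has no 'A' or 'Z' row, to show the fallback; the name sample_rows is illustrative.

```sql
-- Hypothetical test data: the XYZ/CD group is invented for this example.
with sample_rows as (
  select stack(6,
    'GHY','BG','Q','N','N','N',
    'GHY','BG','N','T','N','N',
    'GHY','BG','N','N','A','N',
    'GHY','BG','N','N','N','Z',
    'XYZ','CD','Q','N','N','N',
    'XYZ','CD','N','T','N','N') as (col1, col2, col3, col4, col5, col6)
)
select col1, col2, array(
       -- CASE yields NULL when the row does not match; max() ignores NULLs
       nvl(max(case when col3 = 'Q' then 'Q' end), 'None'),
       nvl(max(case when col4 = 'T' then 'T' end), 'None'),
       nvl(max(case when col5 = 'A' then 'A' end), 'None'),
       nvl(max(case when col6 = 'Z' then 'Z' end), 'None'))
from sample_rows
group by col1, col2;
```

For the XYZ/CD group the last two elements fall back to 'None', because no row in that group matches 'A' or 'Z'.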

9njqaruj  3#

Another possible solution (inspired by the ones already provided):

Select col1,col2,array(max(col3),max(col4),max(col5),max(col6))
from table1 -- FROM clause was missing; table name taken from the question
group by col1,col2;

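Note: max() picks the largest value in each column, so you may need to change the unwanted value to something like 'aa'. Otherwise, the unwanted value may be picked instead of the one you need.
Example 1: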

col1 col2 col3 col4 col5 col6
-----------------------------
GHY   BG  Q    N    N    N
GHY   BG  N    T    N    N
GHY   BG  N    N    A    N
GHY   BG  N    N    N    Z

Result:

['Q','T','N','Z']

Example 2:

col1 col2 col3 col4 col5 col6
-----------------------------
GHY   BG  Q    a    a    a
GHY   BG  a    T    a    a
GHY   BG  a    a    A    a
GHY   BG  a    a    a    Z

Result:

['Q','T','A','Z']
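
A variation that avoids choosing a placeholder that sorts low: map the unwanted value to NULL before aggregating, since max() ignores NULLs. This is a sketch rather than part of the answer above; it assumes the table is table1 as in the question and that 'N' is the only value to discard. As far as I know, Hive compares strings byte by byte, so a lowercase placeholder such as 'a' sorts above uppercase letters and could itself be picked by max(), which is one more reason to prefer NULL.

```sql
-- Assumes table1 from the question and 'N' as the only unwanted value.
select col1, col2, array(
       max(case when col3 <> 'N' then col3 end),  -- 'N' becomes NULL, skipped by max()
       max(case when col4 <> 'N' then col4 end),
       max(case when col5 <> 'N' then col5 end),
       max(case when col6 <> 'N' then col6 end))
from table1
group by col1, col2;
```

If a column holds nothing but 'N' within a group, its array element is NULL; wrap that max() in nvl(..., 'None') if a default is wanted.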
