PostgreSQL: check which columns have different values after grouping

myzjeezk  posted on 2023-06-22  in PostgreSQL

I have a dataset like this:

host_id binary_id service binary_date region binary_path ......
1       2         hello   05-06-2012  US     /bin/usr
1       2         hello   08-09-2022  US     /bin/usr
2       3         another 08-09-2026  UK     /bin/java

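host_id, binary_id and service are the columns I want to group by; binary_date, region and binary_path are just a few of the table's columns, and the table has about 50 of them. I want to find out which of those 50 columns hold different values within a group after grouping by host_id, binary_id, service. In this example I would get the binary_date column, because it is the only column whose values differ within a group.
The expected result is: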

host_id binary_id service binary_date region binary_path ......
1       2         hello   05-06-2012  null   null
1       2         hello   08-09-2022  null   null

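As you can see, in this example only binary_date contains differing values; the other columns, region and binary_path, are null because their values are identical across the rows of the group.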

um6iljoc #1

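Here is one way to do it:
row_number() numbers the rows inside a partition. Here each row is numbered inside the partition made of the three grouping columns plus one candidate column, so rows of a group that share the same value in that column get the numbers 1, 2, 3 and so on, while a row whose value is unique in its group gets 1. For each group, the sum of these row numbers is then compared with the group's row count: the two are equal only when every row got the number 1, that is, when every row of the group has a different value in that column. In that case the column's value is returned; otherwise null is returned.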

with cte as (
   select *,
          -- number of rows in each (host_id, binary_id, service) group
          count(*) over (partition by host_id, binary_id, service) as count_,
          -- position of the row among the rows of its group that share the same column value
          row_number() over (partition by host_id, binary_id, service, binary_date) as rn_binary_date,
          row_number() over (partition by host_id, binary_id, service, region) as rn_region,
          row_number() over (partition by host_id, binary_id, service, binary_path) as rn_binary_path
   from mytable
)
select host_id, binary_id, service,
       -- the sum of the row numbers equals the row count only if every row number is 1
       case
         when count_ = sum(rn_binary_date) over (partition by host_id, binary_id, service)
         then binary_date
       end as binary_date,
       case
         when count_ = sum(rn_region) over (partition by host_id, binary_id, service)
         then region
       end as region,
       case
         when count_ = sum(rn_binary_path) over (partition by host_id, binary_id, service)
         then binary_path
       end as binary_path
from cte
-- groups with a single row have nothing to compare
where count_ > 1;

Result:

host_id binary_id   service     binary_date region  binary_path
1       2           hello       2012-06-05  null    null
1       2           hello       2022-09-08  null    null
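If a group can contain rows where only some values differ (for example A, A, B), or if writing out all 50 columns by hand is impractical, a different technique may be easier: turn each row into jsonb and count the distinct values per column name. This is only a sketch of that alternative, not the query above. The temp table, its column types, and the aliases kv, column_name and distinct_values are assumptions made for illustration; the table name mytable is taken from the query above. Note that a SQL NULL becomes a JSON null here and is counted as a value of its own.

```sql
-- Hypothetical scratch table; the column types are guessed from the sample rows.
create temp table mytable (
    host_id     int,
    binary_id   int,
    service     text,
    binary_date date,
    region      text,
    binary_path text
);

insert into mytable values
    (1, 2, 'hello',   date '2012-06-05', 'US', '/bin/usr'),
    (1, 2, 'hello',   date '2022-09-08', 'US', '/bin/usr'),
    (2, 3, 'another', date '2026-09-08', 'UK', '/bin/java');

-- One output row per (group, column) where the group holds
-- more than one distinct value in that column.
select t.host_id, t.binary_id, t.service,
       kv.key                   as column_name,
       count(distinct kv.value) as distinct_values
from mytable as t
-- to_jsonb(t) turns the whole row into a jsonb object;
-- the "-" operator drops the grouping keys so they are not compared.
cross join lateral jsonb_each(
    to_jsonb(t) - 'host_id' - 'binary_id' - 'service'
) as kv(key, value)
group by t.host_id, t.binary_id, t.service, kv.key
having count(distinct kv.value) > 1
order by t.host_id, t.binary_id, t.service, kv.key;

-- With the three sample rows this returns a single row:
--  host_id | binary_id | service | column_name | distinct_values
--        1 |         2 | hello   | binary_date |               2
```

The output is in long form (one row per differing column) rather than the wide form with nulls shown above, so it answers "which columns differ" directly and does not need to be edited when columns are added to the table.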
