PostgreSQL: select the most frequent of two values for each id (with conditions)

Posted by m4pnthwp on 2023-05-06 in PostgreSQL

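Is there a better way to write this than what I did with CTEs?
For each id, I want to select the most frequent value between M and F, with the following rules: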

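  • sex values other than 'F' or 'M' are not taken into account in the frequency calculation
  • if the most-frequent calculation fails (a tie), the result is 'inconclusive'
  • if the most-frequent calculation succeeds, the result is the sex value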

Here is an example:
Dataset
| id | sex |
| --------------|--------------|
| 1 | M |
| 1 | F |
| 2 | M |
| 2 | M |
| 2 | x |
| 2 | F |
| 3 | F |
| 3 | 0 |
| 4 | f |
| 4 | |
| 5 | d |
Expected result
| id | sex |
| --------------|--------------|
| 1 | inconclusive |
| 2 | M |
| 3 | F |
| 4 | F |
What I tried

WITH cte1 AS (
    SELECT id,
           sex,
           RANK() OVER (PARTITION BY id ORDER BY count(*)) rn
    FROM dataset
    WHERE sex ~* '(F|M)' AND sex IS NOT NULL
    GROUP BY id, sex
),
cte2 AS (
    SELECT id,
           max(rn) AS max
    FROM cte1
    GROUP BY id
),
cte3 AS (
    SELECT cte2.id,
           sex
    FROM cte2
    LEFT JOIN cte1 ON cte2.id = cte1.id AND max = rn
    WHERE cte1.id IS NOT NULL
),
cte4 AS (
    SELECT id,
           count(*) AS cnt
    FROM cte3
    GROUP BY id
)
SELECT DISTINCT cte4.id,
       CASE
           WHEN cnt > 1 THEN 'inconclusive'
           WHEN cnt = 1 AND sex IN ('F', 'M') THEN sex
       END AS sex
FROM cte4
LEFT JOIN cte3 ON cte4.id = cte3.id

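The code works in the sense that it gives the right result, but it looks a bit clunky and I am looking for improvements. Any suggestions?
Note: I tried DISTINCT ON (), but it cannot return 'inconclusive' for id 1 (it returns F or M depending on the sort order).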

Answer #1 (xqkwcwgp)

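You seem to be overcomplicating this.
I would start by filtering out values other than M or F, then aggregate by id and count how many times each value occurs: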

select id, 
    count(*) filter(where sex = 'M') cnt_m,
    count(*) filter(where sex = 'F') cnt_f
from dataset
where sex in ('M', 'F')
group by id

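I don't think a regex match is needed in the where clause, since it seems you only want to keep the values 'F' and 'M'.
From there, all that is left is to compare the counts. We can do that in an outer query, so that we don't need to repeat the conditional expressions: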
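To illustrate the difference between the predicates, here is a small sketch that compares them on a few literals. The values and the column aliases are made up for the comparison and are not taken from the dataset; the `upper()` variant is an assumption about how one might keep case-insensitive matching without a regex.

```sql
-- Compare the question's regex with an exact IN list and a
-- case-insensitive IN list on a few hypothetical values.
select v,
    v ~* '(F|M)'           as regex_match,            -- unanchored, case-insensitive
    v in ('M', 'F')        as exact_match,            -- exact, case-sensitive
    upper(v) in ('M', 'F') as case_insensitive_match  -- exact after upper-casing
from (values ('M'), ('f'), ('female'), ('x')) as t (v);
```

Because the regex is not anchored, it matches any string that contains an f or an m, so 'female' passes it while both IN variants reject it. The lowercase 'f' passes the regex and the `upper()` variant, but not the plain IN list.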

select id, 
    case when cnt_m > cnt_f then 'M'
         when cnt_m < cnt_f then 'F'
         else 'inconclusive'
    end as res
from (
    select id, 
        count(*) filter(where sex = 'M') cnt_m,
        count(*) filter(where sex = 'F') cnt_f
    from dataset
    where sex in ('M', 'F')
    group by id
) t
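
To try this against the sample rows from the question without creating a table, the data can be inlined as a VALUES list. This is only a sketch under two assumptions: the CTE named `dataset` stands in for the real table, and `upper(sex)` is added because the expected result maps the lowercase 'f' of id 4 to 'F'. Drop the `upper()` calls if the match should be case-sensitive.

```sql
-- Inline copy of the question's sample data (stand-in for the real table).
with dataset (id, sex) as (
    values (1, 'M'), (1, 'F'),
           (2, 'M'), (2, 'M'), (2, 'x'), (2, 'F'),
           (3, 'F'), (3, '0'),
           (4, 'f'), (4, null),
           (5, 'd')
)
select id,
    case when cnt_m > cnt_f then 'M'
         when cnt_m < cnt_f then 'F'
         else 'inconclusive'          -- equal counts: a tie
    end as res
from (
    select id,
        -- upper() is an assumption so that 'f' counts as 'F'
        count(*) filter (where upper(sex) = 'M') as cnt_m,
        count(*) filter (where upper(sex) = 'F') as cnt_f
    from dataset
    where upper(sex) in ('M', 'F')    -- ids with no M/F rows drop out here
    group by id
) t
order by id;
```

With this data, id 1 comes out as 'inconclusive', id 2 as 'M', and ids 3 and 4 as 'F'. Id 5 is absent because none of its rows survive the where clause.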
