PostgreSQL: detecting a sex change using SQL

aelbi1ox  posted on 2023-01-25  in  PostgreSQL

I have a dataset like this:

  • ID: e.g. 111, 111, 111, 112, 112, 113, 113
  • Year: e.g. 2010, 2011, 2012, 2010, 2011, 2010, 2015
  • Sex: e.g. M, M, F, F, F, M, M
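To make the queries in this thread easy to try, the sample values can be loaded into a small table. This is only a sketch: the table name mytable (borrowed from the answers), the column types and the primary key are assumptions, since the post lists only the values.

```sql
-- Assumed schema: name, types and key are not given in the post.
CREATE TABLE mytable (
    id   integer NOT NULL,
    year integer NOT NULL,
    sex  text    NOT NULL CHECK (sex IN ('M', 'F')),
    PRIMARY KEY (id, year)   -- assumes one record per id and year
);

-- The seven rows listed in the question, in the same order.
INSERT INTO mytable (id, year, sex) VALUES
    (111, 2010, 'M'),
    (111, 2011, 'M'),
    (111, 2012, 'F'),
    (112, 2010, 'F'),
    (112, 2011, 'F'),
    (113, 2010, 'M'),
    (113, 2015, 'M');
```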

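In this dataset, ID = 111 had a sex change (from M to F, or from F to M).
Using PostgreSQL, I am trying to find out:

  • A: How many IDs stayed male (and which IDs)
  • B: How many IDs stayed female (and which IDs)
  • C: How many IDs went from male to female (and which IDs)
  • D: How many IDs went from female to male (and which IDs)

Here is what I tried: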

-- problem A
SELECT COUNT(DISTINCT ID) FROM table WHERE ID NOT IN (SELECT ID FROM table WHERE SEX = 'M');
SELECT DISTINCT ID FROM table WHERE ID NOT IN (SELECT ID FROM table WHERE SEX = 'M');

-- problem B
SELECT COUNT(DISTINCT ID) FROM table WHERE ID NOT IN (SELECT ID FROM table WHERE SEX = 'F');
SELECT DISTINCT ID FROM table WHERE ID NOT IN (SELECT ID FROM table WHERE SEX = 'F');

-- all sex changes
SELECT COUNT(DISTINCT ID) FROM table WHERE ID IN (SELECT ID FROM table WHERE SEX = 'M') AND ID IN (SELECT ID FROM table WHERE SEX = 'F');
SELECT DISTINCT ID FROM table WHERE ID IN (SELECT ID FROM table WHERE SEX = 'M') AND ID IN (SELECT ID FROM table WHERE SEX = 'F');

Is this correct? Or do I need a window function such as lag()?
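For reference, a minimal sketch of what a lag()-based check could look like. It assumes a table named mytable with columns id, year and sex (the name is a placeholder, since the post does not give a real one) and at most one row per id and year. Each returned row marks a year in which the recorded sex differs from the previous record of the same id, so the direction of the change is visible directly.

```sql
-- Assumption: mytable(id, year, sex), one row per (id, year).
SELECT id,
       year,
       prev_sex,
       sex AS new_sex
FROM (
    SELECT id,
           year,
           sex,
           -- sex of the previous record of the same id, NULL for the first one
           lag(sex) OVER (PARTITION BY id ORDER BY year) AS prev_sex
    FROM mytable
) t
WHERE prev_sex <> sex        -- rows with prev_sex NULL are filtered out as well
ORDER BY id, year;
```

On the sample values listed above, the only row returned would be id 111 in 2012, with prev_sex M and new_sex F.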


ee7vknir #1

You can try something like this, computing a few indicators up front:

SELECT *
        ,MAX(CASE WHEN sex = 'M' THEN 1 ELSE 0 END) OVER (PARTITION BY ID) AS has_M
        ,MAX(CASE WHEN sex = 'F' THEN 1 ELSE 0 END) OVER (PARTITION BY ID) AS has_F
        ,DENSE_RANK() OVER (PARTITION BY id ORDER BY id, year) AS initial_sex
  FROM mytable;

Then answer your questions:

-- IDs whose earliest record is 'M': count, then list
SELECT SUM(CASE WHEN initial_sex = 1 AND SEX = 'M' THEN 1 ELSE 0 END)
      ,string_agg(CASE WHEN initial_sex = 1 AND SEX = 'M' THEN CAST(id AS VARCHAR(12)) END, ', ')
-- IDs whose earliest record is 'F': count, then list
      ,SUM(CASE WHEN initial_sex = 1 AND SEX = 'F' THEN 1 ELSE 0 END)
      ,string_agg(CASE WHEN initial_sex = 1 AND SEX = 'F' THEN CAST(id AS VARCHAR(12)) END, ', ')
-- IDs that have both 'M' and 'F' records (a change in either direction): count, then list
      ,SUM(CASE WHEN (initial_sex = 1 AND SEX = 'F' AND has_m = 1) OR (initial_sex = 1 AND SEX = 'M' AND has_F = 1)  THEN 1 ELSE 0 END)
      ,string_agg(CASE WHEN (initial_sex = 1 AND SEX = 'F' AND has_m = 1) OR (initial_sex = 1 AND SEX = 'M' AND has_F = 1)  THEN CAST(id AS VARCHAR(12)) END, ', ')
FROM
(
  SELECT *
        ,MAX(CASE WHEN sex = 'M' THEN 1 ELSE 0 END) OVER (PARTITION BY ID) AS has_M
        ,MAX(CASE WHEN sex = 'F' THEN 1 ELSE 0 END) OVER (PARTITION BY ID) AS has_F
        ,DENSE_RANK() OVER (PARTITION BY id ORDER BY id, year) AS initial_sex
  FROM mytable
) DS;

Here is a full working example.
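As a further sketch that is not part of the answer above: the same derived columns can also separate the two directions of change, which the combined count does not do. It assumes one row per id and year, so that initial_sex = 1 picks exactly one row per id; the output aliases m_to_f and f_to_m are made up for illustration.

```sql
-- Assumption: (id, year) is unique in mytable.
SELECT COUNT(*) FILTER (WHERE sex = 'M' AND has_f = 1) AS m_to_f  -- started as M, has an F record
      ,COUNT(*) FILTER (WHERE sex = 'F' AND has_m = 1) AS f_to_m  -- started as F, has an M record
FROM
(
  SELECT *
        ,MAX(CASE WHEN sex = 'M' THEN 1 ELSE 0 END) OVER (PARTITION BY id) AS has_m
        ,MAX(CASE WHEN sex = 'F' THEN 1 ELSE 0 END) OVER (PARTITION BY id) AS has_f
        ,DENSE_RANK() OVER (PARTITION BY id ORDER BY year) AS initial_sex
  FROM mytable
) ds
WHERE initial_sex = 1;   -- keep only the earliest record of each id
```

With the values from the question, this would give m_to_f = 1 (id 111) and f_to_m = 0.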


lnlaulya #2

Assuming the SEX column only contains 'F' or 'M', problem A can be solved like this:

SELECT COUNT(DISTINCT ID) FROM table WHERE  SEX != 'F';
SELECT DISTINCT ID FROM table WHERE  SEX != 'F';
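Note that a row-level filter such as SEX != 'F' also matches IDs that have both 'M' and 'F' rows (ID 111 in the sample data), so it finds IDs that were ever male rather than IDs that stayed male. One possible alternative, sketched here under the assumption that the table is called mytable as in the other answers, is to aggregate per id first and test all rows of the group at once.

```sql
-- Problem A: IDs whose every record is 'M' (swap in 'F' for problem B).
SELECT id
FROM mytable
GROUP BY id
HAVING bool_and(sex = 'M');

-- Counts for A and B in one pass.
SELECT count(*) FILTER (WHERE all_m) AS stayed_m,
       count(*) FILTER (WHERE all_f) AS stayed_f
FROM (
    SELECT id,
           bool_and(sex = 'M') AS all_m,   -- true only if no record differs from 'M'
           bool_and(sex = 'F') AS all_f    -- true only if no record differs from 'F'
    FROM mytable
    GROUP BY id
) per_id;
```

For the sample values in the question, the first query would return only 113, and the second would report one ID for each of stayed_m and stayed_f.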

ivqmmu1c #3

step-by-step demo: db<>fiddle
Assuming the change happens only once, you can use the first_value() window function:

SELECT DISTINCT                                                                   -- 5
    id,
    CASE
        WHEN first_sex = last_sex THEN 'Stay ' || sex                             -- 3
        ELSE 'Change from ' || first_sex || ' To ' || last_sex                    -- 4    
    END sex_status
FROM (
    SELECT 
        id,
        sex,
        first_value(sex) OVER (PARTITION BY id ORDER BY year) as first_sex,       -- 1
        first_value(sex) OVER (PARTITION BY id ORDER BY year DESC) as last_sex    -- 2
    FROM mytable
) s

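1. Get the first sex value per id, ordered by year.
2. Get the last sex value per id, ordered by year (note the reversed sort order: it yields the first value counted from the "bottom").
3. Compare the first and last values. If they are the same, return 'Stay' followed by the sex.
4. Otherwise, return 'Change' with both sexes. (Of course, you can do whatever you like here. Adding proper status identifiers or similar instead of plain text seems sensible at this point.)
5. The DISTINCT clause reduces the records to one per id.

You can then compute whatever statistics you want, for example counting the different statuses with GROUP BY sex_status:
demo: db<>fiddle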
SELECT 
    sex_status,
    COUNT(*)
FROM (

-- query from above

) s
GROUP BY sex_status
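For convenience, the two pieces can be written out as one statement. The sketch below simply places the first query in a CTE; the CTE name per_id, the aliases id_count and ids, and the added string_agg column (to list which IDs fall under each status) are illustrative additions, not part of the answer.

```sql
WITH per_id AS (
    -- the first query of this answer, unchanged
    SELECT DISTINCT
        id,
        CASE
            WHEN first_sex = last_sex THEN 'Stay ' || sex
            ELSE 'Change from ' || first_sex || ' To ' || last_sex
        END AS sex_status
    FROM (
        SELECT
            id,
            sex,
            first_value(sex) OVER (PARTITION BY id ORDER BY year) AS first_sex,
            first_value(sex) OVER (PARTITION BY id ORDER BY year DESC) AS last_sex
        FROM mytable
    ) s
)
SELECT
    sex_status,
    COUNT(*) AS id_count,                            -- how many IDs per status
    string_agg(id::text, ', ' ORDER BY id) AS ids    -- which IDs
FROM per_id
GROUP BY sex_status
ORDER BY sex_status;
```

On the sample data from the question, each status ('Change from M To F', 'Stay F', 'Stay M') would come back with exactly one ID: 111, 112 and 113 respectively. As in the answer, this relies on the assumption that a change happens at most once per id.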
