PostgreSQL: DISTINCT on two separate columns, not on the combination of the two columns

rjzwgtxy, posted on 2023-06-22 in PostgreSQL

Table: Campaign_mapping

| campaign_id | emailid_id |
| --- | --- |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 4 | 4 |
| 5 | 4 |
| 6 | 5 |
| 7 | 7 |
| 8 | 6 |

Table: Leads

| id | campaign_id | emailid_id | data_a | data_b | data_c |
| --- | --- | --- | --- | --- | --- |
| 2 | 1 | | a | b | c |
| 3 | 2 | 2 | a1 | b2 | c3 |
| 4 | 3 | | a2 | b3 | c4 |
| 5 | 4 | | a5 | b5 | c2 |
| 6 | 5 | | a1 | b | c |
| 7 | 6 | | a | b2 | c |
| 8 | 2 | | a3 | b | c4 |
| 9 | 1 | 3 | a | b2 | c4 |
| 10 | 1 | | a | b1 | c |
| 11 | 2 | | a2 | b | c |
| 12 | 3 | | a | b | c4 |
| 13 | 4 | | a | b2 | c |
| 14 | 5 | | a1 | b | c |

Trial query:

SELECT 
    Campaign_mapping.*, 
    Leads.* 
FROM 
    Campaign_mapping 
JOIN 
    Leads ON Campaign_mapping.campaign_id = Leads.campaign_id 
WHERE 
    (Campaign_mapping.emailid_id = Leads.emailid_id 
     OR Leads.emailid_id IS NULL) 
    AND Leads.id = (SELECT tblA.id FROM Leads tblA 
                    WHERE Campaign_mapping.campaign_id = tblA.campaign_id 
                    LIMIT 1);

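Current output:

| campaign_id | emailid_id | id | campaign_id | emailid_id | data_a | data_b | data_c |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | 2 | 1 | | a | b | c |
| 1 | 2 | 2 | 1 | | a | b | c |
| 1 | 3 | 2 | 1 | | a | b | c |
| 2 | 2 | 3 | 2 | 2 | a1 | b2 | c3 |
| 4 | 4 | 5 | 4 | | a5 | b5 | c2 |
| 5 | 4 | 6 | 5 | | a1 | b | c |
| 6 | 5 | 7 | 6 | | a | b2 | c |

One possible solution (it needs better output, i.e. the maximum number of rows, or randomization among them):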

SELECT DISTINCT ON (id) * FROM (
SELECT DISTINCT ON (campaign_mapping.emailid_id) * FROM campaign_mapping JOIN Leads ON campaign_mapping.campaign_id = Leads.campaign_id WHERE (campaign_mapping.emailid_id = Leads.emailid_id OR Leads.emailid_id is NULL)
) AS t;

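Output (needs to be better):

| campaign_id | emailid_id | id | campaign_id | emailid_id | data_a | data_b | data_c |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | 2 | 1 | NULL | a | b | c |
| 4 | 4 | 5 | 4 | NULL | a5 | b5 | c2 |
| 6 | 5 | 7 | 6 | NULL | a | b2 | c |

The final expected output should be:

| campaign_id | emailid_id | id | campaign_id | emailid_id | data_a | data_b | data_c |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | 2 | 1 | | a | b | c |
| 1 | 2 | 10 | 1 | | a | b1 | c |
| 1 | 3 | 9 | 1 | 3 | a | b2 | c4 |
| 4 | 4 | 5 | 4 | | a5 | b5 | c2 |
| 6 | 5 | 7 | 6 | | a | b2 | c |

In short, I need each emailid_id to be unique and each Leads.id to be unique, along with the other column values of the matching rows. That is, c.emailid_id should be unique across all result rows, and so should Leads.id.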
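
The two uniqueness requirements can be checked mechanically. The following is a minimal sketch, not part of the original question: it hard-codes the (emailid_id, Leads.id) pairs from the expected output in a CTE, and the names `result`, `lead_id`, `emailid_ids_unique` and `lead_ids_unique` are illustrative assumptions.

```sql
-- Pairs (emailid_id, lead id) copied from the expected output table.
-- "result" and "lead_id" are hypothetical names used only for this check.
WITH result (emailid_id, lead_id) AS (
    VALUES (1, 2), (2, 10), (3, 9), (4, 5), (5, 7)
)
SELECT
    -- true when no emailid_id appears in more than one row
    count(*) = count(DISTINCT emailid_id) AS emailid_ids_unique,
    -- true when no lead id appears in more than one row
    count(*) = count(DISTINCT lead_id)    AS lead_ids_unique
FROM result;
```

Replacing the `VALUES` list with the output of a candidate query gives a quick way to confirm that the candidate satisfies both constraints. It does not say whether the number of rows is the maximum possible.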

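Please note: Campaign_mapping is not actually a table. It is the result of a complex join with filters, which ultimately produces the rows shown in the table above.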

Here are the SQL statements to reproduce the data, for anyone willing to help me solve this:

CREATE TABLE campaign_mapping (
    campaign_id int,
    emailid_id int
);

INSERT INTO campaign_mapping VALUES (1 , 1);
INSERT INTO campaign_mapping VALUES (1 , 2);
INSERT INTO campaign_mapping VALUES (1 , 3);
INSERT INTO campaign_mapping VALUES (2 , 2);
INSERT INTO campaign_mapping VALUES (4 , 4);
INSERT INTO campaign_mapping VALUES (5 , 4);
INSERT INTO campaign_mapping VALUES (6 , 5);
INSERT INTO campaign_mapping VALUES (7 , 7);
INSERT INTO campaign_mapping VALUES (8 , 6);


CREATE TABLE leads (
    id int,
    campaign_id int,
    emailid_id int,
    data_a text,
    data_b text,
    data_c text
);
    
INSERT INTO leads VALUES (2 , 1, null, 'a', 'b', 'c');
INSERT INTO leads VALUES (3 , 2, 2, 'a1', 'b2', 'c3' );
INSERT INTO leads VALUES (4 , 3, null, 'a2', 'b3', 'c4');
INSERT INTO leads VALUES (5 , 4, null, 'a5', 'b5', 'c2');
INSERT INTO leads VALUES (6 , 5, null, 'a1', 'b', 'c');
INSERT INTO leads VALUES (7 , 6, null, 'a', 'b2', 'c');
INSERT INTO leads VALUES (8 , 2, null, 'a3', 'b', 'c4');
INSERT INTO leads VALUES (9 , 1, 3, 'a', 'b2', 'c4');
INSERT INTO leads VALUES (10 , 1, null, 'a', 'b1', 'c');
INSERT INTO leads VALUES (11 , 2, null, 'a2', 'b', 'c');
INSERT INTO leads VALUES (12 , 3, null, 'a', 'b', 'c4');
INSERT INTO leads VALUES (13 , 4, null, 'a', 'b2', 'c');
INSERT INTO leads VALUES (14 , 5, null, 'a1', 'b', 'c');

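I would prefer a solution based on a join, because Campaign_mapping can contain more than 2000 rows, but two separate queries using WHERE IN plus a small amount of processing in the application layer would also be acceptable.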

9rnv2umw (answer #1)

I was finally able to solve this using the following query:

SELECT DISTINCT ON (id) *
FROM (
    -- Step 1: keep one random matching lead per emailid_id
    SELECT DISTINCT ON (campaign_mapping.emailid_id) campaign_mapping.*, leads.*
    FROM campaign_mapping
    JOIN leads ON campaign_mapping.campaign_id = leads.campaign_id
    WHERE (campaign_mapping.emailid_id = leads.emailid_id OR leads.emailid_id IS NULL)
    ORDER BY campaign_mapping.emailid_id, RANDOM()
) AS t  -- alias required by PostgreSQL versions before 16
-- Step 2: keep one random row per lead id
ORDER BY id, RANDOM();

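If anyone has a better-optimized solution (one that returns the maximum possible number of rows, or that randomizes among the maximal results) without the performance cost of random(), I would be very grateful.
Note: **DISTINCT ON()** is a PostgreSQL construct that is also supported by CockroachDB; it may not be available in other SQL databases.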
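
For engines without DISTINCT ON, the same "one row per key" step is commonly written with the standard `ROW_NUMBER()` window function. The following is a sketch under assumptions, not the answer's original query: it uses the `campaign_mapping` and `leads` tables from the question, the names `candidates`, `per_email`, `per_lead`, `lead_id`, `rn_email` and `rn_lead` are made up for illustration, and it orders deterministically by id instead of calling `random()`.

```sql
-- Portable variant of the two DISTINCT ON steps using ROW_NUMBER().
WITH candidates AS (
    -- All (mapping, lead) pairs allowed by the original WHERE clause.
    SELECT cm.campaign_id, cm.emailid_id,
           l.id AS lead_id, l.data_a, l.data_b, l.data_c
    FROM campaign_mapping AS cm
    JOIN leads AS l ON cm.campaign_id = l.campaign_id
    WHERE cm.emailid_id = l.emailid_id
       OR l.emailid_id IS NULL
),
per_email AS (
    -- Step 1: rank the candidate leads within each emailid_id.
    SELECT c.*,
           ROW_NUMBER() OVER (PARTITION BY emailid_id ORDER BY lead_id) AS rn_email
    FROM candidates AS c
),
per_lead AS (
    -- Step 2: among the step-1 winners, rank the rows within each lead.
    SELECT p.*,
           ROW_NUMBER() OVER (PARTITION BY lead_id ORDER BY emailid_id) AS rn_lead
    FROM per_email AS p
    WHERE rn_email = 1
)
SELECT campaign_id, emailid_id, lead_id, data_a, data_b, data_c
FROM per_lead
WHERE rn_lead = 1
ORDER BY emailid_id;
```

Like the DISTINCT ON version, this guarantees unique emailid_id and unique lead ids but not the maximum number of rows. With the deterministic ordering shown here, every emailid_id of campaign 1 picks lead 2 in step 1, so on the sample data it should collapse to the same three rows as the question's first attempt. Swapping the `ORDER BY` expressions for the engine's random function would mimic the answer's randomization, at the same performance cost.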
