Joining while avoiding duplicates

az31mfrm  于 2023-02-28  发布在  其他
关注(0)|答案(2)|浏览(142)

Trying to extract data from multiple SQL tables. I have a main table and a couple of sub-tables. I want to get all the rows from the main table but this creates a duplicate record.

Edit:

Lets say , I want to display all the topics along with its status(ch_status). If two topics marked as inprogress and complete then complete will show. Here is the flow , I have Topics table which contains several topics. Now there is also a table named used_topics_chapters which will contain topic along with chapters. When lets say user start chapter , a row will be inserted into complete table which will contain progress status along with user id.

let say there are 5 rows as follows .

Then its should display as follows,

Here is a example fiddle with sample data and what I have tried so far:

Create table topics
(
  id int,
  name varchar(10)
 
)

insert into topics values(1,'Topic 1'),(2,'Topic 2'),(3,'Topic 3')

--
create table used_topics_chapters(id int,topic_id int,name varchar(10))

insert into used_topics_chapters values(1,1,'Chap1'),(2,3,'Chap 2'),(3,3,'Chap3')

-- 
create table completed
(
  id int,
  chp_id int,
  ch_status varchar(20),
  user_id int
)

insert into completed values(1,3,'complete',100),(2,2,'In-Progress',101)

select t.id,t.name,c.ch_status,
case c.ch_status when 'complete' then 0 else 1 end as can_modify
from topics t
left join used_topics_chapters as utc on t.id=utc.topic_id
left join completed as c on c.chp_id=utc.id

http://sqlfiddle.com/#!18/da7d17/2

On the output You could see that Topic 3 is showing 2 times because status is different .

What I want is , if any of the status is complete then only show that.

Ouput what I am getting ,

+----+---------+-------------+------------+
| id | name    | ch_status   | can_modify |
+----+---------+-------------+------------+
| 1  | Topic 1 | (null)      | 1          |
+----+---------+-------------+------------+
| 2  | Topic 2 | (null)      | 1          |
+----+---------+-------------+------------+
| 3  | Topic 3 | In-Progress | 1          |
+----+---------+-------------+------------+
| 3  | Topic 3 | complete    | 0          |
+----+---------+-------------+------------+

Output what I am expecting ,

+----+---------+-----------+------------+
| id | name    | ch_status | can_modify |
+----+---------+-----------+------------+
| 1  | Topic 1 | (null)    | 1          |
+----+---------+-----------+------------+
| 2  | Topic 2 | (null)    | 1          |
+----+---------+-----------+------------+
| 3  | Topic 3 | complete  | 0          |
+----+---------+-----------+------------+
sigwle7e

sigwle7e1#

You want to show one result row per topic, so select from the topics table. You want to know whether a 'complete' row exists for the topic, so use EXISTS . As both result columns can_modify and ch_status contain the same information, namely whether a 'complete' row exists or not, I am using a subquery here in order not to have to write the same condition twice:

select
  id, name, can_modify,
  case when can_modify = 0 then 'Complete' end as ch_status
from
(
  select
    t.id, t.name,
    case when exists
    (
      select null
      from used_topics_chapters as utc
      join completed as c on c.chp_id = utc.id
      where utc.topic_id = t.id
      and c.ch_status = 'complete'
    ) then 0 else 1 end as can_modify
  from topics t
) evaluated
order by id;
oyjwcjzk

oyjwcjzk2#

There are no duplicates in the question's results. It seems the actual problem is how to get the "latest" row per topic. Rows have no implicit order or rank, so the query itself will have to specify one.

One way is to use ranking functions like ROW_NUMBER() to order result by descending status order and select the first row per topic. This means that the status values need to have some logical order. Either the values themselves need to be in logical order, or the table where they're stored in needs to have an order imposed through eg the ID or an ordering column.

In this case Complete comes before In-Progress' lexically so we can use an ascending ORDER BY` to get the "last" logical status. In a more complex example though, a proper table and order would be needed.

with x as (
    select t.id,t.name,c.ch_status,
           IIF(c.ch_status ='complete',0 ,1) as can_modify,
           ROW_NUMBER(PARTITION BY t.ID ORDER BY status) as RN
    from topics t
        left join used_topics_chapters as utc on t.id=utc.topic_id
        left join completed as c on c.chp_id=utc.id
)
select ID,Name,ch_status,can_modify
FROM x
WHERE RN=1

A CTE is needed because ranking functions can't be used in the WHERE clause.

A Table value constructor could be used to create a "table" of statuses right inside the query :

with x as (
    select t.id,t.name,c.ch_status,
           s.can_modify,
           ROW_NUMBER(PARTITION BY t.ID ORDER BY s.Priority DESC) as RN
    from topics t
        left join used_topics_chapters as utc on t.id=utc.topic_id
        left join completed as c on c.chp_id=utc.id
        left join (VALUES (1,'In-Progress',1),
                          (2,'complete',0)
                  ) as s(Priority,status,can_modify)
            ON ch_status=s.status

)
select ID,Name,ch_status,can_modify
FROM x
WHERE RN=1

This allows specifying more than one status values and extra attributes per status

相关问题