Joining while avoiding duplicates

az31mfrm  于 2023-02-28  发布在  其他
关注(0)|答案(2)|浏览(236)

Trying to extract data from multiple SQL tables. I have a main table and a couple of sub-tables. I want to get all the rows from the main table but this creates a duplicate record.

Edit:

Lets say , I want to display all the topics along with its status(ch_status). If two topics marked as inprogress and complete then complete will show. Here is the flow , I have Topics table which contains several topics. Now there is also a table named used_topics_chapters which will contain topic along with chapters. When lets say user start chapter , a row will be inserted into complete table which will contain progress status along with user id.

let say there are 5 rows as follows .

Then its should display as follows,

Here is a example fiddle with sample data and what I have tried so far:

  1. Create table topics
  2. (
  3. id int,
  4. name varchar(10)
  5. )
  6. insert into topics values(1,'Topic 1'),(2,'Topic 2'),(3,'Topic 3')
  7. --
  8. create table used_topics_chapters(id int,topic_id int,name varchar(10))
  9. insert into used_topics_chapters values(1,1,'Chap1'),(2,3,'Chap 2'),(3,3,'Chap3')
  10. --
  11. create table completed
  12. (
  13. id int,
  14. chp_id int,
  15. ch_status varchar(20),
  16. user_id int
  17. )
  18. insert into completed values(1,3,'complete',100),(2,2,'In-Progress',101)
  19. select t.id,t.name,c.ch_status,
  20. case c.ch_status when 'complete' then 0 else 1 end as can_modify
  21. from topics t
  22. left join used_topics_chapters as utc on t.id=utc.topic_id
  23. left join completed as c on c.chp_id=utc.id

http://sqlfiddle.com/#!18/da7d17/2

On the output You could see that Topic 3 is showing 2 times because status is different .

What I want is , if any of the status is complete then only show that.

Ouput what I am getting ,

  1. +----+---------+-------------+------------+
  2. | id | name | ch_status | can_modify |
  3. +----+---------+-------------+------------+
  4. | 1 | Topic 1 | (null) | 1 |
  5. +----+---------+-------------+------------+
  6. | 2 | Topic 2 | (null) | 1 |
  7. +----+---------+-------------+------------+
  8. | 3 | Topic 3 | In-Progress | 1 |
  9. +----+---------+-------------+------------+
  10. | 3 | Topic 3 | complete | 0 |
  11. +----+---------+-------------+------------+

Output what I am expecting ,

  1. +----+---------+-----------+------------+
  2. | id | name | ch_status | can_modify |
  3. +----+---------+-----------+------------+
  4. | 1 | Topic 1 | (null) | 1 |
  5. +----+---------+-----------+------------+
  6. | 2 | Topic 2 | (null) | 1 |
  7. +----+---------+-----------+------------+
  8. | 3 | Topic 3 | complete | 0 |
  9. +----+---------+-----------+------------+
sigwle7e

sigwle7e1#

You want to show one result row per topic, so select from the topics table. You want to know whether a 'complete' row exists for the topic, so use EXISTS . As both result columns can_modify and ch_status contain the same information, namely whether a 'complete' row exists or not, I am using a subquery here in order not to have to write the same condition twice:

  1. select
  2. id, name, can_modify,
  3. case when can_modify = 0 then 'Complete' end as ch_status
  4. from
  5. (
  6. select
  7. t.id, t.name,
  8. case when exists
  9. (
  10. select null
  11. from used_topics_chapters as utc
  12. join completed as c on c.chp_id = utc.id
  13. where utc.topic_id = t.id
  14. and c.ch_status = 'complete'
  15. ) then 0 else 1 end as can_modify
  16. from topics t
  17. ) evaluated
  18. order by id;
展开查看全部
oyjwcjzk

oyjwcjzk2#

There are no duplicates in the question's results. It seems the actual problem is how to get the "latest" row per topic. Rows have no implicit order or rank, so the query itself will have to specify one.

One way is to use ranking functions like ROW_NUMBER() to order result by descending status order and select the first row per topic. This means that the status values need to have some logical order. Either the values themselves need to be in logical order, or the table where they're stored in needs to have an order imposed through eg the ID or an ordering column.

In this case Complete comes before In-Progress' lexically so we can use an ascending ORDER BY` to get the "last" logical status. In a more complex example though, a proper table and order would be needed.

  1. with x as (
  2. select t.id,t.name,c.ch_status,
  3. IIF(c.ch_status ='complete',0 ,1) as can_modify,
  4. ROW_NUMBER(PARTITION BY t.ID ORDER BY status) as RN
  5. from topics t
  6. left join used_topics_chapters as utc on t.id=utc.topic_id
  7. left join completed as c on c.chp_id=utc.id
  8. )
  9. select ID,Name,ch_status,can_modify
  10. FROM x
  11. WHERE RN=1

A CTE is needed because ranking functions can't be used in the WHERE clause.

A Table value constructor could be used to create a "table" of statuses right inside the query :

  1. with x as (
  2. select t.id,t.name,c.ch_status,
  3. s.can_modify,
  4. ROW_NUMBER(PARTITION BY t.ID ORDER BY s.Priority DESC) as RN
  5. from topics t
  6. left join used_topics_chapters as utc on t.id=utc.topic_id
  7. left join completed as c on c.chp_id=utc.id
  8. left join (VALUES (1,'In-Progress',1),
  9. (2,'complete',0)
  10. ) as s(Priority,status,can_modify)
  11. ON ch_status=s.status
  12. )
  13. select ID,Name,ch_status,can_modify
  14. FROM x
  15. WHERE RN=1

This allows specifying more than one status values and extra attributes per status

展开查看全部

相关问题