SQL Server Query to count unique companies for each year within projects that involve collaboration from two or more companies

fkvaft9z  于 2024-01-05  发布在  其他
关注(0)|答案(1)|浏览(120)

The dataset looks like this:
| Project | Company | start year | end year |
| ------------ | ------------ | ------------ | ------------ |
| Project 1 | Company A | 2015 | 2020 |
| Project 1 | Company A | 2015 | 2020 |
| Project 1 | Company B | 2015 | 2020 |
| Project 1 | Company C | 2015 | 2020 |
| Project 2 | Company D | 2016 | 2021 |
| Project 2 | Company D | 2016 | 2021 |
| Project 2 | Company A | 2016 | 2019 |
| Project 3 | Company E | 2019 | 2023 |
| Project 3 | Company E | 2019 | 2023 |
| Project 3 | Company E | 2019 | 2023 |

I am trying to get a number of unique companies (as they can be part of several projects) for each year, when they are part of a project with collaboration of at least two companies.

And those companies should be counted for each year they are in those projects.

So the result would be:
| year | unique companies |
| ------------ | ------------ |
| 2015 | 248 |
| 2016 | 279 |
| 2017 | 423 |

etc.

I have this data in Excel, haven't found any way to count it (work with Excel 2016).

I tried SQL Server Management Studio. My knowledge of SQL is not so wide, so I asked ChatGPT to help me with the query, we had quite a conversation and tried over 20 queries (as I explained again and shared the errors or unwanted results).

One of the queries we tried is this one. But the result was I believe only counting the company in the year the project started, as some years have very low numbers in the result.

WITH ProjectCompanyCTE AS 
(
    SELECT
        [ID projektu] AS Project_ID,
        [Ičo] AS Company_ID,
        [Rok zahájení] AS Year
    FROM
        dbo.IsvavaiProjects
    WHERE
        [ID projektu] IN (SELECT [ID projektu]
                          FROM dbo.IsvavaiProjects
                          GROUP BY [ID projektu]
                          HAVING COUNT(DISTINCT [Ičo]) >= 2)
)
SELECT
    Year,
    COUNT(DISTINCT Company_ID) AS UniqueCompaniesCount
FROM
    ProjectCompanyCTE
GROUP BY
    Year
ORDER BY
    Year;
jslywgbw

jslywgbw1#

You first need to start with a table of years (you can use GENERATE_SERIES or some other method) and join your table to it. That splits out each row into its constituent years.

Then use a window function to count the number of companies per project. And simply group up by year and count distinct CompanyIDs, filtering the window function to 2 or more.

SELECT
  ip.Year,
  UniqueCompanies = COUNT(DISTINCT ip.CompanyID)
FROM (
    SELECT 
      Year = year.value,
      ip.Project_ID,
      ip.CompanyID,
      CompaniesPerProject = COUNT(*) OVER (PARTITION BY year.value, ip.Project_ID)
    FROM GENERATE_SERIES(2015, 2023) year
    JOIN dbo.IsvavaiProjects ip
      ON year.value BETWEEN ip.StartYear AND ip.EndYear
) ip
WHERE ip.CompaniesPerProject >= 2
GROUP BY
  ip.Year;

相关问题