The database:
Sales.Customers(CustomerID ,CustomerName)
Sales.Orders(OrderID, CustomerID)
Sales.OrderLines(OrderID, Quantity, UnitPrice)
Sales.Invoices(InvoiceID, OrderID, CustomerID)
Sales.InvoiceLines(InvoiceID, Quantity, UnitPrice)
The goal is to generate the following output:
column1: list CustomerID (only orders converted into invoices)
Column2: list of CustomerName
Column3: number of orders by customer ID (only orders converted into invoices)
Column4: number of invoices by customer ID
Column5: sum (Quantity*UnitPrice) of orders by CustomerID (only orders converted into invoices)
Column6: sum (Quantity*UnitPrice) of invoices by Customer ID
Column7: the difference between column 4 and column 5
I managed to join the first 4 tables and get the expected results
Select
c.CustomerID
, c.CustomerName
, COUNT(DISTINCT(o.OrderID)) AS [number orders]
, COUNT(DISTINCT(i.InvoiceID)) AS [number invoices]
, SUM(il.Quantity * il.UnitPrice) AS [TOTAL invoices]
FROM Sales.Invoices AS i
INNER JOIN Sales.InvoiceLines AS il
ON il.InvoiceID = i.InvoiceID
INNER JOIN Sales.Orders AS o
ON o.orderID = i.OrderID
INNER JOIN Sales.Customers AS c
ON c.CustomerID = i.CustomerID
GROUP BY c.CustomerID, c.CustomerName
ORDER BY c.CustomerID
But once I joined the last one, COUNT columns were still correct but both SUM columns were not returning expected numbers (1.6x from the expected SUM).
The query returning incorrect SUM:
Select
c.CustomerID
, c.CustomerName
, COUNT(DISTINCT(o.OrderID)) AS [number orders]
, COUNT(DISTINCT(i.InvoiceID)) AS [number invoices]
, SUM(DISTINCT(il.Quantity * il.UnitPrice)) AS [TOTAL invoices]
, SUM(DISTINCT(ol.Quantity * ol.UnitPrice)) AS [TOTAL orders]
FROM Sales.Invoices AS i
LEFT JOIN Sales.InvoiceLines AS il
ON il.InvoiceID = i.InvoiceID
LEFT JOIN Sales.Orders AS o
ON o.orderID = i.OrderID
LEFT JOIN Sales.OrderLines AS ol
ON ol.OrderID = o.OrderID
LEFT JOIN Sales. Customers AS c
ON c.CustomerID = i.CustomerID
GROUP BY c.CustomerID, c.CustomerName
ORDER BY c.CustomerID
I tried different types of joins, different orders, subqueries, and division queries... But always ended up with some errors or incorrect results. Any help would be highly appreciated!
3条答案
按热度按时间nukf8bse1#
To troubleshoot, remove all the group by and sums, pick one order (using a where clause) and actually look at the detail data.
This is all part of the basic problem solving process. You can't solve a problem looking at a summary you need to look at the detail.
Firstly, you have "double counting"
An order might have one invoice and three invoice lines. When you join those you get three records. That order might also have two order lines. When you join those three records to two order lines you get six (3x2) records. So now when you sum up lines or invoices you have too many records and the figure is overstated.
Secondly, this is definitely incorrect
SUM(DISTINCT(ol.Quantity * ol.UnitPrice))
If seven completely unrelated order lines have the same value. it'll throw out six of them. Which is definitely incorrect.
You can't answer the questions you need by just joining up all the tables.
You need to do a seperate subtable (or CTE) query that summarises order value by customerid then you can join on that.
ar5n3qh52#
You need to pre-aggregate orders and invoices separately. You can't aggregate them at the same time or you will get a giant cross-join.
Because you want only invoices which also have orders, you need to aggregate the orders within each invoice, then aggregate all invoices.
Throwing
DISTINCT
at something is not a solution. You need to think about why duplicates are appearing in the first place.You can also do this kind of pre-aggregation using
CROSS APPLY
orOUTER APPLY
. TheON
moves inside the subquery, and the grouping ID can be removed from theSELECT
kmbjn2e33#
In your last query, you did not relate
InvoiceLines
toOrderLines
. So effectively you are doing a cross join between these two tables. You can change your query toSELECT *
, remove all aggregates and group by clause, inspect the result and you will see it.For your requirement, it is easier to do it using derived table or cte. You perform the necessary calculation for the
Orders
orInvoices
in the derived table and then on the outer query, join it to theCustomers
table