Using a table containing selling orders from a marketplace, defined by the items as an int, a quantity as an int, a price per units as an int and a time as a timestamp, I am trying to generate statistics on the orders.
The data look something like this:
| Item_Id | Qty | Price | Date |
| ------------ | ------------ | ------------ | ------------ |
| 100 | 5 | 4 | 2000-01-01 00:00:00.000 |
| 100 | 5 | 2 | 2000-01-01 00:00:00.000 |
| 100 | 8 | 4 | 2000-01-01 00:00:00.000 |
| 100 | 89 | 1 | 2000-01-02 00:00:00.000 |
| 100 | 44 | 5 | 2000-01-02 00:00:00.000 |
I want to calculate the average value per items to obtain N items, grouped by items and for a given period
I want to get the lowest average value possible. The result expected should be something like that for item id 100, given N = 8 and a period of 1 day.
Item_Id | Period | Avrg_Value_Of_N_Items |
---|---|---|
100 | 2000-01-01 | 2.75 (5 units @ 2$ + 3 units @ 4$) |
100 | 2000-01-02 | 1.00 (8 units @ 1$) |
I made the following query to obtain the average value of the orders, grouped by items and period
SELECT [Item_Id]
, AVG([Price]) as [AVG_Price]
,DATEADD(DAY, DATEDIFF(DAY, '2020', orders.[date]) / 1 * 1, '2020') as [Date]
FROM [orders] as orders
GROUP BY [Item_Id]
, DATEDIFF(DAY, '2020', orders.[date]) / 1
ORDER BY [Item_Id] ASC
, [date] ASC
This is almost what I need. I now want to replace the AVG() function with my own function to calculate the value of the N cheapest items.
Option 1: Scalar-Valued Custom Function
I created a function that take as a parameter a list of orders and the number of orders to average for (N). It orders the items in ascending order of price and use a cursor to select the N cheapest items and returns their average value.
It looks something like this
CREATE FUNCTION avrg_Of_N_Cheapest_Units(@orders [dbo].[Orders] READONLY, @n int)
RETURNS INT
AS
BEGIN
DECLARE @Qty INT = 0;
DECLARE @Price INT = 0;
DECLARE @OrderCount INT = 0;
DECLARE @OrderValue INT = 0;
DECLARE cur CURSOR FAST_FORWARD FOR
SELECT [Qty], [Price]
FROM @orders
ORDER BY [Price] ASC;
WHILE @OrderCount < @n
BEGIN
FETCH NEXT FROM cur
INTO @Qty, @Price
IF ((@OrderCount + @Qty) < @n)
BEGIN
SET @OrderValue += @Qty * @Price
SET @OrderCount += @Qty
END
ELSE
BEGIN
SET @OrderValue += (@n - @OrdersCount) * @Price
SET @OrderCount += (@n - @OrdersCount)
END
END
RETURN (@OrderValue / @OrderCount)
END;
I am however unable to come up with a query that will call this function on the sub-group made by the GROUP BY clause of a query.
I am also worried about the performance that this function might produces. Cursors aren't recommended generally and I am going to call this function a lot. It should only have to fetch few rows per calls however as I am exiting as soon as it reaches its targeted items count and doesn't go through all the rows every calls.
Option 2: CLR User-Defined Aggregates Functions
I looked into creating a custom aggregate function in C#/.Net . This seems well documented and could be an interesting prospect. These functions however requires to be parallelisable by merging the results together once they all finishes. In my case, I don't see how I could parallelize this process. If the set of cheapest N items is split between 2 processes, there's no way to know during the merging operation.
2条答案
按热度按时间4nkexdtk1#
The easiest way to solve this is to use a tally table and multiple common table expressions. Here is an explanation of each.
1 - use a values statement to create a derived table using your data.
2 - tally tables are a list of number.
3 - explode out items given qty sold for given date, add row num
4 - select from result 3 by row count. at this point it easy to calculate aggregates since you have the correct data set. If you want top expensive products, change window function to use price descending.
I am including screen shots to show the intermediate steps.
The first picture shows your data.
The second picture shows the top 10 elements of the tally table.
The third picture shows the exploded data that has a row number R for ordering. We have 5 items at 2 dollars, 5 items at 4 dollars and 8 items at 4 dollars. The N value is the Nth item for random sale X line.
The last picture shows the 8 lowest price items sold for a given day and item number. It is interesting that there is no order to the data. Thus the N column might be picking items from the lot way down in the list, but it does not matter since multiple items have the same price.
Lets quickly calculate the average price.
The correct results are shown below.
7y4bm7vi2#
You can do:
Result:
See running example at db<>fiddle .
Note: It's strongly encouraged to add a primary key to the table. Otherwise the result may be non-deterministic.