SQL Server: Calculate the average value of the N cheapest items inside a GROUP BY

66bbxpm5 posted on 2023-03-07 in Other

I have a table of sell orders from a marketplace. Each order has an item id (int), a quantity (int), a price per unit (int) and a time (timestamp). I am trying to generate statistics on these orders.

The data look something like this:
| Item_Id | Qty | Price | Date |
| ------------ | ------------ | ------------ | ------------ |
| 100 | 5 | 4 | 2000-01-01 00:00:00.000 |
| 100 | 5 | 2 | 2000-01-01 00:00:00.000 |
| 100 | 8 | 4 | 2000-01-01 00:00:00.000 |
| 100 | 89 | 1 | 2000-01-02 00:00:00.000 |
| 100 | 44 | 5 | 2000-01-02 00:00:00.000 |

I want to calculate the average price per unit when buying N units, grouped by item and for a given period. I want the lowest possible average, that is, the average over the N cheapest units. For item id 100, with N = 8 and a period of 1 day, the expected result would look like this:

| Item_Id | Period | Avrg_Value_Of_N_Items |
| ------------ | ------------ | ------------ |
| 100 | 2000-01-01 | 2.75 (5 units @ 2$ + 3 units @ 4$) |
| 100 | 2000-01-02 | 1.00 (8 units @ 1$) |

I wrote the following query to get the average price of the orders, grouped by item and period:

SELECT [Item_Id]
    , AVG([Price]) as [AVG_Price]
    ,DATEADD(DAY, DATEDIFF(DAY, '2020', orders.[date]) / 1 * 1, '2020') as [Date]
FROM [orders] as orders
GROUP BY [Item_Id]
    , DATEDIFF(DAY, '2020', orders.[date]) / 1
ORDER BY [Item_Id] ASC
    , [date] ASC

This is almost what I need. I now want to replace the AVG() function with my own function that calculates the average value of the N cheapest units.

Option 1: Scalar-Valued Custom Function

I created a function that takes a list of orders and the number of units to average over (N) as parameters. It sorts the orders by ascending price, uses a cursor to pick the N cheapest units, and returns their average value.

It looks something like this:

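CREATE FUNCTION avrg_Of_N_Cheapest_Units(@orders [dbo].[Orders] READONLY, @n int)
RETURNS DECIMAL(18, 4) -- decimal rather than INT, so that 22 / 8 gives 2.75 and not 2
AS
BEGIN
    DECLARE @Qty INT = 0;
    DECLARE @Price INT = 0;
    DECLARE @OrderCount INT = 0;
    DECLARE @OrderValue INT = 0;
    DECLARE cur CURSOR LOCAL FAST_FORWARD FOR 
        SELECT [Qty], [Price]
        FROM @orders
        ORDER BY [Price] ASC;

    OPEN cur;
    FETCH NEXT FROM cur INTO @Qty, @Price;

    -- Walk the orders from cheapest to most expensive until @n units are covered
    WHILE @@FETCH_STATUS = 0 AND @OrderCount < @n
    BEGIN
        IF ((@OrderCount + @Qty) < @n)
        BEGIN
            SET @OrderValue += @Qty * @Price;
            SET @OrderCount += @Qty;
        END
        ELSE
        BEGIN
            -- Take only the units still missing from this order
            SET @OrderValue += (@n - @OrderCount) * @Price;
            SET @OrderCount = @n;
        END
        FETCH NEXT FROM cur INTO @Qty, @Price;
    END

    CLOSE cur;
    DEALLOCATE cur;

    -- NULLIF avoids a divide-by-zero when @orders is empty
    RETURN (CAST(@OrderValue AS DECIMAL(18, 4)) / NULLIF(@OrderCount, 0));
END;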

However, I cannot come up with a query that calls this function on each sub-group produced by the GROUP BY clause.

I am also worried about this function's performance. Cursors are generally not recommended, and I am going to call the function a lot. It should only fetch a few rows per call, however, since it exits as soon as it reaches the target unit count instead of going through all the rows.
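One set-based way around both concerns, offered as a sketch rather than a definitive answer, is to replace the scalar function with an inline table-valued function and call it once per group with CROSS APPLY. The function name `dbo.Avg_Of_N_Cheapest_Units_Tvf`, its parameters and the `Day_Start` alias are hypothetical. The sketch assumes the table is `dbo.orders` with the columns shown above and that `[Date]` is a `DATETIME` column.

```sql
-- Hypothetical inline table-valued function: average unit price of the
-- @n cheapest units of one item within the half-open period [@from, @to).
CREATE FUNCTION dbo.Avg_Of_N_Cheapest_Units_Tvf
(
    @item_id INT,
    @from    DATETIME,
    @to      DATETIME,
    @n       INT
)
RETURNS TABLE
AS
RETURN
(
    SELECT 1.0 * SUM(s.Taken * s.Price) / NULLIF(SUM(s.Taken), 0) AS Avg_Price
    FROM
    (
        SELECT r.Price,
               -- units taken from this order: all of it, or only what is still missing
               CASE WHEN r.Running_Qty <= @n THEN r.Qty
                    ELSE @n - (r.Running_Qty - r.Qty) END AS Taken
        FROM
        (
            SELECT o.Qty, o.Price,
                   -- running total of units, cheapest orders first
                   SUM(o.Qty) OVER (ORDER BY o.Price, o.Qty
                                    ROWS UNBOUNDED PRECEDING) AS Running_Qty
            FROM dbo.orders AS o
            WHERE o.Item_Id = @item_id
              AND o.[Date] >= @from
              AND o.[Date] <  @to
        ) AS r
        WHERE r.Running_Qty - r.Qty < @n   -- keep orders that start before the @n-th unit
    ) AS s
);
GO

-- Call the function once per (item, day) group, here with N = 8
SELECT g.Item_Id, g.Day_Start, a.Avg_Price
FROM
(
    SELECT DISTINCT Item_Id,
           DATEADD(DAY, DATEDIFF(DAY, 0, [Date]), 0) AS Day_Start
    FROM dbo.orders
) AS g
CROSS APPLY dbo.Avg_Of_N_Cheapest_Units_Tvf
            (g.Item_Id, g.Day_Start, DATEADD(DAY, 1, g.Day_Start), 8) AS a
ORDER BY g.Item_Id, g.Day_Start;
```

For the sample rows above this should give an average of 2.75 for 2000-01-01 and 1 for 2000-01-02. Because the function is inline, the optimizer can expand it into the calling query instead of running a cursor per group.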

Option 2: CLR User-Defined Aggregate Functions

I looked into creating a custom aggregate function in C#/.NET. This seems well documented and could be an interesting prospect. These functions, however, must be parallelizable: partial results are merged once all the workers finish. In my case, I don't see how I could parallelize this process. If the set of the N cheapest items is split between 2 processes, there is no way to know that during the merge operation.

4nkexdtk #1

The easiest way to solve this is to use a tally table and several common table expressions (CTEs). The steps are:

1 - use a VALUES clause to build a derived table from your data.

2 - build a tally table, which is simply a list of consecutive numbers.

3 - explode each order into one row per unit sold (an order with qty 5 becomes 5 rows), then add a row number per date and item, ordered by price.

4 - select from result 3 by row number. At this point it is easy to calculate aggregates, since you have the correct data set. If you want the most expensive units instead, change the window function to order by price descending.

;
-- 1 - user data
WITH CTE_DATA (ITEM_ID, QTY, PRICE, SOLD_DATE)
AS
(
  SELECT * FROM 
  (
  VALUES
    (100, 5, 4, '2000-01-01 00:00:00.000'),
    (100, 5, 2, '2000-01-01 00:00:00.000'),
    (100, 8, 4, '2000-01-01 00:00:00.000'),
    (100, 89, 1, '2000-01-02 00:00:00.000'),
    (100, 44, 5, '2000-01-02 00:00:00.000')
  ) 
  AS D
  (
    ITEM_ID,
    QTY,
    PRICE,
    SOLD_DATE
  ) 
),

-- 2 - small tally table
TALLY(N) AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns a --CROSS JOIN sys.all_columns b
),

-- 3 - explode by qty, rank by date, item and price
EXPLODED AS
(
SELECT 
  SOLD_DATE, ITEM_ID, PRICE, N,
  ROW_NUMBER() OVER (PARTITION BY SOLD_DATE, ITEM_ID ORDER BY (PRICE)) AS R
FROM 
  CTE_DATA AS D 
CROSS APPLY 
  TALLY AS T WHERE D.QTY >= T.N
)

-- 4 - give me first 8 items per day, item
SELECT * FROM EXPLODED WHERE R <= 8

I am including screenshots to show the intermediate steps.

The first picture shows your data.

The second picture shows the first 10 rows of the tally table.

The third picture shows the exploded data with a row number R used for ordering. For the first day we have 5 units at 2 dollars, 5 units at 4 dollars and 8 units at 4 dollars. The N value is the position of the unit within its original order line.

The last picture shows the 8 lowest-priced units sold for a given day and item number. Note that units with the same price come out in no particular order, so the N column may show units taken from an order line further down the list. That does not matter, since those units all have the same price.

Let's quickly calculate the average price.

-- average price for 8 low cost items per day and per item no
SELECT SOLD_DATE, ITEM_ID, AVG(CAST(PRICE AS FLOAT)) AS AVG_PRICE 
FROM EXPLODED WHERE R <= 8
GROUP BY SOLD_DATE, ITEM_ID

This returns the expected averages: 2.75 for 2000-01-01 and 1 for 2000-01-02.
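The query above hard-codes 8 and partitions on the raw timestamp. As a hedged variation, the sketch below takes the unit count from a variable, buckets orders by calendar day, and caps the tally table at that count, since no order ever needs to contribute more than that many units. The variable `@N` and the `SOLD_DAY` column are names introduced here for illustration only.

```sql
DECLARE @N INT = 8;  -- number of cheapest units to average (assumed parameter)

WITH CTE_DATA (ITEM_ID, QTY, PRICE, SOLD_DATE) AS
(
    -- same sample rows as above, with the date string cast to a real date type
    SELECT ITEM_ID, QTY, PRICE, CAST(SOLD_DATE AS DATETIME2(3))
    FROM (VALUES
        (100, 5, 4, '2000-01-01 00:00:00.000'),
        (100, 5, 2, '2000-01-01 00:00:00.000'),
        (100, 8, 4, '2000-01-01 00:00:00.000'),
        (100, 89, 1, '2000-01-02 00:00:00.000'),
        (100, 44, 5, '2000-01-02 00:00:00.000')
    ) AS D (ITEM_ID, QTY, PRICE, SOLD_DATE)
),
TALLY (N) AS
(
    -- only @N numbers are ever needed, whatever the order quantities are
    SELECT TOP (@N) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b
),
EXPLODED AS
(
    -- one row per unit (at most @N per order), ranked by price per day and item
    SELECT
        CAST(D.SOLD_DATE AS DATE) AS SOLD_DAY, D.ITEM_ID, D.PRICE,
        ROW_NUMBER() OVER (PARTITION BY CAST(D.SOLD_DATE AS DATE), D.ITEM_ID
                           ORDER BY D.PRICE) AS R
    FROM CTE_DATA AS D
    INNER JOIN TALLY AS T ON T.N <= D.QTY
)
SELECT SOLD_DAY, ITEM_ID, AVG(CAST(PRICE AS FLOAT)) AS AVG_PRICE
FROM EXPLODED
WHERE R <= @N
GROUP BY SOLD_DAY, ITEM_ID
ORDER BY ITEM_ID, SOLD_DAY;
```

Capping the tally at `@N` rows keeps the exploded set small even when a single order has a very large quantity, such as the 89-unit order in the sample data.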

7y4bm7vi #2

You can do:

with p (ord) as (select 8) -- order size
select date, 1.0 * sum(ff * price) / sum(ff) as avg_price
from (
  select x.*, case when ord >= sqty then qty else ord - sqty + qty end as ff
  from (
    select t.*, p.*,
      sum(qty) over(partition by item_id, date order by price, qty) as sqty
    from t
    cross join p
  ) x
) y
where ff > 0
group by date

Result:

date                     avg_price
-----------------------  --------------
2000-01-01 00:00:00.000  2.750000000000
2000-01-02 00:00:00.000  1.000000000000

See running example at db<>fiddle.

Note: It's strongly encouraged to add a primary key to the table. Without a unique column to break ties in the window's ORDER BY, the result may be non-deterministic.
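A possible way to act on that note, shown here only as a sketch: give the table a surrogate key, use it as the last ORDER BY column of the running sum, and spell out a ROWS frame so that rows with equal price are accumulated one at a time. The table variable `@t`, its `id` column and the `@ord` variable are assumptions made for this example. It also groups by `item_id` so that several items can be handled in one pass.

```sql
-- Hypothetical table with a surrogate key used only as a tie-breaker
DECLARE @t TABLE
(
    id      INT IDENTITY(1, 1) PRIMARY KEY,
    item_id INT          NOT NULL,
    qty     INT          NOT NULL,
    price   INT          NOT NULL,
    [date]  DATETIME2(3) NOT NULL
);

INSERT INTO @t (item_id, qty, price, [date]) VALUES
    (100, 5, 4, '2000-01-01 00:00:00.000'),
    (100, 5, 2, '2000-01-01 00:00:00.000'),
    (100, 8, 4, '2000-01-01 00:00:00.000'),
    (100, 89, 1, '2000-01-02 00:00:00.000'),
    (100, 44, 5, '2000-01-02 00:00:00.000');

DECLARE @ord INT = 8;  -- order size

SELECT item_id, [date],
       1.0 * SUM(ff * price) / SUM(ff) AS avg_price
FROM
(
    -- ff = units taken from each order (zero or negative once @ord is covered)
    SELECT x.*,
           CASE WHEN @ord >= sqty THEN qty ELSE @ord - sqty + qty END AS ff
    FROM
    (
        SELECT t.item_id, t.qty, t.price, t.[date],
               -- running unit count, cheapest first; id makes the order unique
               SUM(t.qty) OVER (PARTITION BY t.item_id, t.[date]
                                ORDER BY t.price, t.id
                                ROWS UNBOUNDED PRECEDING) AS sqty
        FROM @t AS t
    ) AS x
) AS y
WHERE ff > 0
GROUP BY item_id, [date]
ORDER BY item_id, [date];
```

With the default frame (RANGE UNBOUNDED PRECEDING), rows that tie on every ORDER BY column are treated as peers and all receive the same running total. The unique `id` removes such ties, and the explicit ROWS frame makes that intent visible.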
