Customizing percentile calculation in SQL Server

mfuanj7w  于 2023-02-18  发布在  SQL Server
关注(0)|答案(1)|浏览(116)

Overview

Given the following T-SQL Query, is it possible to use PERCENT_RANK or a similar function do get the desired percentile? If not, what is the simplest way to accomplish this goal?

SELECT rank, 
ROUND(
    100 * PERCENT_RANK() 
    OVER (ORDER BY rank DESC), 
2) as pctile,
desired_pctile

FROM ( VALUES (1, 100), (1, 100), (1, 100), (2, 62.5), (3, 50), (3, 50), (4, 25), (5, 12.5) ) as X(rank, desired_pctile)
rankpctiledesired_pctile
5012.5
414.2925
328.5750
328.5750
257.1462.5
171.43100
171.43100
171.43100

Details

Specifically, the pctile and desired_pctile calculations are defined slightly differently.

  • pctile: the percentage of other records with a rank less than the current record's rank.
  • desired_pctile: the percentage of records with a rank less than or equal to the current record's rank.

This means that there are two distinctions:

  1. When multiple records share the same rank, the desired behavior is to take the maximum instead of the minimum (see Python for an example).
  2. The denominator is N-1, where having N is desired. Where N is the record count. Is there a way to multiply pctile by COUNT(*) / (COUNT(*)-1) in the query?

Python Equivalent

In pandas the rank function has a parameter method for specifying how to handle multiple records that share the same value. The following python code obtains the desired result:

pd.Series([1,1,1,2,3,3,4,5]).rank(method='max', ascending=False, pct=True)
j2qf4p5b

j2qf4p5b1#

Here is a solution that calculates a windowed form of 1 - (rank() - 1) / count() , scaled and rounded to percentages. This is a slight improvement over what I originally posted in my comment, as it eliminates the duplicate count subexpression.

SELECT
    rank,
    ROUND(100 - 100.0 * (RANK() OVER (ORDER BY rank) - 1)
                      / (COUNT(*) OVER())
        , 2) as pctile, 
    desired_pctile
FROM (
    VALUES
        (1, 100), (1, 100), (1, 100), (2, 62.5),
        (3, 50), (3, 50), (4, 25), (5, 12.5)
) as X(rank, desired_pctile)

Results:

rankpctiledesired_pctile
1100.000000000000100.0
1100.000000000000100.0
1100.000000000000100.0
262.50000000000062.5
350.00000000000050.0
350.00000000000050.0
425.00000000000025.0
512.50000000000012.5

See this db<>fiddle .

相关问题