Overview
Given the following T-SQL Query, is it possible to use PERCENT_RANK
or a similar function do get the desired percentile? If not, what is the simplest way to accomplish this goal?
SELECT rank,
ROUND(
100 * PERCENT_RANK()
OVER (ORDER BY rank DESC),
2) as pctile,
desired_pctile
FROM ( VALUES (1, 100), (1, 100), (1, 100), (2, 62.5), (3, 50), (3, 50), (4, 25), (5, 12.5) ) as X(rank, desired_pctile)
rank | pctile | desired_pctile |
---|---|---|
5 | 0 | 12.5 |
4 | 14.29 | 25 |
3 | 28.57 | 50 |
3 | 28.57 | 50 |
2 | 57.14 | 62.5 |
1 | 71.43 | 100 |
1 | 71.43 | 100 |
1 | 71.43 | 100 |
Details
Specifically, the pctile and desired_pctile calculations are defined slightly differently.
- pctile: the percentage of other records with a rank less than the current record's rank.
- desired_pctile: the percentage of records with a rank less than or equal to the current record's rank.
This means that there are two distinctions:
- When multiple records share the same rank, the desired behavior is to take the maximum instead of the minimum (see Python for an example).
- The denominator is N-1, where having N is desired. Where N is the record count. Is there a way to multiply pctile by
COUNT(*) / (COUNT(*)-1)
in the query?
Python Equivalent
In pandas the rank function has a parameter method
for specifying how to handle multiple records that share the same value. The following python code obtains the desired result:
pd.Series([1,1,1,2,3,3,4,5]).rank(method='max', ascending=False, pct=True)
1条答案
按热度按时间j2qf4p5b1#
Here is a solution that calculates a windowed form of
1 - (rank() - 1) / count()
, scaled and rounded to percentages. This is a slight improvement over what I originally posted in my comment, as it eliminates the duplicate count subexpression.Results:
See this db<>fiddle .