Business Intelligence Blogs

View blogs by industry experts on topics such as SSAS, SSIS, SSRS, Power BI, Performance Tuning, Azure, Big Data and much more! You can also sign up to post your own business intelligence blog.

Two more analytical functions: PERCENTILE_CONT and PERCENTILE_DISC

  • 10 December 2012
  • Author: Kathi Kellenberger
  • Number of views: 13779
  • 0 Comments

I’ve been blogging a lot lately about the new analytic functions introduced with SQL Server 2012. I have two more to talk about: PERCENTILE_DISC and PERCENTILE_CONT. Previously, I covered CUME_DIST and PERCENT_RANK. Those two functions give you a percent ranking compared to a series of values. The PERCENTILE_DISC and PERCENTILE_CONT functions do the opposite. Given a series of values, which value is located at a certain percent rank? The most common example would be to find the median or the value at 50%.

So, how do you know which one of these to use? They have an interesting difference. PERCENTILE_DISC will return one of the actual values while PERCENTILE_CONT will return a calculated value – not necessarily an actual value.

The syntax is also a bit different than anything we have seen before. There is a new clause, WITHIN GROUP. From Books Online, here is the syntax:
         
PERCENTILE_CONT ( numeric_literal )
    WITHIN GROUP ( ORDER BY order_by_expression [ ASC | DESC ] )
    OVER ( [ ] )
         
PERCENTILE_DISC ( numeric_literal )
    WITHIN GROUP ( ORDER BY order_by_expression [ ASC | DESC ] )
    OVER ( [ ] )

The first value, the “numeric literal” is the percent, expressed as a number between 0 and 1. In the WITHIN GROUP (ORDER BY) clause, specify the value on which we will base the ranking. This must be a numeric column, and only one expression is allowed here. Finally, in the OVER clause, specify the partitions if any.

Let’s take a look at an example using both functions:

SELECT DISTINCT CustomerID, COUNT(*) OVER( PARTITION BY CustomerID) AS OrderCount,
       PERCENTILE_CONT(.5) WITHIN GROUP (ORDER BY TotalDue)
       OVER (PARTITION BY CustomerID) AS PC,
       PERCENTILE_DISC(.5) WITHIN GROUP (ORDER BY TotalDue)
       OVER (PARTITION BY CustomerID) AS PD
FROM Sales.SalesOrderHeader; 

Here are some of the results:

 

You’ll notice that sometimes the two calculated values are the same, and sometimes they are different. When the number of orders is odd, the functions return the middle value. When the number of orders for a customer is even, the PERCENTILE_DISC chooses one of the values close to the actual median. The PERCENTILE_CONT will calculate the value, which is the average of the two middle values.

For example, CustomerID 11431 has four orders. The PERCENTILE_CONT value is the average of the second and third order.

SELECT TotalDue
FROM Sales.SalesOrderHeader
WHERE CustomerID = 11431
ORDER BY TotalDue;

SELECT (2613.3029 + 2649.8453)/2;

 

These functions will be useful when looking at data such as scores or population data. The new analytic functions are easy to use, so keep them in mind for your challenging T-SQL dilemmas.

       
             
       
             

 

 

Print
Categories: SQL Server
Tags:
Rate this article:
3.0

Please login or register to post comments.