The last couple of years in the US have been marked by many discussions of numbers, to be more specific, percentages. It started with the 99% protests and, most recently, has included the unfortunate 47% comment by one of our presidential candidates. But, when I think about it, I remember being aware in grade school of having a certain percentile value assigned to each of my subject areas. I was really proud of hitting above 90% in math and reading skills for my grade. I also remember seeing percentile rankings on my college and grad school admission tests. Most recently, I have been aware of my six year old grandson’s height ranking. He is in the 90th percentile in height for his age. If he maintains that ranking, he will be taller than I am in three or four years.

What do these percentiles really mean? Let’s take a look at the height value. This 90% rank means that, of all six year olds, my grandson Thomas is taller than 90% of them. If we had 100 random kids and lined them up in order from shortest to tallest, Thomas would be standing in approximately spot number 90. For the pediatric height charts, someone did the calculations years ago, and pediatricians and parents just find the point on the chart that represents the particular height.

The formula for figuring out the percentile rank is the following (rk -1)/(rn -1) where rk equals the rank of the value and rn equals the count of the items. (NOTE: Formulas in this article from "Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions" by Itzik Ben-Gan).

Let's look at a very simple example:

Value | Percentage Rank |

1 | 0 |

2 | 11.11 |

3 | 22.22 |

4 | 33.33 |

5 | 44.44 |

6 | 55.56 |

7 | 66.67 |

8 | 77.78 |

9 | 88.89 |

10 | 100.00 |

In SQL Server 2005, Microsoft gave us the RANK and DENSE_RANK functions. In SQL Server 2012, we now have the PERCENT_RANK function. This function works similarly to RANK, but instead of giving the rank value, gives a percentage ranking. So now you can easily calculate the percent rank for each row in a result set. Here is a query using PERCENT_RANK:

SELECT COUNT(*) NumberOfOrders, Month(OrderDate) AS OrderMonth,

PERCENT_RANK() OVER(ORDER BY COUNT(*)) * 100

FROM Sales.SalesOrderHeader

GROUP BY Month(OrderDate)

In this query, I am using COUNT(*) to get a count of the orders by month. The PERCENT_RANK function uses the OVER clause to specify the order of the ranking. Here are the results:

I can also partition the results. The following query groups the data by order month and year and then figures out the percent ranking of each month’s sales.

SELECT COUNT(*) NumberOfOrders, Month(OrderDate) AS OrderMonth,

YEAR(OrderDate) AS OrderYear,

PERCENT_RANK() OVER(PARTITION BY YEAR(OrderDate) ORDER BY COUNT(*)) * 100

FROM Sales.SalesOrderHeader

GROUP BY Month(OrderDate),YEAR(OrderDate)

ORDER BY Year(OrderDate)

Using PARTITION BY allows me to calculate the percentage rank for the months of each year. Here is a sample of the data:

Another similar function added in 2012 is the CUME_DIST function for calculating the cumulative distribution. It is very similar to PERCENT_RANK. The difference is that the cumulative distribution includes the item. The formula for CUME_DIST is rk/rn or the rank divided by the number of items. Use the CUME_DIST function just like you do the PERCENT_RANK function.

Using our simple data set, here is how percentage rank compares to cumulative distribution:

Value | Percentage Rank | Cumulative Distribution |

1 | 0 | 10.0 |

2 | 11.11 | 20.0 |

3 | 22.22 | 30.0 |

4 | 33.33 | 40.0 |

5 | 44.44 | 50.0 |

6 | 55.56 | 60.0 |

7 | 66.67 | 70.0 |

8 | 77.78 | 80.0 |

9 | 88.89 | 90.0 |

10 | 100.0 | 100.0 |

There are obviously many uses for these new functions. They are just another glimpse of the T-SQL goodness that is SQL Server 2012!