Business Intelligence Blogs

View blogs by industry experts on topics such as SSAS, SSIS, SSRS, Power BI, Performance Tuning, Azure, Big Data and much more! You can also sign up to post your own business intelligence blog.

Framing: ROWS vs. RANGE

  • 6 December 2013
  • Author: Kathi Kellenberger
  • Number of views: 7575
  • 0 Comments

I have spent a lot of time with Window function this year. I have found that many people haven’t heard about all the features introduced with 2005, or if they have, don’t know the details to get the most out of using them. That led me to creating a session on Window functions and presenting it at many SQL Saturday events, user groups, and PASS Summit 2013 as well as writing many blog posts. I have found that this topic is like an onion, and I keep peeling off layers and learning more.

One interesting aspect is framing which was introduced in 2012. This gives you additional control over the window definition (the OVER clause) that the window function operates on. This lets you do things like calculating running totals or pulling values from a previous row.

The default frame is RANGE BETWEEN UNBOUNDED PRECEEDING AND CURRENT ROW. This means for each row, perform the calculation on the rows leading up to and including the current row. If you wish to calculate a running total, you actually get better performance using ROWS instead of RANGE.

There is another difference between RANGE and ROWS. I’ve seen the difference explained as ROWS being physical while RANGE is logical. For the running total, this means that when there is a tie in the values, the calculation will be different.

This query is actually using RANGE BETWEEN UNBOUNDED PRECEEDING AND CURRENT ROW since that is the default.

SELECT TotalDue, SUM(TotalDue) OVER(Partition By CustomerID order by OrderDate) AS RunningTotal ,
 CustomerID, OrderDate
FROM Sales.SalesOrderHeader;


 
As mentioned, ROWS will have much better performance, but RANGE will return results that you may not expect if there are any duplicates in the ORDER BY option values. Running this query, I found some customers that have multiple orders on the same day:

SELECT COUNT(*) , CustomerID, OrderDate
FROM Sales.SalesOrderHeader
GROUP BY CustomerID, OrderDate
HAVING COUNT(*) > 1;

 

 

Notice in this query that the value calculated is the same when the OrderDate is the same.

SELECT TotalDue, SUM(TotalDue) OVER(Partition By CustomerID order by OrderDate) AS RunningTotal ,
 CustomerID, OrderDate
FROM Sales.SalesOrderHeader
WHERE CustomerID in (29837,11276,11300,12335,11276);

 

Anytime there is a tie in the ORDER BY column values, the same calculation is made on those rows. If I specify ROWS instead of RANGE, I get the results I expect, increasing in value with each row.

SELECT TotalDue, SUM(TotalDue) OVER(Partition By CustomerID order by OrderDate
 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal ,
 CustomerID, OrderDate
FROM Sales.SalesOrderHeader
WHERE CustomerID in (29837,11276,11300,12335,11276);


 
 
The lesson here is that you should be sure to specify the frame clause when it is supported. By not doing so, you might not get the results you expect.

Print
Categories: SQL Server
Tags:
Rate this article:
No rating

Please login or register to post comments.