posted 8/24/2010 by BradSchacht - Views: [10414]
By default the functions MAX and MIN do not count NULL in their evaluation of your data. If we have a column containing only dates for instance and there is a NULL date, MAX and MIN will both ignore that value. For instance, take the following data set as an example.
WorkID StoreID EndDate----------- ----------- -----------------------1 50 NULL 2 10 1900-01-01 00:00:00.0003 20 1925-01-01 00:00:00.0004 30 1950-01-01 00:00:00.0005 40 1975-01-01 00:00:00.0006 19 2010-01-01 00:00:00.0007 34 2010-01-01 00:00:00.0008 50 2010-01-01 00:00:00.000
If we are to SELECT MAX(EndDate) From WorkSchedule we would get 1/1/2010 as the result, even when grouping by StoreID. Store 50 would bring back an EndDate of 1/1/2010. The business rules in this case say that we want to pull back the maximum end date for each store. However, if there is a NULL EndDate then the store is still being listed as active and we need to bring back NULL instead of any other dates listed. Luckily we can reqrite the query to trick MAX into picking the NULL value. Combine that with a case statement and we are good to go.
We will use COALESCE to replace any NULL EndDate with a date that is in the future that will not be coming up in our data anywhere, December 31, 2099 seems like a reasonable date for this. Next we take the MAX of the dates, which if NULL will evaluate as 12/31/2099 and be greater than any other date in our table. Wrap that in a CASE statement to replace the date 12/31/2099 back to NULL and group our data by StoreID.
SELECTStoreID,CASE WHEN MAX(COALESCE(EndDate, '12/31/2099')) = '12/31/2099' THEN NULL ELSE MAX(EndDate) END AS DateFROM WorkScheduleGROUP BY StoreID
Not the most eligant way to do things, but it defeinitely gets the job done. This can work with minimums as well, just select a very low date like 1/1/1900 if you want NULL to be the minimum. The same method can be used on regular integer fields as well substituting a number for NULL.
If you want to be able to USE AVG and other aggregate functions with NULL values this is a great way to accomplish that. You may want to average NULL values as zero so using SELECT AVG(COALESCE(column, 0)) FROM Table will accomplish that. Then any NULL will be treated as a zero.
Hello Brad,
One problem with this technique is that, as you said in your post, it counts on the fact that the data itself will never include the replacement value.I think this presents a risk as other developers can use the same improbable value (from the business point of view) to implement some special cases, and insert it in the table.
For your example, there is at least one alternative solution based on ROW_NUMBER and CTEs :
For the MIN case, we can use the behavior of the ORDER BY clause regarding NULL values : "Null values are treated as the lowest possible values."-- CODE --;WITH IntermediateDataCTE AS ( SELECT StoreID, ROW_NUMBER() OVER (PARTITION BY StoreID ORDER BY EndDate ASC) AS DateRank, EndDate AS Date FROM WorkSchedule ) SELECT StoreID, Date FROM IntermediateDataCTE WHERE DateRank = 1-- END OF CODE --
For the MAX case, it gets a bit uglier : -- CODE --;WITH IntermediateDataCTE AS ( SELECT StoreID, ROW_NUMBER() OVER (PARTITION BY StoreID ORDER BY CASE WHEN EndDate IS NULL THEN 1 ELSE 0 END DESC, EndDate DESC) AS DateRank, EndDate AS Date FROM WorkSchedule ) SELECT StoreID, Date FROM IntermediateDataCTE WHERE DateRank = 1 -- END OF CODE --
If there is any NULL value it ends up at the first position thanks to the first part of the ORDER BY expression, otherwise the greater date ends up at the first position thanks to the second part.At worst the CASE WHEN thing can be defined as a computed column expression, in order to be included in an index.
This obviously need in-situation performance testing on a case-by-case basis, and probably will be difficult to use in more complex queries, but I think it can be worth the test :-)
Gaël
Someone posted a comment for this blog on another article on BIDN by mistake. The question was
"Would
SELECT StoreID, Max(ISNULL(EndDate,'12/31/2099')) FROM WorkSchedule GROUP BY StoreID
Not work in this scenario?"
This would not work if you wanted to actually return NULL since the query above would return 12/31/2099. That is why we have to have the case statement in there to replace the 12/31/2099 with NULL.