posted 6/25/2010 by kylewalker - Views: [4828]
I was reading a great blog yesterday, posted by Keith Hyer, on Identifying missing value ranges in numeric sequences. His method used a common table expression (CTE) with a series of variables, followed by a simple SELECT statement. Then your output is a 2 column result set displaying the ranges of missing values in a given numeric sequence, such as an identity column on a table. Quite honestly, I can't think of a better method that displays missing values in a range format. However, I started thinking about it, and realized that it might also be helpful to have a single column listing of your missing values instead of a column with a value for the "begin range" and another column with a value for the "end range". So here's another approach, with just a slightly different output format.
First we need a table that has the problem we're discussing... So here's a script that will do that for you.
Running this code gives you the following results:
As you can see, IDs 4, 9, 10, and 13 are now missing. However, when you have a table with hundreds of thousands of records, it is very inefficient to go and spot the gaps manually. Instead, consider using the following script:
The script is assigning the max ID value to a variable (@LastID), then creates a temp table and loads the values "1" through "@LastID" into a column in the temp table using a WHILE loop. Finally, just query the temp table for values that don't exist in your original table. The output for my example is shown below.
I hope someone finds this helpful.
This is a perfect candidate for the use of a Numbers table.
If you have the following Numbers table (i.e. all the integers from 1 to 1,000,000) in existence in your database...
;with L0(c) as (select 0 union all select 0 union all select 0) --3 Rows ,L1(c) as (select 0 from L0 a cross join L0 b cross join L0 c) --27 Rows (3x3x3) ,L2(c) as (select 0 from L1 a cross join L1 b cross join L1 c) --19683 Rows (27x27x27) ,L3(c) as (select 0 from L2 a cross join L2 b) --387,420,489 Rows (19683x19683) ,NN(n) as (select row_number() over (order by (select 0)) from L3) select n into dbo.Nums from NN where n<=1000000 create unique clustered index IX_Nums on dbo.Nums (n)
...then you could execute the following query to, for example, find the missing ProductID's in AdventureWorks' Production.Product table:
select n from Nums where n between (select min(ProductID) from Production.Product) and (select max(ProductID) from Production.Product) and not exists (select * from Production.Product where n=ProductID)
This will incur no Writes and will take a fraction of the CPU and Reads compared to the looping method.
--Brad