Business Intelligence Blogs

View blogs by industry experts on topics such as SSAS, SSIS, SSRS, Power BI, Performance Tuning, Azure, Big Data and much more! You can also sign up to post your own business intelligence blog.

Hello Operator: Key Lookup

  • 11 June 2013
  • Author: Kathi Kellenberger
  • Number of views: 6775
  • 0 Comments

The T-SQL Tuesday topic this month, “Hello Operator”, is about execution plan operators. I would like to talk about the Key Lookup Operator.

So, the first thing you may be wondering about is if it is good or bad to see the Key Lookup operator in your plan. Well, that is one of those “it depends” kind of questions. Sometimes it is OK, and sometimes it is very, very bad.

Let’s take a look at why the Key Lookup operator would be used. Say you have a query with a WHERE clause.  In this case there is a nonclustered index the optimizer determines can be used to locate the rows. One of the benefits of nonclustered indexes is that, if all the columns required for the query results are part of the nonclustered index, the underlying table does not have to be accessed.  This is also called a "covering" index.

I’ll steal a great analogy (or is it metaphor?) from my friend, Jason Strate. Travel back in time when the public library catalog was paper based. To find a book, you looked at cards in cabinets. If you wanted to find the names of all the books written by Stephen King, you could get that information from the catalog. If you wanted to actually read one of those books, you would have to walk to the appropriate shelf to get the book.

SQL Server works the same way. If all the information required for the query is contained within the nonclustered index, it is not necessary to access the actual table. If some of the required columns are not found in the nonclustered index, SQL Server will have to pull data from some of the pages of the table. To see this in the execution plan, look for the Key Lookup operators.  If the nonclustered index is very selective and there are just a handful of rows returned, this is not so bad. But if the index is not selective or there are hundreds of thousands of rows, then this is bad. Sometimes it would be more efficient to just scan the clustered index or heap instead of using the index.

Here is an example using the AdventureWorks database. The Sales.SalesOrderDetail table has a nonclustered index on ProductID, but the index doesn’t contain the OrderQty column.

SELECT OrderQty, ProductID
FROM Sales.SalesOrderDetail
WHERE ProductID = 897;



This query returns two rows, but still 98% of the effort is expended by the Key Lookup. If I change to a ProductID that returns many more rows, the optimizer might do a  clustered index scan instead along with an index suggestion.

SELECT OrderQty, ProductID
FROM Sales.SalesOrderDetail
WHERE ProductID = 707;

The two queries are ad-hoc queries. If the queries were instead parameterized, such as within a stored procedure, we now have a problem called “parameter sniffing”. The optimizer will store the plan for the first time the proc is called and use that plan even if it would be better to use a different plan. The next example demonstrates this.

 --create the proc
CREATE PROC usp_GetQty @ProductID INT AS
 SELECT OrderQty
 FROM Sales.SalesOrderDetail
 WHERE ProductID = @ProductID;
GO
--Call with 897
EXEC usp_GetQty @ProductID = 897;

The call to the stored procedure uses the Key Lookup. That is what we expect since the Key Lookup is efficient when only two rows are returned. Now call the proc with 707.

EXEC usp_GetQty @ProductID = 707;
 
In this case, the optimizer reuses the original plan, which is not the best option for a ProductID of 707. Using the SET STATISTICS IO option, we can compare the IO required to return the rows with the two different methods.

SET STATISTICS IO ON;
GO
SELECT OrderQty, ProductID
FROM Sales.SalesOrderDetail
WHERE ProductID = 707;
GO
EXEC usp_GetQty @ProductID = 707;

How can we solve this problem? There are a few ways to solve it involving query hints, but in this case, I would just add the OrderQty column to the ProductID index as an included column, especially if this query runs frequently. This will eliminate the need to get the OrderQty values from the table. The index suggested  with the clustered index scan should not be used. This suggestion will actually create a duplicate index, but that is a story for another day...

NOTE: There is also an operator call RID Lookup that is used in the same way with HEAPS.

 

 

 

 

Print
Categories: SQL Server
Tags:
Rate this article:
No rating

Please login or register to post comments.