Business Intelligence Blogs

View blogs by industry experts on topics such as SSAS, SSIS, SSRS, Power BI, Performance Tuning, Azure, Big Data and much more! You can also sign up to post your own business intelligence blog.

DirectQuery in Power BI Desktop

In the latest Power BI Desktop release, a new Preview feature was released that allows you to connect using DirectQuery to either SQL Server or Azure SQL Database. DirectQuery is a really neat feature that allows you to point to the live version of the data source rather than importing the data into a data model in Power BI Desktop.

Normally, when you want an updated dataset in Power BI Desktop, you have to manually click the refresh button (this can be automated in the Power BI Service), which initiates a full reimport of your data. This refresh can take a variable amount of time depending on how much data you have. For instance, if you’re refreshing a very large table, you may be waiting quite a while to see the newly added data.

With DirectQuery, data imports are not required because you’re always looking at a live version of the data. Let me show you how it works!

Turning on the DirectQuery Preview

Now, because DirectQuery is still in Preview, you must first activate the feature by navigating to File->Options and settings->Options->Preview Features, then checking DirectQuery for SQL Server and Azure SQL Database.


Once you click OK, you may be prompted to restart Power BI Desktop to use the feature.

Using DirectQuery in Power BI Desktop

Next, make a connection to either an on-premises SQL Server or an Azure SQL Database.

Go to the Home ribbon and select Get Data then SQL Server.


Provide your Server and Database names, then click OK. ***Do not use a SQL statement; it is not currently supported with DirectQuery.***


From the Navigator pane, choose the table(s) you would like to use. I’m just going to pick the DimProduct table for this example and then click Load. You could instead select Edit, which launches the Query Editor, where you can manipulate the extract and apply any business rules needed to the data before visualizing it.


Next, you will be prompted to select how you want to connect to the data. Again, Import means the data

Read more

The Big Data Blog Series

Over the last few years I’ve been speaking a lot on the subject of Big Data. I started by giving an intermediate session called “Show Me Whatcha’ Workin’ With”. This session was designed for people who had attended a one-hour introductory session that showed you how to load data, to look at possible applications … Continue reading The Big Data Blog Series
Read more

Hello Operator: Key Lookup

  • 11 June 2013
  • Author: Kathi Kellenberger
  • Number of views: 5387

The T-SQL Tuesday topic this month, “Hello Operator”, is about execution plan operators. I would like to talk about the Key Lookup Operator.

So, the first thing you may be wondering about is if it is good or bad to see the Key Lookup operator in your plan. Well, that is one of those “it depends” kind of questions. Sometimes it is OK, and sometimes it is very, very bad.

Let’s take a look at why the Key Lookup operator would be used. Say you have a query with a WHERE clause, and there is a nonclustered index the optimizer determines can be used to locate the rows. One of the benefits of nonclustered indexes is that, if all the columns required for the query results are part of the nonclustered index, the underlying table does not have to be accessed. An index like this is called a "covering" index.
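To make that concrete, here is a minimal sketch of a covering index. The table and index names here are hypothetical, invented purely for illustration; the key point is the INCLUDE clause, which stores extra columns at the leaf level of the index:

```sql
-- Hypothetical table; OrderID gets the clustered index via the primary key
CREATE TABLE dbo.Orders (
    OrderID INT IDENTITY PRIMARY KEY,
    CustomerID INT NOT NULL,
    OrderDate DATE NOT NULL,
    TotalDue MONEY NOT NULL
);

-- CustomerID is the key column; OrderDate rides along as an included column
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate);

-- Every column this query touches (CustomerID, OrderDate) lives in the
-- index, so SQL Server can answer it without visiting the base table --
-- no Key Lookup appears in the plan.
SELECT CustomerID, OrderDate
FROM dbo.Orders
WHERE CustomerID = 42;
```

If the query also asked for TotalDue, the index would no longer cover it, and a lookup back to the table would be required for each matching row.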

I’ll steal a great analogy (or is it metaphor?) from my friend, Jason Strate. Travel back in time when the public library catalog was paper based. To find a book, you looked at cards in cabinets. If you wanted to find the names of all the books written by Stephen King, you could get that information from the catalog. If you wanted to actually read one of those books, you would have to walk to the appropriate shelf to get the book.

SQL Server works the same way. If all the information required for the query is contained within the nonclustered index, it is not necessary to access the actual table. If some of the required columns are not found in the nonclustered index, SQL Server will have to pull data from some of the pages of the table. To see this in the execution plan, look for the Key Lookup operators.  If the nonclustered index is very selective and there are just a handful of rows returned, this is not so bad. But if the index is not selective or there are hundreds of thousands of rows, then this is bad. Sometimes it would be more efficient to just scan the clustered index or heap instead of using the index.

Here is an example using the AdventureWorks database. The Sales.SalesOrderDetail table has a nonclustered index on ProductID, but the index doesn’t contain the OrderQty column.

SELECT OrderQty, ProductID
FROM Sales.SalesOrderDetail
WHERE ProductID = 897;

This query returns two rows, but still 98% of the effort is expended by the Key Lookup. If I change to a ProductID that returns many more rows, the optimizer might do a clustered index scan instead, along with an index suggestion.

SELECT OrderQty, ProductID
FROM Sales.SalesOrderDetail
WHERE ProductID = 707;

The two queries are ad-hoc queries. If the queries were instead parameterized, such as within a stored procedure, we run into a problem called “parameter sniffing”: the optimizer caches the plan from the first time the proc is called and reuses that plan even when a different plan would be better. The next example demonstrates this.

--Create the proc
CREATE PROC usp_GetQty @ProductID INT AS
SELECT OrderQty
FROM Sales.SalesOrderDetail
WHERE ProductID = @ProductID;
GO
--Call with 897
EXEC usp_GetQty @ProductID = 897;

The call to the stored procedure uses the Key Lookup. That is what we expect since the Key Lookup is efficient when only two rows are returned. Now call the proc with 707.

EXEC usp_GetQty @ProductID = 707;

In this case, the optimizer reuses the original plan, which is not the best option for a ProductID of 707. Using the SET STATISTICS IO option, we can compare the IO required to return the rows with the two different methods.

SET STATISTICS IO ON;
SELECT OrderQty, ProductID
FROM Sales.SalesOrderDetail
WHERE ProductID = 707;
EXEC usp_GetQty @ProductID = 707;
SET STATISTICS IO OFF;

How can we solve this problem? There are a few ways involving query hints, but in this case I would just add the OrderQty column to the ProductID index as an included column, especially if this query runs frequently. This eliminates the need to fetch the OrderQty values from the table. The index suggested along with the clustered index scan should not be used; it would actually create a duplicate index, but that is a story for another day...
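A sketch of that fix, assuming the AdventureWorks index on ProductID is named IX_SalesOrderDetail_ProductID (check sys.indexes for the actual name on your copy of the database before running this):

```sql
-- Rebuild the nonclustered index with OrderQty as an included column.
-- DROP_EXISTING = ON replaces the existing index in place.
CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_ProductID
ON Sales.SalesOrderDetail (ProductID)
INCLUDE (OrderQty)
WITH (DROP_EXISTING = ON);

-- The index now covers the query, so both calls can be satisfied from
-- the index alone -- no Key Lookup regardless of which parameter value
-- the cached plan was compiled for.
EXEC usp_GetQty @ProductID = 897;
EXEC usp_GetQty @ProductID = 707;
```

Note that altering the index also invalidates the cached plan for the proc, so it will be recompiled on the next call anyway.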

NOTE: There is also an operator called RID Lookup that works the same way with heaps (tables without a clustered index).
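A quick way to see the RID Lookup variant for yourself, using a hypothetical heap invented for this example:

```sql
-- A heap: no clustered index, so rows are located by row identifier (RID)
CREATE TABLE dbo.DetailHeap (
    ProductID INT NOT NULL,
    OrderQty  SMALLINT NOT NULL
);

CREATE NONCLUSTERED INDEX IX_DetailHeap_ProductID
ON dbo.DetailHeap (ProductID);

-- OrderQty is not in the nonclustered index, so the execution plan shows
-- a RID Lookup against the heap instead of a Key Lookup.
SELECT OrderQty, ProductID
FROM dbo.DetailHeap
WHERE ProductID = 897;
```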





Categories: SQL Server