Business Intelligence Blogs

View blogs by industry experts on topics such as SSAS, SSIS, SSRS, Power BI, Performance Tuning, Azure, Big Data and much more! You can also sign up to post your own business intelligence blog.

Index presentation follow up

  • 9 April 2013
  • Author: Kathi Kellenberger
  • Number of views: 7087

I presented on Indexes today for Pragmatic Works Training on the Ts. This is Back to Basics month, so my presentation was geared towards beginners. The video should be posted by the end of the week. Here are the slides and code. NOTE: You will need to also create a numbers table.

My session went about 59 minutes, so there was not enough time to answer questions, and there were a ton of questions!

Here are the questions and answers:

Q: When or why should I use clustered vs. non-clustered?
A:  A clustered index is the actual table; a non-clustered index is a separate structure that points to a table. So, you can have both a clustered index and one or more non-clustered indexes per table.

Q: What is the name the book mentioned during the session?
A:  I was talking about Jason Strate’s book Expert Performance Indexing for SQL Server 2012

Q: I don’t understand how a GUID would be wide…in relation to maybe a RowID.
A: A GUID is 16 bytes. A RowID, the location of the row is 8 bytes. A GUID is not a great key to use for the cluster key, not only because it is wide, but also the GUIDs will not be generated in order. Here is an article on this topic with even more reasons why GUIDs are not good for the cluster key or primary key.

Q: Isn’t B-Tree actually a binary tree?
A: Yes.  When taking computer science classes I always heard “binary tree”. Since working with SQL Server, I have heard the term “balanced tree” more often applied to indexes. When looking it up, I saw the term “balanced binary tree”. Balanced means that it is a binary tree, but we try to avoid skewing the structure.

Q:  I have a table with LastName and FirstName.  Sometimes I query on LastName, sometimes on FirstName, sometimes on both. How many non-clustered indexes should I have?
Q: Since order is paramount in an index and you have 3 fields, payer, sales org, plant, which will be in queries but not always payer is used; do you suggest multiple non clustered indexes to manage this occurrence?
A: That is one of those “It depends” kind of questions. You can have a separate index on each of the columns that you search on. If you want to combine columns in an index, put the most selective first.

You also have to think about other columns returned in the queries and how many rows most of the queries return. If the queries always return a small number of rows, key lookups are probably fine. If not, you may need to add included columns.

Q: Why did you make the char column 896 characters?
A: Just to demonstrate the extremes, I wanted to create a cluster key at the maximum size, which is 900 bytes. The ID column was 4 bytes, so I made the second column 896 bytes. In the WIDE demo, the index key was a combination of those two columns.

Q: An index seek is always better than an index scan, right?
A: Never say “always.” Generally, an index seek is better than an index scan, but depending on the situation, a scan may be more efficient. One example is a query where there is not a covering index. If a large number of rows is returned, it is often better to do a scan instead of a seek plus a key lookup.

Q: Why do we sometimes use multiple columns for primary keys and/or clustered index?
A: We didn’t really spend a lot of time talking about primary keys, but often there is more than one column required for identifying a unique row. For example, let’s say you have an OrderDetails table with a primary key of OrderID and LineNo. Neither column by itself is unique, but the combination is.

The primary key does not have to be the cluster key, but it often is. It is better that the cluster key is unique as well, so, if often makes sense to have a multi-column cluster key if that is the unique combination.

Q: Wouldn’t a primary key be at the leaf level?
A: Good question. I think you mean a table has a clustered index and the cluster key is different than the primary key, and you have a non-clustered index. I just tested this with SQL Server 2012 and the answer is no, the cluster key was stored at the leaf level, not the primary key. This sounds like it will be the subject of a future blog post! I didn't test any other versions.

Q: Will you describe why it is better to include a column in the index as opposed to the column being part of the index?
A: If the column will be used in searches, it is better to make the column part of the key. If it is just part of the results, you will probably want to make it an included column.

Q:  How do you find existing indexes?
A: SQL Server provides a wealth of system views and dynamic management views to help you. Here is a query you can use to get started:
SELECT object_name(object_id) AS tableName,*
FROM sys.indexes
WHERE OBJECTPROPERTY(object_id,'IsUserTable') = 1

Q: The formula for the number of tries to find a value in a binary tree is log {base 2} N
A: Thanks. It has been several years since I have thought about this. So, bottom line, the number of tries increases very slowly even when the number of values increases rapidly.

Q: If you have a datetime column, is it possible to create an index on just the date and hour parts of the date?
A: Hmmm, maybe you should create a calculated column and put an index on that.

Q: Are there some best practices on the number of indexes on a table - guided by number of columns or rows?  It seems like you could apply too many and slow things down
A: I have never heard about a guide to the number of indexes based on the size number of columns or rows. You can definitely over index a table, though. It needs to be guided by the queries that will run against the database.

Whew! Thanks for attending the session!



Categories: SQL Server
Rate this article:
No rating

Please login or register to post comments.