Ok, you clicked the run button twenty minutes ago and the query is still running. You really don't want to stop and do serious performance tuning, but you're getting that sinking feeling that you may celebrate your next birthday before this query comes back. You start to sweat, your mind is racing, and although you are not a woman in labor, you start to practice the Lamaze breathing techniques you once learned. Then you remember the article you saw about SQL performance tuning for programmers who don't like to do performance tuning, and you smile.
Ok, confession: I really don't like performance tuning. I just like fast queries. However, I recently had a project with a number of performance challenges, and I learned some important things that I did not want to forget. So I wrote them down, and I would like to share them with you so you may benefit from my efforts.
- Check the number of rows in each table involved in the query. Review each join: which columns are the tables joined on? Check the WHERE clause: which columns are filtered on? Review the table definitions to see what indexes are available to support the joins and the WHERE clause. Check the execution plan: are the available indexes being used optimally? Do indexes exist for each join column and each column in the WHERE clause? For very small tables, indexes may not be critical, but for large tables, if an index does not exist or is not being used by SQL Server, then the query performance can probably be improved. In short, you need to have the index and you need to make sure SQL Server uses it.
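As a rough sketch of those checks, assuming a hypothetical Orders table joined to Customers on CustomerID and filtered on OrderDate:

```sql
-- Hypothetical example: Orders joined to Customers on CustomerID,
-- filtered on OrderDate.
SELECT COUNT(*) FROM dbo.Orders;      -- how big is each table?
SELECT COUNT(*) FROM dbo.Customers;

EXEC sp_helpindex 'dbo.Orders';       -- what indexes already exist?

-- If nothing covers the join/filter columns, an index like this may help:
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
    ON dbo.Orders (CustomerID, OrderDate);
```

All table, column, and index names here are placeholders; substitute your own and confirm in the execution plan that the new index is actually picked up.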
SQL tuning is more an art than a science and requires a lot of trial and error.
- When you get a really challenging query to tune, often the things that improve performance will not seem to make sense (at least until you can think of a really smart sounding reason later).
- When testing changes, make sure you have identical environments in which to run each query version. Test only one change at a time.
Using the SQL Query Analyzer
- Highlighting a query, right-clicking, and selecting ‘Display Estimated Execution Plan’ is a good way to get an idea of what SQL Server will do with the query, but it is not always reliable. Sometimes it is better to issue the command SET STATISTICS IO ON before the query and then run it. This causes messages to be output that show you what SQL Server is actually doing.
SET STATISTICS TIME ON will print timing information.
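For example, wrapping a query (here a hypothetical Orders/Customers join; the names are placeholders) in both settings looks like this:

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Hypothetical query; table and column names are placeholders.
SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o
JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
WHERE o.OrderDate >= '20110101';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The Messages pane then reports logical and physical reads per table, plus CPU and elapsed time, which tells you what actually happened rather than what was estimated.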
- Tuning is an area where different people bring different ideas, experience, and knowledge to the table, so make use of this.
Test In Batches
- If the query takes a long time to run, run it on a sample: large enough to test a change, but small enough to allow testing of many different iterations. If you batch the query, each loop iteration will give you an idea of its performance, and adding some PRINT statements can help track the progress. In lieu of batching, you can use SELECT TOP n. Note: the first batch is almost always significantly slower than the rest, and almost always inconsistent when running the query repeatedly during testing, so do not rely on the first batch for anything. Run a minimum of 3-4 batches when possible.
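A minimal sketch of such a batch loop, assuming a hypothetical dbo.SourceTable with a dense integer ID column:

```sql
-- Hypothetical batch loop: one million IDs per iteration.
DECLARE @StartID int = 1;
DECLARE @BatchSize int = 1000000;
DECLARE @MaxID int;

SELECT @MaxID = MAX(ID) FROM dbo.SourceTable;

WHILE @StartID <= @MaxID
BEGIN
    -- Stand-in for the query under test, filtered to this ID range.
    SELECT COUNT(*)
    FROM dbo.SourceTable
    WHERE ID BETWEEN @StartID AND @StartID + @BatchSize - 1;

    -- Progress message with a timestamp so per-batch timings are visible.
    PRINT 'Finished batch starting at ' + CAST(@StartID AS varchar(20))
        + ' at ' + CONVERT(varchar(23), GETDATE(), 121);

    SET @StartID = @StartID + @BatchSize;
END;
```

Replace the COUNT(*) stand-in with the real query you are tuning; the PRINT output in the Messages pane gives you a per-batch timing log for free.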
And from the SQL Server Help, ‘Use DBCC DROPCLEANBUFFERS to test queries with a cold buffer cache without shutting down and restarting the server.’
Make sure statistics are up to date
- You can use the command UPDATE STATISTICS TableName to get updated statistics on a given table or view.
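For instance, against a hypothetical dbo.Orders table:

```sql
-- dbo.Orders is a placeholder name.
UPDATE STATISTICS dbo.Orders;   -- refresh all statistics on one table

-- Or refresh statistics on every table in the current database:
EXEC sp_updatestats;
```

sp_updatestats is the broad-brush option; on a large database it can take a while, so the single-table form is usually what you want during a tuning session.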
Clustered indexes are NOT always better
- For joins, clustered indexes seem to be the best, but for filters, not necessarily. For example, I had an input table with a clustered index on UserID, which it was using to join to another table. This was excellent for the join. I decided to also use UserID for the batch filter (i.e. filter on a range of UserIDs and increment the batch by 1 million). This did not perform optimally. By adding an identity column with a non-clustered index to the table I was using to drive the batch loop, and using that column in the filter range (i.e. WHERE ID BETWEEN @StartID AND @EndID), I got much better performance. The reason: UserIDs have many gaps, so many of the filter iterations were not getting many hits, and the cost of executing the query loop extra times outweighed the benefit of using the clustered index.
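A sketch of that change, using a hypothetical dbo.BatchDriver table as the loop driver:

```sql
-- Hypothetical driver table with gappy UserIDs: add a dense identity
-- column plus a non-clustered index, and drive the batch ranges off it.
ALTER TABLE dbo.BatchDriver ADD ID int IDENTITY(1,1);
CREATE NONCLUSTERED INDEX IX_BatchDriver_ID ON dbo.BatchDriver (ID);

-- Each batch now covers a full, dense range of rows:
-- WHERE ID BETWEEN @StartID AND @EndID
```

Because the identity column has no gaps, every iteration of the loop processes a full batch of rows instead of a mostly empty range.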
Brute force can be a good option
- For a really long-running query, we found that just splitting it up into two queries, each with a different key range, allowed us to get almost double the speed, because we were taking advantage of multiple processors. This can hit diminishing returns and adds complexity, but it is a good method to keep in mind. Along the same lines, more hardware also helps, and this can include running queries on different servers (with refreshed, synchronized data). Ok, not the slickest way to save the day, but heck, it works. Note: this assumes more hardware is available.
Verify your performance changes do NOT affect the output
- It is easy to get so excited by a significant performance improvement that you can’t wait to tell the world, but hold on! Check the results and make sure they are correct. If you have a baseline query that was correct but performing slowly, you can run both the baseline and the revised query in the same environment and compare the results.
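One way to do the comparison, assuming each query's output has been captured into a temp table (#Baseline and #Revised are hypothetical names):

```sql
-- Rows the revised query lost:
SELECT * FROM #Baseline
EXCEPT
SELECT * FROM #Revised;

-- Rows the revised query invented:
SELECT * FROM #Revised
EXCEPT
SELECT * FROM #Baseline;
```

If both result sets come back empty, the two queries returned the same rows. One caveat: EXCEPT works on distinct rows, so if duplicate counts matter for your output, compare those separately (e.g. with COUNT(*) per key).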
When all else fails, consider using hints
- SQL Server’s optimizer is very smart, and if the statistics are up to date it will usually pick the best plan for most scenarios. However, there can be exceptions. The optimizer picks the best plan for the most common queries it expects based on the statistics. For example, suppose 95% of the companies in the Company table have only 1 matching row in the CompanyAttributes table, and the query performs very well for those. However, the other 5% have over 10,000 rows in the CompanyAttributes table, and for those companies the query is very slow. In such cases, a hint such as OPTION (OPTIMIZE FOR (@StartID = 12131111, @EndID = 13433333)), where @StartID is the range start and @EndID is the range end, can steer the execution plan toward an index that helps that 5%. You must be careful not to slow down the other 95%, so two queries, one with the hint and one without, might be needed. Recently, on an extract, SQL Server would not use a recently added index and a query was running slow. Adding the FORCESEEK hint on a join greatly improved performance. Adding the LOOP hint to a join can also really boost performance. Note: this was a one-time extract query, but it needed to run in a small time window. If this had been a query in a frequently called stored procedure, using hints would be risky, because hints take away SQL Server's flexibility to optimize queries, and a hint that helped at one time may hinder performance at another. Hints should be used as a last resort and with CAUTION.
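The hints above might look like this in context; the Company/CompanyAttributes schema and column names are hypothetical stand-ins for whatever you are querying:

```sql
DECLARE @StartID int = 12131111, @EndID int = 13433333;

-- OPTIMIZE FOR: compile the plan as if the variables held these values.
SELECT c.CompanyID, ca.AttributeValue
FROM dbo.Company AS c
JOIN dbo.CompanyAttributes AS ca ON ca.CompanyID = c.CompanyID
WHERE c.CompanyID BETWEEN @StartID AND @EndID
OPTION (OPTIMIZE FOR (@StartID = 12131111, @EndID = 13433333));

-- FORCESEEK plus a LOOP join hint, applied to a single join:
SELECT c.CompanyID, ca.AttributeValue
FROM dbo.Company AS c
INNER LOOP JOIN dbo.CompanyAttributes AS ca WITH (FORCESEEK)
    ON ca.CompanyID = c.CompanyID;
```

Note that the join-level hints (LOOP, FORCESEEK) pin down only the one join they are attached to, while OPTION hints apply to the whole statement.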
An indexed view may be the answer
- If you have a slow-running query that involves a join of several tables with millions of records, try joining 2 or 3 of the large tables and putting the results into a temp table, then use the temp table to join to the rest of the tables.
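A sketch of the temp-table approach, with hypothetical table names throughout:

```sql
-- Join the two largest tables first and materialize the result.
SELECT a.KeyID, a.ColA, b.ColB
INTO #BigPair
FROM dbo.BigTableA AS a
JOIN dbo.BigTableB AS b ON b.KeyID = a.KeyID;

-- An index on the temp table can help the follow-up joins.
CREATE CLUSTERED INDEX IX_BigPair_KeyID ON #BigPair (KeyID);

-- Now join the smaller, intermediate result to the remaining tables.
SELECT p.KeyID, p.ColA, p.ColB, c.ColC
FROM #BigPair AS p
JOIN dbo.TableC AS c ON c.KeyID = p.KeyID;
```

Breaking the query up this way gives the optimizer two simpler problems instead of one large one, at the cost of writing the intermediate rows to tempdb.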
- A quick way to compare a revised query to the original: paste both the original query and the modified query into one query window, select both, right-click, and choose ‘Display Estimated Execution Plan’. A total cost % is given for each query, and the one with the lower % is the better performer.
- If CTEs are used in a slow-running query, try replacing the CTE with a temp table (adding an index if needed) to check the performance difference.
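For example, a CTE that aggregates orders per user (a hypothetical query; the tables and columns are placeholders) can be materialized like this:

```sql
-- Instead of:
-- ;WITH LastOrders AS (SELECT UserID, MAX(OrderDate) AS LastOrderDate
--                      FROM dbo.Orders GROUP BY UserID)
-- SELECT ... FROM dbo.Users u JOIN LastOrders lo ON ...

SELECT UserID, MAX(OrderDate) AS LastOrderDate
INTO #LastOrders
FROM dbo.Orders
GROUP BY UserID;

CREATE CLUSTERED INDEX IX_LastOrders_UserID ON #LastOrders (UserID);

SELECT u.UserID, u.UserName, lo.LastOrderDate
FROM dbo.Users AS u
JOIN #LastOrders AS lo ON lo.UserID = u.UserID;
```

A CTE referenced more than once may be re-evaluated each time, whereas the temp table is computed once, has real statistics, and can be indexed, which is often where the performance difference comes from.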
These are just some things I learned on this project and I'm sure I have a lot more to learn. If you have more suggestions, please add your own article.