Posted: 4/22/2011
Hi,
We are in a process of building a datawarehouse for one of our projects. We have about 20 tables where bulk inserts take place during ETL process. We have planned to design the architecture in such a way that, to remove clustered and non-clustered indexes for all these 20 tables during ETL and then place them back on the tables when the ETL load gets completed. So, my question, is this a good practice for such a situation or please suggest any better option if any. Thank you.
Shiv
I don't know if this is considered a good practice, but it is certainly an interesting thought.
Assuming you are talking about your destination tables only, this should decrease your direct load times. Although I am not sure the net effect after the index is rebuilt is going to save much. I would say test, test, test. I would be interested in hearing about your results.
Also, any reports or other processes that depend on those tables will be substantially impaired until the index is rebuilt, so that should be built into your thinking as well.
Yes, actually the main purpose of re-builting the indexes is , there are substantial reports which depend on these tables.
Posted: 4/27/2011
I had a chance to test this yesterday and it was amazingly successful. I have had an SSIS package that has been extremely slow. I finally got a chance to do some troubleshooting of the package speed last night. My package was running a minimum of 5 hours once a week. I had a chance to run some traces on it and it turned out to be a problem with a destination table. A couple of quick articles on Google and I came to believe that removing one of the indexes on the destination table was the answer. With the package at just under 10% complete after the first hour, I disabled the index on the destination table and the job finished in the next 10 minutes!
Although I don't believe that my results are typical and that more testing is warranted, this unscientific test definitely indicates that this is a valid path to test and pursue. After the package was finished it took 25 to 30 minutes for the index to rebuild. Overall I got more than a 50% performance improvement in total execution time for the package when I disabled the index, did the load and then rebuilt the index.
I hope this helps. If you find this answer useful, please mark the question as answered.
Posted: 5/8/2011
Good to know this!! Thank you for testing this out Daniel. I am sure it is worth the test :)
Posted: 7/13/2011
Daniel said: Although I don't believe that my results are typical and that more testing is warranted
Although I don't believe that my results are typical and that more testing is warranted
Interesting to find a similar thought and I should have seen this thread atleast few days back. We tried to disable the index and enable it post ETL and it really showed significant improvement in the load time. Yet to quantify on the % of improvement.
Just out of curiousity, are you disabling a clustered index or a non-clustered index? For me it was a non-clustered index.
Thanks,
Sorry for jumping on this so late, but my experiences might be of some value...
I am usually reluctant to disable (or even delete and recreate) indexes on the destination tables during ETL. The main reason for this is that I prefer to use stored procs for all my ETL processes (probably another good debate for another time), and the very same indexes that slow down inserts/updates speed up the rest of the process (for instance identifying type 1 or type 2 changes).
All of this is of course dependant on the type of process, and it might make sense to do this for very specific indexes which are only used for reporting purposes. I guess the risk here is that indexes change over time as the structure of the tables, reports or adhoc requirements change...and that would require a change in the ETL process.
With all of the above said, I have found it very useful to delete and recreate indexes on all my staging tables. Transferring a lot of data from the source systems are so much faster without any indexes on the staging tables, and I was able to develop a generic process to remove any existing indexes in my staging database (saving the index information in a table so that it isn't hard-coded), load the data and then recreate the very same indexes. This process has proven to be very effective in my projects.
Come to think of it, it would be possible to identify the indexes you would want to recreate on the destination tables by some kind of naming convention (like a prefix), and then develop a similar generic process to only recreate those indexes. That would be pretty cool actually...but personally I haven't reached a stage where something like that is necessary.
Martin.