Posted: 9/5/2010
One of the mail downfalls of the Slowly Changing Dimension is the use of the OLEDB Command for updates. A better way to get around using the SCD is to use a lookup to see if the record is in the table. From there use a Conditional Split to see if any other columns are different from what you are bringing in. If there are no changes, then simply insert the records. If there are changes write the records to a stage table and then do an Execute SQL Command to process the updates. The difference is the OLEDB Destination is going to be a row by row command where as the Execute SQL task will do it in batches making the process much quicker. Just do a google search for incremental load in SSIS and you should get plenty of results on how to do this.
As far as soft and hard deletes go there are benefits to both. The basic difference is a hard delete will physically remove the record from the table, where as a soft delete will likely set an ActiveFlag or DeleteFlag to indicate if the record is "deleted" or not. Depending on the amount of data being deleted not physically deleting data could hard performance as you would be required to bring all those records back or filter when out of your result set with a WHERE DeleteFlag = 0.
Bottlenecks in SSIS are dependent on the transform being used really. We would need to know a little more specifically what you are doing to be able to give guidance. Just a for instance for you, the Sort Transform is REALLY bad for performance because it requires all the data to be there so it can be sorted before moving on. This would be when you want to replace that with an ORDER BY in your query if at all possible. Certain transforms come at a very high cost to performance.
Posted: 10/8/2010
I have used both SCD as well as kimble; if you volume od data is low then I preffer to use SCD as itf fairly simple to but but have problem with performance when used with large volume of data; however Kimble use it very efficently by doind in memmory comparision so it optimize performance by many fold but drawback is it used more memory and CPU to process.
both doent satisfy my requirement of performance and both methods have some limitatin when you have any database changes; Lookup also works fine but again it also have a performance issues if you have very large volumen of data.
I have developed my own way to implement type 2 dimension; it relies on source system ; if you have any field in source table which let you know if its modified or changes since last load like ModifiedDate or rowversion or timestamp etc ; then you can use that. here are steps to implement
1) select all records from source systm what are changed since last load
2) insert all rcords in your dimesion table
3) expire record using a SQL query
update enddate where Startdate < [todate's date] and [business key] in ( select [business key] from dimension where startdate = [todate's date])
I know this is not a true type-2 implementation but it solve my requirement as it decrease package execution time from 7 hours to just 15 minutes.
Performance tuning :
I have comeup with following points for performance tunning
1) Optimize source connection ; most of the time packages use external data source; so source db connection need to optimize by using diffrent network optimization, changing network package size to optimal level from 4k to 32 k; I have experincs almost 50% improvment in performance with network packet size = 32k when data is fetched remotely but cause issues when used locally and put more stress on database.
2) Optimize you load query from source system, I have seen number of times that people do not care about this much does not write query optimaly. if source system is good and you ahve sufficent allocated process on system or running load process in off-time then do some preprocessing in query itself.
3) use lookups very efficently; instead of select * use only desired columns and eliminate duplicates in lookup; (I know SSIS do optimize it but this is a good practice).
4) Slowest componet in SSIS is Script component after SCD; avoid using it if possible. or use drived column instead if possible.
5) perform parallel processing
Posted: 1/18/2011
vrkn82 said: Hello , I came across these questions and I am new to sql can any one please tell me about this. Different ways to implement SCD other than the wizard, using logicIssues faced while implementing SCDGiven a package how would you begin tuning it for better performanceexpalin about bottleneck issue in SSIS packages, how do you resolve itHow do you use control tables for incremental loadSoft/hard deleteWays to create a dimension table and what are the main properties
Hello , I came across these questions and I am new to sql can any one please tell me about this.
The SCD performs poorly for slightly bigger dimensions, because it doesn't use cache.Here are some alternatives for the SCD.