Last week I wrote a set of Advanced SSIS Interview Questions. Here are the answers I came up for these. Of course you may have some variation of what I have and I would love to hear how you would answer these. Feel free to email me how you would answer them.
I’ll skip the demonstrate/whiteboard part of the questions.
1. Demonstrate or whiteboard how you would suggest using configuration files in packages. Would you consider it a best practice to create a configuration file for each connection manager or one for the entire package?
There should be a single configuration file for each connection manager in your packages that stores their connection string information. So if you have 6 connection managers then you have 6 config files. You can use the same config file across all your packages that use the same connections.
If you have a single config file that stores all your connection managers then all your packages must have contain the connection managers that are stored in that config file. This means you may have to put connection managers in your package that you don’t even need.
2. Demonstrate or whiteboard how checkpoints work in a package.
When checkpoints are enabled on a package if the package fails it will save the point at which the package fails. This way you can correct the problem then rerun from the point that it failed instead of rerunning the entire package. The obvious benefit to this is if you load a million record file just before the package fails you don’t have to load it again.
3. Demonstrate or whiteboard using a loop in a package so each file in a directory with the .txt extension is loaded into a table. Before demonstrating this tell which task/container accomplishes this and which enumerator will be used. (Big hint on which task/container to use is that it requires and enumerator)
This would require a Foreach Loop using the Foreach File Enumerator. Inside the Foreach Loop Editor you need to set a variable to store the directory of the files that will be looped through. Next select the connection manager used to load the files and add an expression to the connection string property that uses the variable created in the Foreach Loop.
4. Demonstrate or whiteboard how transactions work in a package.
If transactions are enabled on your package and tasks then when the package fails it will rollback everything that occurred during the package. First make sure MSDTC (Microsoft Distributed Transaction Coordinator) is enabled in the Control Panel -> Administrative Tools -> Component Services. Transactions must be enabled not only on the package level but also on each task you want included as part of the transaction. To have the entire package in a transaction set TransactionOption at the package level to Required and each task to Supported.
5. If you have a package that runs fine in Business Intelligence Development Studio (BIDS) but fails when running from a SQL Agent Job what would be your first guess on what the problem is?
The account that runs SQL Agent Jobs likely doesn’t have the needed permissions for one of the connections in your package. Either elevate the account permissions or create a proxy account.
To create a proxy account you need to first create new credentials with the appropriate permissions. Next assign those credentials to a proxy account. When you run the job now you will select Run As the newly created proxy account.
6. What techniques would you consider to add auditing to your packages? You’re required to log when a package fails and how many rows were extracted and loaded in your sources and destinations.
I like to create a database that is designated for package auditing. Track row counts coming from a source and which actually make it to a destination. Row counts and package execution should be all in one location and then optionally report off that database.
There are also third party tools that can accomplish this for you (Pragmatic Works BI xPress).
7. What techniques would you consider to add notification to your packages? You’re required to send emails to essential staff members immediately after a package fails.
This could either be set in the SQL Agent when the package runs or actually inside the package you could add a Send Mail Task in the Event Handlers to notify when a package fails.
8. Demonstrate or whiteboard techniques you would use to for CDC (Change Data Capture)? Tell how you would write a package that loads data but first detects if the data already exists, exists but has changes, or is brand new data for a destination.
If for some reason you’ve avoided using a whiteboard to show your ideas to this point then make sure you start on this question! For small amounts of data I may use the Slowly Changing Dimension.
More often than not the data is too large to use in such a slow transform. I prefer to do a lookup on the key of the target table and rows that don’t match are obviously new rows that can be inserted. If they do match it’s possible they are updates or duplicates. Determine this by using a conditional split comparing rows from the target to incoming rows. Send updates to a staging table that can then be updated in an Execute SQL Task.
Explain that putting updates in a staging table instead of updating using the OLE DB Command is much better for performance because the Execute SQL Task performs a bulk operation.
9. Explain what breakpoints are and how you would use them.
Breakpoints put pauses in your package. It’s a great tool for debugging a package because you can place a breakpoint on a task and it will pause the package based on execution events.
A reason in which I have used breakpoints is when I have a looping container and I want to see how my variables are changed by the loop. I would place a watch window on the package and type the variable name in. Set a break point on the container the stop after each iteration of the loop.
very nice qa....
bharath reddy vasireddy
Very useful post.
Thanks Devin. Please put more.
Just to say some thing regarding question "Would you consider it a best practice to create a configuration file for each connection manager or one for the entire package? "
I will say, we should use only one file, because it will be easy to manage. Secondly, if there are variables which are not common in two packages, you can still put all the variables in one file.
So file will contain Common Variables of 1st and 2nd Package + Uncommon Variables of 1st Package + Uncommon Variables of 2nd Package. And it works
regarding "how checkpoints work in a package."
We shouldnt have two or more seperate source to target mapping in one data flow task. Otherwise whole idea of check points will useless.