SSIS - how to check for duplicates before insert

Who is online?  0 guests and 0 members
Home  »  Forums   »  microsoft business intelligence   »  integration services   » SSIS - how to check for duplicates before insert

SSIS - how to check for duplicates before insert

Topic RSS Feed

Posts under the topic: SSIS - how to check for duplicates before insert

Posted: 5/8/2013

Jedi Youngling 14  points  Jedi Youngling
  • Joined on: 5/8/2013
  • Posts: 7

Hello everyone, 

I would like to ask for some help in completing a simple SSIS package that will read data from an excel file, insert it into a staging table, and then perform a check for duplicates before inserting the data into the last destination sql server table.

I have completed all the steps, but I am having difficulty figuring out the best way to check for duplicates and insert the data into the last table. 

Down below is a screen shot of my data flow. If anyone could explain how I can complete the last step, possibly with detail instructions or an example, that would be awesome. 

Please let me know if I need to provide more details in order to get better help. 

Thank you very much.

 

P.S.

This is my first time creating a SSIS Package.

 

my data flow - screen shot


Posted: 5/8/2013

Jedi Youngling 96  points  Jedi Youngling
  • Joined on: 3/26/2013
  • Posts: 43

Couple of things.

First delete the sorts, they aren't doing anything for you and they slow the process down.

Second, the red line that reads 'ole db destination error output' is the error rows for the first db destination.

If you want to update two tables you will need to use a conditional split, multicast or lookup.

 

For what you are trying to do I would use a lookup. When you have a row ID that matches, you send it to an OLE DB Command and use an update statement. When you don't have a match send it to an OLE DB Destination.

Shawn

 


Posted: 5/8/2013

Jedi Youngling 14  points  Jedi Youngling
  • Joined on: 5/8/2013
  • Posts: 7

Hi Shawn. Thank you for your reply. I have been working on this a full day and I am getting to nowwhere, I do not have much experience but want to learn. Please bear with me if I don't understand fully and fast. 

I have removed the sorts, so my new DF looks like:

 

I do not want to update two tables, the staging table was just an idea to store records and then read from there so that it is easy for me to perform a check for duplicates before inserting to the final destination table.  May be there is a better way to do it via SSIS that I do not need to have a stagin table?

What should I use after inserting the rows to the stagin table? the lookup component? would it be possible that you show me a mock up DF of how things should look like? 

 

Thank you again for your reply. 


Posted: 5/8/2013

Jedi Master 3057  points  Jedi Master
  • Joined on: 2/19/2010
  • Posts: 449

You say you are checking for duplicates.  Can you please elaborate, do you want to check for duplicates in the source Excel file, or check to see if a record is a duplicate to a record already in the target table.  I would use a different process depending on where I expected a duplicate to happen.


Posted: 5/8/2013

Jedi Youngling 14  points  Jedi Youngling
  • Joined on: 5/8/2013
  • Posts: 7

Hi Daniel, thanks for your reply.

 

My basic SSIS package will upload data in an Excel format to a table in SQL Server. The problem is data in Excel may contain duplicate data already in the SQL table. Is there a way to upload the data but if it the record already exists not to import it? The columns on the destination table are:

       [Commodity Code]
,[Date]
,[Description]
,[Market Price]

There can be multiple rows with same Commodity Code, but the Date has to be different, so this two columns combined, make each row unique.

I would not mind to change the process if it helps me to get where I need to be.

Thank you again.


Posted: 5/8/2013

Jedi Youngling 14  points  Jedi Youngling
  • Joined on: 5/8/2013
  • Posts: 7

I followed the example on this post

and got this far on my DF 

my question is, the DF seems to run ok for each escenario, but no records are being written to the database table. What part am I missing, the OLE DB Destination? If that is what I am missing, where should I place that component?

 

Thank you very much for taking the time to follow this question. 


Posted: 5/8/2013

Jedi Master 3057  points  Jedi Master
  • Joined on: 2/19/2010
  • Posts: 449

Unless you are going to have multiple inputs to the Union All transforms there is no need for them. 

Looking at your work I am making the assumption that you want to update dates when appropriate, and insert new records.  Where the record exists and the date is the same there is no need to do anything. 

If this is true then there is no need at all for the "All terminate flow 1" branch.  You can delete your "All Terminate Flow 3" and replace it with a OLE DB Destination to get the data into your database.  Although it is not the most performant option for the sake of simplicity you can replace "All Terminate Flow 2" with an OLE DB Command to update your data.

 


Posted: 5/8/2013

Jedi Youngling 14  points  Jedi Youngling
  • Joined on: 5/8/2013
  • Posts: 7

Hi again Daniel, 

"Although it is not the most performant option". What would be a good performant option? it would be good for me to learn that way. Thank you for your feedback.


Posted: 5/8/2013

Jedi Youngling 14  points  Jedi Youngling
  • Joined on: 5/8/2013
  • Posts: 7

Regarding the UPDATE part of this process.

If I have a record:

ItemNumber: 10001

Date: 05/01/2013

Description: Dark Chocolate         

Price: 1.30

Then re-submit the SSIS job with a change only in price field (update):

ItemNumber: 10001

Date: 05/02/2013

Description: Dark Chocolate         

Price: 1.30

Only when the date is different and the item number is the same, then record the new record (insert)

Any examples are welcome.

Thank you.


Posted: 5/23/2013

Jedi Youngling 2  points  Jedi Youngling
  • Joined on: 5/23/2013
  • Posts: 1

Hi, dont know if you've sorted this yet, but there's a couple of ways you can do this:

1)

  • Excel Source
  • Sort with "Remove Duplicates" option selected.
  • Lookup to get all the existing info you need
  • Conditional Split - Use this for all your logic, including whether the lookup found anything previously (i.e. new lookup field is not null). It's much more flexible doing it this way.
  • OLE DB Destination for your new insert feed
  • OLE DB Command for your updates (note, this updates one row at a time)

2)

  • Excel Source
  • OLE DB Destination to Staging table
  • Remove dupes, multiple inserts/updates via a stored procedure, or if you are using SQL 2008 or later use the MERGE statement.

You can also do a mix between the two.


Page 1 of 1 (10 items)