How to process changed records in SSAS cube.

Who is online?  0 guests and 0 members
Home  »  Forums   »  microsoft business intelligence   »  analysis services   » How to process changed records in SSAS cube.

How to process changed records in SSAS cube.

Topic RSS Feed

Posts under the topic: How to process changed records in SSAS cube.

Posted: 12/20/2010

Jedi Youngling 80  points  Jedi Youngling
  • Joined on: 7/23/2010
  • Posts: 30

Hi All,

One of my SSAS cube was taking long time to process and performance of the cube also bad. I am trying to create partitions to improve the query performance as well as processing time of the cube.

But, my fact table is not static (not transactional), i mean data in fact table could be change any time. So for instance if i create partition on monthly basis and data would change for previous month then there is a chance of missing this change in the Cube.

Is there any option to process latest changed records in the cube without creating partitions? I dont want to use ROLAP since it may reduce the performance of the SSAS cube.

thanks in advance!!!

 

 


Posted: 12/20/2010

Jedi Master 5183  points  Jedi Master
  • Joined on: 10/27/2009
  • Posts: 294

There is no automatic way to do this. But you can do this with SSIS. When loading the Fact table you will need to detect if there are any rows that would go into another partition and then you know whether you need to process that partition or not. You can break each partition processing into multiple SSAS processing task. One for each partition. Then just call the task based on the partitions that need to be processed. You will probably need to use execute SQL tasks to check the data for each partition in the source data. Then set boolean variables to true and have an expression on the precedence constraint going to each SSAS processing tasks. If the variable is true then process that partition.

 


tags SSAS, Partition

Posted: 12/21/2010

Jedi Youngling 80  points  Jedi Youngling
  • Joined on: 7/23/2010
  • Posts: 30

Hi Mike,

Thanks for your reply. Currently i have only 3-4 partitions and it is easy create SSIS package and process the partition accordingly. But, going forward how can i handle this process? Do i need to create 100 analysis tasks (if i have 100 partitions) and keep on change SSIS package for every new partition.

Please suggest how to handle this issue.

Thanks,

Siva.


Posted: 12/21/2010

Jedi Youngling 80  points  Jedi Youngling
  • Joined on: 7/23/2010
  • Posts: 30
Hi Mike, Thanks for your reply. creating SSIS package and process only changed partitions is good idea. But, going forward how do we manage SSIS package: If i have 100 partitions THEN I NEED TO CREATE 100 analysis process tasks and keep on changing SSIS package for every new partition? Please suggest how to handle this issue. Thanks, Siva.

Posted: 12/21/2010

Padawan 1392  points  Padawan
  • Joined on: 1/3/2010
  • Posts: 250

Hi Siva.

Do you have a fact date right? Is there possible you to change a fact date today, but fact date is one year earlier, for example?

I can give you my example. I have my fact table daily partitioned and my cube is monthly partitioned. My cube partitions have monthly ID like 201011 and 201012, for example.

So, I identify month of my fact date changed and process their month with a script task.


Posted: 12/21/2010

Jedi Youngling 80  points  Jedi Youngling
  • Joined on: 7/23/2010
  • Posts: 30

Thanks for your reply Marco on this.

Could you please explain in detail with the example..How do you process latest data with current partition.

I have a scenario something like this:

 

In my fact table i have two date columns and one would define transactiondate and other is dateupdated[this is house keeping column and we update this column if anything changes in my fact table].

"Transaction Date" would not change once we enter the record in fact table. But, data can be changed for this tarnsaction date and "Date Updated" column would be updated to getdate() if any changes occurs. Now, i can't create partition on top of "Transaction Date" since data is changing for previous months as well and there is a chance of missing updated data in the Cube.

 

So, is there a chance of creating partitions on top of "Date Updated" column instead "Transaction Date"?

 

Thanks,

Siva


Posted: 12/21/2010

Jedi Master 5183  points  Jedi Master
  • Joined on: 10/27/2009
  • Posts: 294

I have never tried this, but I bet your could figure out a way to get the number of partitions and use a for each loop to process the partitions.


tags SSAS, Partition

Posted: 12/21/2010

Padawan 1392  points  Padawan
  • Joined on: 1/3/2010
  • Posts: 250

So, when you run your ETL, for example today, you will get:

dateupdated=20101221 (because it is updated with getdate)

transactiondate - will be fact date.

So, if you do a distinct month of transactiondate where dateupdated=20101221, you could get for example:

201007

201012

With this, you can have a script task with a connection to your AS cube and just process this 2 months. No need to update/process any other month, right?


Posted: 12/22/2010

Jedi Youngling 80  points  Jedi Youngling
  • Joined on: 7/23/2010
  • Posts: 30

Hi marcoadf,

 

Thanks for your reply. I think I'm very close to solve this issue. Could you please give me sample script task code that you are using so that i can test it my side. Also, How are you generating dynamic partitions by using SSIS package. I hope we can process multiple partitions at a time if transactiondate falls in 2 months instead of one by one.

 

I have one more question:

 

Can we also refresh dimensions with the changed records. Why i'am asking this question is one of my dimension is very huge and itself taking 3-4 mins to process. But, i need to process all my ETL's and cube in every 15 mins and volume of the data is huge. It's very challenging task to me to complete everything in less than 15 mins [i.e ETL loads+Cube refresh]

 

Thanks again for your help:)

 

Thanks,

Siva.

 

 


Posted: 12/22/2010

Padawan 1392  points  Padawan
  • Joined on: 1/3/2010
  • Posts: 250

Hi siva.

Use code example below:

Dim ASServer As New Microsoft.AnalysisServices.Server
ASServer.Connect("10.10.10.10")
ASServer.Databases("AdventureWorks").Cubes("AdventureWorksDW").MeasureGroups("Internect Orders").Partitions("201012").Process(Microsoft.AnalysisServices.ProcessType.ProcessFull)
ASServer.Disconnect()

I replaced my variables with values inside quotes.

For dimensions I use processupdate instead of fullprocess.

You will get those 15 min. Let me know.

 

Best Regards,

Marco Francisco.


Posted: 12/22/2010

Jedi Youngling 80  points  Jedi Youngling
  • Joined on: 7/23/2010
  • Posts: 30

Hi Marco,

 

Thank you very much for your reply. Do we need to call this script task inside foreach loop to process partitions one by one or script task would process parallelly? Also, Could you pls guide me how to create dynamic partitions inside SSIS.

 

yeah, I'm also using ProcessUpdate option to process dimensions.

 

Thanks,

Siva.

 

 


Posted: 12/22/2010

Padawan 1392  points  Padawan
  • Joined on: 1/3/2010
  • Posts: 250

I will suggest you not to loop.

If you do so, you will wait each time to process partition. Since you want to decrease time processing, you must parallel scripts.

Have you an idea of how many partitions you may process?

To dynamic create partitions, what I do is to take one created (imagine 201001) and create exacly the same partition but with ID 201101 for example.


Posted: 12/22/2010

Jedi Youngling 80  points  Jedi Youngling
  • Joined on: 7/23/2010
  • Posts: 30

Hi Marco,

 

Probably 2-3 partitions[Max] . I have total 4 measure groups and  may need to refresh 8-10[no. of MG* no.Of partitions] partitions at a time. So, i would create 1 container for each MG and place 2-3 script tasks parallel so that all can be refresh at a time.

 

Thanks,

Siva.


Posted: 12/22/2010

Padawan 1392  points  Padawan
  • Joined on: 1/3/2010
  • Posts: 250

yes, siva, exactly.


Posted: 12/30/2010

Jedi Youngling 80  points  Jedi Youngling
  • Joined on: 7/23/2010
  • Posts: 30

Hi Marco,

 

I have created 3 partitions based on quarters and refreshing changed partition. But, some times changed data is falling in more than 3 partitions and henche some data is missing in the cube. Actually we can't predict the change of data, it may fall in any number of partitions.

 

So, is it possible to create 2 partitions--> 1 partition contains data always till yesterday i.e GETDATE-1 and second partition can be built on GETDATE i.e todays date and always process todays partition so that there is no chance of missing data. But, i doubt on duplicate data over here...Let say if 1 record got changed yesterday and processed that record and it is present in old partition. Again if the same record changes today we will process this record as part of todays partition. My doubt is does this record has duplicate data in cube?

 

Please could you suggest if there are any new methods to handle these issues.

 

Thanks,

Siva.


Posted: 12/30/2010

Padawan 1392  points  Padawan
  • Joined on: 1/3/2010
  • Posts: 250

Yes, it will be duplicated. When you create partitions, there is a warning tooltip to ensure that query splits data into diferent partitions.

Since you have only 3 partitions (let's say 201101, 201010 and 201007) you can make the first partition (201007) include all previous months, this is, month < 201007. This way, every month < 201007 you know that you must process this partition. Just a suggestion.

Personally, as I said, I would create montly partitions and not quarter. But it's up to you.


Posted: 1/3/2011

Jedi Youngling 80  points  Jedi Youngling
  • Joined on: 7/23/2010
  • Posts: 30

Hi Marco,

Wish you happy new year.

 

We have decided to create partitions at yearly level since volume of the data is very less at month quarter level. Also, we identified that just need to process 2 partitions at a time (i.e current year and last year if data gets changes in last year).

 

So, i think the way of you impleted processing partitions are good enough. Could you please share me your SSIS code so that it would be very helpful.

 

Thanks again.

 

Thanks,

Siva.


Posted: 1/3/2011

Padawan 1392  points  Padawan
  • Joined on: 1/3/2010
  • Posts: 250
Answered  Answered

Happy new year for you too.

It's a solutions as any other. As long as you dont take many data for each year, you can process information in couple of minutes.

send me a private message, we talk from there.


Posted: 1/3/2011

Jedi Youngling 80  points  Jedi Youngling
  • Joined on: 7/23/2010
  • Posts: 30

Thanks Marco... Do you have gmail access now? If yes, Please add me to your contact list.

 

My GmailID is:  bssreddy438@gmail.com. Otherwise let me know your mode of contact so that i can give a call.

 

Thanks for your help.

 

Thanks,

Siva.


Posted: 1/4/2011

Jedi Youngling 80  points  Jedi Youngling
  • Joined on: 7/23/2010
  • Posts: 30

Hi Marco,

 

Do you have time today to discuss on SSIS package to refresh latest partition. Please let me know your avialability.

 

Thanks for your help.

 

Thanks,

Siva.


Posted: 1/4/2011

Padawan 1392  points  Padawan
  • Joined on: 1/3/2010
  • Posts: 250

hi siva. take a look at private messages at the right corner of the page. In your account.


Posted: 1/4/2011

Jedi Youngling 80  points  Jedi Youngling
  • Joined on: 7/23/2010
  • Posts: 30
Answered  Answered

Hi Marco,

 

I couldn't see any private messages folder in my account. Could please guide me how to get it in my account.

 

Thanks,

Siva.


Posted: 1/17/2011

Jedi Youngling 80  points  Jedi Youngling
  • Joined on: 7/23/2010
  • Posts: 30

Hi Marco,

 

I have created partitions and processing them parallelly based on data changes and its working fine on dev and UAT machines. However SSIS package is getting failed on production while processing dimensions with the following error.

 

"error: Transaction (Process ID 74) was deadlocked on thread | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.; 40001"

 

Please could you let me know about the above error if you have any idea.

 

Thanks,

Siva.


Posted: 1/17/2011

Padawan 1392  points  Padawan
  • Joined on: 1/3/2010
  • Posts: 250

I believe you have many partitions processing in parallel causing deadlock.

try this: add WITH(NOLOCK) after the table in your partitions, like this: SELECT * FROM fact_table WITH(NOLOCK) WHERE day between 20100101 and 20100101

Let me know if it's working.


Posted: 1/17/2011

Jedi Youngling 80  points  Jedi Youngling
  • Joined on: 7/23/2010
  • Posts: 30

Hi Marco,

 

Actually problems is with dimension not facts. NOLOCK  option was already applied against all facts and dimension. Package is getting failed while processing the dimension section itself. It's not entering into partitions section.

 

 

Thanks,

Siva