One of my SSAS cubes is taking a long time to process, and query performance of the cube is also poor. I am trying to create partitions to improve both query performance and the processing time of the cube.
However, my fact table is not static (not append-only transactional data); the data in the fact table can change at any time. So, for instance, if I create partitions on a monthly basis and the data changes for a previous month, there is a chance of missing that change in the cube.
Is there any option to process only the latest changed records in the cube without creating partitions? I don't want to use ROLAP, since it may reduce the performance of the SSAS cube.
Thanks in advance!
There is no automatic way to do this, but you can do it with SSIS. When loading the fact table, detect whether any rows fall into another partition; that tells you whether that partition needs to be processed. Break the processing into multiple Analysis Services Processing Tasks, one per partition, and call each task only for the partitions that need processing. You will probably need Execute SQL Tasks to check the source data for each partition, set Boolean variables to true, and put an expression on the precedence constraint leading to each Analysis Services Processing Task: if the variable is true, process that partition.
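The detection step can be sketched like this (a minimal Python illustration of the logic only, not SSIS itself; the dates are made-up examples):

```python
from datetime import date

def partitions_to_process(changed_dates):
    """Map each changed fact date to its monthly partition ID (YYYYMM).

    Returns the distinct set of partition IDs, so each Analysis Services
    Processing Task would be triggered at most once per load.
    """
    return {d.year * 100 + d.month for d in changed_dates}

# Rows loaded in today's ETL touched three dates across two months:
changed = [date(2010, 11, 3), date(2010, 11, 28), date(2010, 12, 1)]
print(sorted(partitions_to_process(changed)))  # [201011, 201012]
```

In the SSIS package, each partition ID in that set would correspond to one Boolean variable set to true, which in turn enables the precedence constraint into that partition's processing task.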
Thanks for your reply. Currently I have only 3-4 partitions, so it is easy to create an SSIS package and process the partitions accordingly. But going forward, how can I handle this? Do I need to create 100 Analysis Services Processing Tasks (if I have 100 partitions) and keep changing the SSIS package for every new partition?
Please suggest how to handle this issue.
You have a fact date, right? Is it possible for you to change a fact row today whose fact date is, for example, a year earlier?
I can give you my example. My fact table is partitioned daily and my cube is partitioned monthly. My cube partitions have monthly IDs like 201011 and 201012, for example.
So I identify the month of each changed fact date and process that month with a Script Task.
Thanks for your reply on this, Marco.
Could you please explain in detail with an example? How do you process the latest data against the current partition?
I have a scenario something like this:
In my fact table I have two date columns: one is TransactionDate and the other is DateUpdated [a housekeeping column that we update whenever anything changes in the fact table].
"Transaction Date" does not change once the record is entered in the fact table, but the data for that transaction date can change, and the "Date Updated" column is set to GETDATE() whenever a change occurs. Now, I can't create partitions on top of "Transaction Date", since data changes for previous months as well and there is a chance of missing updated data in the cube.
So, is there a chance of creating partitions on top of the "Date Updated" column instead of "Transaction Date"?
I have never tried this, but I bet you could figure out a way to get the number of partitions and use a Foreach Loop to process them.
So when you run your ETL, for example today, you will get:
dateupdated = 20101221 (because it is updated with GETDATE())
transactiondate = the original fact date.
So if you take the distinct months of transactiondate where dateupdated = 20101221, you could get, for example, two months (say 201011 and 201012).
With this, you can have a Script Task with a connection to your AS cube and process just those two months. No need to update/process any other month, right?
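That distinct-month lookup can be sketched like this (Python for illustration; representing rows as (transactiondate, dateupdated) integer pairs is an assumed layout, mirroring a DISTINCT-month query against the fact table):

```python
def months_changed(rows, today):
    """Distinct months (YYYYMM) of transactiondate for rows updated today.

    rows: iterable of (transactiondate, dateupdated) as YYYYMMDD integers.
    Integer division by 100 drops the day, leaving the YYYYMM month ID.
    """
    return {td // 100 for td, du in rows if du == today}

rows = [
    (20101103, 20101221),  # November fact, updated today
    (20101201, 20101221),  # December fact, updated today
    (20101015, 20101101),  # October fact, updated earlier -> ignored
]
print(sorted(months_changed(rows, 20101221)))  # [201011, 201012]
```

The resulting month IDs are exactly the cube partitions the Script Task would need to process.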
Thanks for your reply. I think I'm very close to solving this issue. Could you please give me the sample Script Task code you are using, so that I can test it on my side? Also, how are you generating dynamic partitions using the SSIS package? I hope we can process multiple partitions at a time if the transaction dates fall in two months, instead of one by one.
I have one more question:
Can we also refresh dimensions with only the changed records? The reason I ask is that one of my dimensions is very large and takes 3-4 minutes to process by itself. But I need to run all my ETL loads and refresh the cube every 15 minutes, and the data volume is huge. It's a very challenging task for me to complete everything in under 15 minutes [i.e., ETL loads + cube refresh].
Thanks again for your help :)
Use the code example below:
Dim ASServer As New Microsoft.AnalysisServices.Server
I replaced my variables with values inside quotes.
For dimensions I use ProcessUpdate instead of ProcessFull.
You will get within those 15 minutes. Let me know.
Thank you very much for your reply. Do we need to call this Script Task inside a Foreach Loop to process partitions one by one, or would the Script Tasks process them in parallel? Also, could you please guide me on how to create dynamic partitions inside SSIS?
Yeah, I'm also using the ProcessUpdate option to process dimensions.
I suggest you not loop.
If you do, you will wait for each partition to finish processing in turn. Since you want to decrease processing time, you must run the scripts in parallel.
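The looping-versus-parallel difference can be sketched like this (Python threads standing in for parallel Script Tasks; process_partition is a hypothetical stand-in for the real AMO processing call, simulated here with a sleep):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def process_partition(partition_id):
    """Stand-in for a partition-processing call; sleeps to simulate work."""
    time.sleep(0.2)
    return partition_id

partitions = [201011, 201012, 201101]

# Sequential (Foreach Loop): total time ~ sum of all partition times.
start = time.perf_counter()
done_seq = [process_partition(p) for p in partitions]
seq = time.perf_counter() - start

# Parallel (one Script Task per partition): total ~ slowest single partition.
start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    done_par = list(pool.map(process_partition, partitions))
par = time.perf_counter() - start

print(done_par, par < seq)  # same partitions processed, in less wall time
```

The same idea applies in SSIS: place the Script Tasks side by side (no precedence constraints between them) so the partitions process concurrently rather than one after another.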
Do you have an idea of how many partitions you may need to process?
To create partitions dynamically, what I do is take an existing one (imagine 201001) and create exactly the same partition but with a new ID, 201101 for example.
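One detail worth sketching in that ID scheme is the month arithmetic, since the year rollover (201012 to 201101) is easy to get wrong (Python for illustration; next_month_id is a hypothetical helper):

```python
def next_month_id(month_id):
    """Given a YYYYMM partition ID, return the ID of the following month."""
    year, month = divmod(month_id, 100)
    if month == 12:
        return (year + 1) * 100 + 1  # December rolls over to January
    return month_id + 1

print(next_month_id(201011))  # 201012
print(next_month_id(201012))  # 201101 (rollover into the new year)
```

Cloning the partition itself would then be a matter of copying the existing partition definition and substituting the new ID into its name and source query.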
Probably 2-3 partitions [max]. I have 4 measure groups in total and may need to refresh 8-10 partitions at a time [no. of MGs × no. of partitions]. So I would create one container for each MG and place 2-3 Script Tasks in parallel, so that all of them can be refreshed at the same time.
Yes, Siva, exactly.
I have created 3 partitions based on quarters and refresh whichever partition has changed. But sometimes the changed data falls into more than one partition, and hence some data goes missing in the cube. We really can't predict where the data will change; it may fall into any number of partitions.
So, is it possible to create 2 partitions: one partition that always contains data up to yesterday (i.e., GETDATE()-1), and a second partition built on GETDATE() (i.e., today's date), and always process today's partition so there is no chance of missing data? But I have a doubt about duplicate data here. Say a record changed yesterday, was processed, and now sits in the old partition. If the same record changes again today, we will process it as part of today's partition. My doubt is: would this record end up duplicated in the cube?
Could you please suggest whether there are any other methods to handle these issues.
Yes, it will be duplicated. When you create partitions, there is a warning tooltip reminding you to ensure that the queries split the data into different partitions.
Since you have only 3 partitions (let's say 201101, 201010, and 201007), you can make the first partition (201007) include all previous months, that is, month < 201007. That way, whenever a month < 201007 changes, you know you must process that partition. Just a suggestion.
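A sketch of that routing rule (Python for illustration; the three partition IDs are from the example above, and partition_for_month is a hypothetical helper):

```python
def partition_for_month(month_id, partitions=(201007, 201010, 201101)):
    """Pick which cube partition holds a given month (YYYYMM).

    The earliest partition (201007) acts as the catch-all: it also covers
    every month before it, so any change with month < 201007 means
    reprocessing that partition.
    """
    for boundary in sorted(partitions):
        if month_id <= boundary:
            return boundary
    return max(partitions)  # months after the newest partition land there

print(partition_for_month(200912))  # 201007 (routed to the catch-all)
print(partition_for_month(201009))  # 201010
```

Because each month maps to exactly one partition, the source queries never overlap and the duplicate-data problem described above cannot occur.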
Personally, as I said, I would create monthly partitions rather than quarterly. But it's up to you.
Wish you a happy new year.
We have decided to create partitions at the yearly level, since the data volume is very small at the month/quarter level. Also, we identified that we only need to process 2 partitions at a time (i.e., the current year, plus last year if data changes there).
So I think the way you implemented partition processing is good enough. Could you please share your SSIS code with me? It would be very helpful.
Happy new year to you too.
It's a solution like any other. As long as you don't have too much data per year, you can process the information in a couple of minutes.
Send me a private message; we can talk from there.
Thanks, Marco. Do you have Gmail access now? If yes, please add me to your contact list.
My Gmail ID is: email@example.com. Otherwise, let me know your preferred mode of contact so that I can give you a call.
Thanks for your help.
Do you have time today to discuss the SSIS package to refresh the latest partition? Please let me know your availability.
Hi Siva. Take a look at Private Messages at the right corner of the page, in your account.
I couldn't see any Private Messages folder in my account. Could you please guide me on how to find it?
I have created the partitions and am processing them in parallel based on data changes, and it works fine on the dev and UAT machines. However, the SSIS package fails in production while processing dimensions, with the following error:
"error: Transaction (Process ID 74) was deadlocked on thread | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.; 40001"
Could you please let me know about the above error if you have any idea.
I believe you have many partitions processing in parallel, causing the deadlock.
Try this: add WITH (NOLOCK) after the table in your partition queries, like this: SELECT * FROM fact_table WITH (NOLOCK) WHERE day BETWEEN 20100101 AND 20100101
Let me know if it's working.
Actually, the problem is with a dimension, not the facts. The NOLOCK option was already applied to all facts and dimensions. The package fails in the dimension-processing section itself; it never reaches the partitions section.