Analysis Services Partitions

Published: 12/21/2009 by DevinKnight

Partitions are a great feature in Analysis Services that allow you to split a measure group into separate physical files. Every measure group has one partition by default, and by splitting that partition you can improve both query and processing performance.

Partitions can be split in any way you see fit. Many people choose to separate measure group data by date. This makes sense because just about every fact table has a date dimension associated with it, and dates give you a clear line in the sand at which each partition can be separated. For example, if your company has three years' worth of sales data, you might split the sales measure group into three partitions, one for each year. Depending on sales volume, you may decide to split it even further, by quarter or even by day.
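To make that "line in the sand" concrete, here is a minimal Python sketch that computes one date-key range per year. It assumes, as the WHERE clauses later in this article do, that dates are stored as yyyymmdd integer keys such as 20021231; the partition names are hypothetical.

```python
from datetime import date


def to_date_key(d: date) -> int:
    """Convert a date to a yyyymmdd integer key, e.g. 2002-12-31 -> 20021231."""
    return d.year * 10000 + d.month * 100 + d.day


def yearly_partition_ranges(first_year: int, last_year: int) -> list[tuple[str, int, int]]:
    """Return (name, start_key, end_key) for one partition per calendar year."""
    ranges = []
    for year in range(first_year, last_year + 1):
        start = to_date_key(date(year, 1, 1))
        end = to_date_key(date(year, 12, 31))
        ranges.append((f"Sales {year}", start, end))  # hypothetical naming scheme
    return ranges


# Three years of sales data -> three non-overlapping yearly ranges.
for name, start, end in yearly_partition_ranges(2001, 2003):
    print(f"{name}: WHERE OrderDateKey BETWEEN {start} AND {end}")
```

Splitting by quarter or by day follows the same pattern with smaller ranges; the important property is that consecutive ranges touch without overlapping.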

Multiple partitions can only be deployed to a server running SQL Server Enterprise Edition. However, if you develop on a machine running Standard Edition but will deploy to Enterprise Edition, you can have BIDS behave as if you were developing for Enterprise: right-click the project in Solution Explorer and click Properties. There you can change the edition of SQL Server you are deploying to, which also changes the restrictions BIDS places on what you can and cannot do.

So why would you add partitions to your cube? The leading reason is performance. Why do partitions actually help performance? They help in two main ways: query performance and processing performance.

Query Performance

Query performance improves because, rather than scanning an entire measure group, Analysis Services can isolate the partition that contains the requested data and search only that one. When a query spans several partitions, it can also search them in parallel.
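The idea can be illustrated with a simplified Python model: each partition covers a range of date keys, and a query only needs the partitions whose range overlaps the range it asks for. This is a conceptual sketch, not how Analysis Services implements partition selection internally, and it does not call any Analysis Services API; the Partition class and partition names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    start_key: int  # first OrderDateKey stored in the partition (inclusive)
    end_key: int    # last OrderDateKey stored in the partition (inclusive)


def partitions_to_scan(partitions, query_start, query_end):
    """Return only the partitions whose key range overlaps the query's key range."""
    return [p for p in partitions
            if p.start_key <= query_end and query_start <= p.end_key]


partitions = [
    Partition("Sales 2001", 20010101, 20011231),
    Partition("Sales 2002", 20020101, 20021231),
    Partition("Sales 2003", 20030101, 20031231),
]

# A query for Q1 2003 only needs to touch one of the three partitions.
hits = partitions_to_scan(partitions, 20030101, 20030331)
print([p.name for p in hits])  # ['Sales 2003']
```

A query that crosses a year boundary (for example, December 2002 through January 2003) would select two partitions, which is the case where parallel scanning pays off.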

Processing Performance

Partitions also improve processing performance, for the same reason they help queries. Instead of processing the entire measure group, you can process just your most current data and skip the partitions that store data that is years old and doesn't need to be reprocessed. Analysis Services will also allow you to process partitions in parallel. When processing the cube you can
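A minimal Python sketch of that strategy, under stated assumptions: process_partition is a hypothetical placeholder for whatever actually sends a processing command to the server, partitions are described by yyyymmdd key ranges, and a thread pool stands in for parallel processing. Only partitions whose range reaches the changed data are processed.

```python
from concurrent.futures import ThreadPoolExecutor


def process_partition(name: str) -> str:
    """Hypothetical stand-in for processing one partition on the server."""
    # A real implementation would send a processing command here.
    return f"processed {name}"


def process_changed_partitions(partitions, changed_since_key, max_workers=4):
    """Process, in parallel, only partitions that can hold rows on or after changed_since_key."""
    stale = [name for name, start, end in partitions if end >= changed_since_key]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input list.
        return list(pool.map(process_partition, stale))


partitions = [
    ("Sales 2001", 20010101, 20011231),
    ("Sales 2002", 20020101, 20021231),
    ("Sales 2003", 20030101, 20031231),
]

# Only the current year has new rows, so the older partitions are skipped.
print(process_changed_partitions(partitions, changed_since_key=20030601))
# ['processed Sales 2003']
```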

Recommendations

My recommendation for setting up partitions is not to let a measure group exceed 50 million rows before creating a second partition. This is very subjective, though, and 50 million rows is the maximum I would ever want to see in a single partition. There is nothing wrong with creating a partition before you reach this number; you could just as well decide to create a new partition after one million rows. I also generally recommend using the default MOLAP storage mode.
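As a rough rule of thumb, the number of partitions a measure group needs under a given row ceiling is the total row count divided by that ceiling, rounded up. The Python sketch below uses the 50 million row figure from this section as its default; it assumes rows are spread evenly across partitions, which date-based partitions rarely are, so treat the result as a lower bound.

```python
def partitions_needed(total_rows: int, max_rows_per_partition: int = 50_000_000) -> int:
    """Smallest partition count that could keep every partition at or under the row ceiling."""
    if max_rows_per_partition <= 0:
        raise ValueError("max_rows_per_partition must be positive")
    if total_rows <= 0:
        return 1  # every measure group has at least one partition
    # Integer ceiling division avoids floating-point rounding on large counts.
    return -(-total_rows // max_rows_per_partition)


print(partitions_needed(120_000_000))             # 3
print(partitions_needed(120_000_000, 1_000_000))  # 120
```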

Here are the basic differences between the storage modes:

Storage Mode

Description

MOLAP

·         Data and aggregations are stored in multidimensional format. Processing is slower, but querying is faster.

·         The cube must be reprocessed to get updated data.

Scheduled MOLAP

·         Same as MOLAP, but the cube refreshes every 24 hours.

Automatic MOLAP

·         Same as MOLAP, but updates in the relational database raise events that trigger a cube refresh.

Medium Latency MOLAP

·         Same as MOLAP, but updates in the relational database trigger a switch to Real-time ROLAP while the cube is processing. When the cube finishes processing, it returns to MOLAP. Default latency is 4 hours.

Low Latency MOLAP

·         Same as Medium Latency MOLAP. Default latency is 30 minutes.

Real-time HOLAP

·         Data is stored in the relational data source, but aggregations are stored in multidimensional format. Fast for processing but slow querying.

Real-time ROLAP

·         Data and aggregations are stored in the relational data source. Fastest for processing but very slow querying.

·         No data latency.

 

Example

Now that you know what partitions are and why they are helpful, let's go through the steps of splitting a measure group into two partitions. In this example, you want to place internet sales that occurred before 2003 into a partition called Internet Sales Old, and everything from 2003 onward into a partition called Internet Sales New.

Step One – Restrict Rows to Original Partition

Remember that all measure groups have at least one partition by default. Before creating a new partition, you must first change the original partition to restrict which rows it brings back. If you try to create a new partition without doing this first, you will get a warning.

Click the Source for the original partition so you can restrict the rows that it stores.

·        Change the Binding type from Table Binding to Query Binding. You can leave this as Table Binding if your measure group data is already separated into multiple tables or views.

·         When you change this property to Query Binding, it automatically provides a query that returns the whole table, with an empty WHERE clause at the end. Because this partition should hold all data prior to 2003, complete the WHERE clause so it returns only rows before that date (for example, WHERE OrderDateKey <= 20021231). Once the query has been changed, hit OK.
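Conceptually, this step just completes the generated query with a condition. The Python sketch below shows that string edit; the table and column names (dbo.FactInternetSales, SalesAmount) are assumptions modeled on the AdventureWorks sample, not the exact query BIDS generates for your cube.

```python
def restrict_query(generated_query: str, condition: str) -> str:
    """Complete an auto-generated partition query that ends in an empty WHERE clause."""
    query = generated_query.rstrip()
    if not query.upper().endswith("WHERE"):
        raise ValueError("expected the generated query to end with an empty WHERE clause")
    return f"{query} {condition}"


# Hypothetical generated query; the real column list comes from your fact table.
generated = "SELECT [OrderDateKey], [SalesAmount] FROM [dbo].[FactInternetSales] WHERE "
old_query = restrict_query(generated, "OrderDateKey <= 20021231")
print(old_query)
```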

Step Two – Create a Second Partition

The first partition now restricts its rows, so you can create a new one.

·         Select New Partition under the measure group that is ready for a second partition, and hit Next to start the wizard.

·         Check the table(s) for this measure group that this partition should use, then hit Next.

·         Check the box that reads Specify a query to restrict rows, and add to the WHERE clause as you did in Step One to bring back all dates after 2002 (WHERE OrderDateKey > 20021231). Be very careful when writing these WHERE clauses. If you accidentally used >= instead of >, there would be overlapping sales data for 20021231. You could also accidentally exclude a day by using < on the first partition and > on the second, which would drop one day's worth of data from the measure group. At the bottom of the dialog box you will see a warning describing the possibility of overlap and missing days. Hit Next when you finish typing the query.

·         You can select a storage location other than the default if you would like, then hit Next.

·         On the last screen you can give the partition a name, such as Internet Sales New, and decide whether to design aggregations now or later. I will write a second article explaining aggregations, so select Design aggregations later. After you have named the partition, hit Finish.

·         Rename the original partition to Internet Sales Old.
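Because a mistake in either WHERE clause silently duplicates or drops a day, it can help to check the boundary before deploying. The Python sketch below mirrors each partition's WHERE clause as a predicate and tests a few days around 2002-12-31; it assumes yyyymmdd integer date keys and checks only the sample keys you give it, not the rows in your database.

```python
from datetime import date, timedelta


def date_keys(start: date, end: date):
    """Yield yyyymmdd integer keys for every day from start to end, inclusive."""
    d = start
    while d <= end:
        yield d.year * 10000 + d.month * 100 + d.day
        d += timedelta(days=1)


def check_partitions(predicates, keys):
    """Return (overlapping_keys, missing_keys) for a dict of partition predicates."""
    overlaps, gaps = [], []
    for key in keys:
        matches = sum(1 for matches_key in predicates.values() if matches_key(key))
        if matches > 1:
            overlaps.append(key)  # row would be stored in more than one partition
        elif matches == 0:
            gaps.append(key)      # row would not be stored in any partition
    return overlaps, gaps


keys = list(date_keys(date(2002, 12, 30), date(2003, 1, 2)))

correct = {"Internet Sales Old": lambda k: k <= 20021231, "Internet Sales New": lambda k: k > 20021231}
overlap = {"Internet Sales Old": lambda k: k <= 20021231, "Internet Sales New": lambda k: k >= 20021231}
gap = {"Internet Sales Old": lambda k: k < 20021231, "Internet Sales New": lambda k: k > 20021231}

print(check_partitions(correct, keys))  # ([], [])
print(check_partitions(overlap, keys))  # ([20021231], [])
print(check_partitions(gap, keys))      # ([], [20021231])
```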

Now you can make optional changes to the Storage Settings discussed earlier in the Recommendations section. You may also find an option called Enable Proactive Caching in the Storage Settings; it will be discussed in a future article. This is a great first step toward performance tuning your cube.

 


Comments (6)

Egan said (12/24/2009):
Thanks Devin...good article. One thing: partition size is something that has always been puzzling. You mention 50 million rows (I'm not sure how that relates to storage size, which seems a more logical indicator). The 2005 AS Performance Guide states 2GB or 10-15 million rows. The 2008 guide mentions 200MB to 3GB storage size for non-distinct count measures. But as you say, it's subjective, and this seems to be the universal consensus.

DevinKnight said (12/24/2009):
Yeah, I use that number as a drop-dead point before creating a new partition. I don't think I've ever personally waited that long to create a new partition. Again, like you noted, it is very subjective.

meganathank said (4/13/2011):
Amazing article, thanks for sharing.

ravis said (8/27/2011):
Can you share an example where there is a need to use partitions without worrying about date? In other words, what scenarios lead to partitioning on something other than date? It would be great if you could share this. Thanks in advance.

kwymore said (8/29/2011):
Another instance where you might partition on a value other than date would be where you have a customer base that is split into different regions or customer types. If you know that customer base A is bigger than customer bases B and C combined, then you could design a partition for A and another one for all other customer types. Another scenario could be active versus inactive accounts.

DevinKnight said (8/31/2011):
That's a great example!

Most Recent Articles