Parent-Child Dimension in Analysis Services

Who is online?  0 guests and 0 members
Home  »  Articles  »  Parent-Child Dimension in Analysis Services

Parent-Child Dimension in Analysis Services

change text size: A A A
Published: 6/7/2010 by  DevinKnight  - Views:  [4142]  

A common practice with dimension tables storing Accounting and Employee data is to implement a parent-child relationship.  This can be used to store hierarchies or levels of your data between just a couple columns.  For example, if you want to easily see who the direct manager is for each of your employees you may find the parent-child, also known as a self-referencing dimension, the best way to go.  The table is simply foreign keyed back to itself, hence it being called self-referencing.  Each employee has a primary key value but only those employees who have a direct manager will have a foreign key.  So the President of the company who has no boss then you will have a null value for the foreign key.  The example below will help you visualize the Parent-Child dimension.

EmployeeKey

EmployeeName

ParentEmployeeKey

1

Terry Boss

Null

2

Natalie Gomez

1

3

Sam Copper

4

4

Marc Rumsey

1

5

Julia Ventura

7

6

Rick Henderson

4

7

Kimberly Lucas

1

8

Eric Jackson

2

Analysis Services is able to use these table parent-child dimensions and automate the hierarchy building that you would typically do yourself.  When initially building the dimension Analysis Services will generally automatically detect that it is a parent-child dimension purely based on the self-referencing key.  When it finds the self-referencing key it will change the Usage property of the foreign key column to Parent, indicating it is part of the parent-child dimension.  If it does not auto-detect this for some reason you can still manually make the change in the dimension designer, shown below.

As I mentioned before typically in Analysis Services you would create hierarchies yourself in the Hierarchies pane of the dimension designer.  However, using the parent-child dimension does all that work for you.  When browsing the dimension you will see several layers of the dimension created called Level 01, 02, 03, etc…  These level names can be manipulated to suit the data being displayed.  You’ll also what looks like duplicate attributes when you get to Level 05.  These are actual sales that manager recorded but we can hide these values if we want.

To change the level names to better represent the data you can change the NamingTemplate property on the parent key attribute in the dimension designer.  By changing Level 2 to something like Employee Level *, as shown below, each level of the parent hierarchy will show Employee Level 02, Employee Level 03, etc...

We can also choose to hide our manager name from the lowest level by changing the MembersWithData property on the parent key attribute.  This will remove the manager’s name while still keeping their sales so we can still accurately see total sales overall.  By changing the property to NonLeafDataHidden you will get the desired results when browsing.  Viewing the results after making these changes will look something like the below image.

Notice all the totals are still the same but the Manager has been remove from Level 5.  Also, you’ll see each level of the hierarchy named Employee Level.

So now that you know how to create this working parent-child dimension in Analysis Services I’m going to give you a warning about using them that probably warrants being at the top of this article.  I highly discourage using parent-child dimensions is larger record sets.  You may be surprised what I consider a large record set when it comes to this warning. 

Even more than just a few hundred records using being stored in a parent-child dimension can significantly hurt the performance of your cube when browsing by the hierarchy.  This is of course worth testing in your own environment to see if it truly does cause performance issues, but if you notice any slowdown in query time this is likely your issue. 

So if you’ve already designed the table to store your data in the parent-child format and you discover this performance problem you do have an alternative.  One method that I have used is to create a view that naturalizes the table.  Naturalizing the table means you will flatten the table and actually make a column for each level that would exist in your hierarchy.  So if there are five levels in the dimension when you follow the self-referencing columns then you actually create five columns one for each level.  This will allow you to create a traditional hierarchy out of the parent-child dimension.  Here’s an example of what the view may look like:

CREATE VIEW [dbo].[DimNaturalized_DimEmployee] AS

WITH PCStructure(Level, [ParentEmployeeKey], [Employee_KeyColumn], [Employee Level 01_KeyColumn], [Employee Level 02_KeyColumn], [Employee Level 03_KeyColumn], [Employee Level 04_KeyColumn], [Employee Level 05_KeyColumn])

AS (SELECT 3 Level, [ParentEmployeeKey], [EmployeeKey],

[EmployeeKey] as [Employee Level 01_KeyColumn],

[EmployeeKey] as [Employee Level 02_KeyColumn],

[EmployeeKey] as [Employee Level 03_KeyColumn],

[EmployeeKey] as [Employee Level 04_KeyColumn],

[EmployeeKey] as [Employee Level 05_KeyColumn]

FROM [dbo].[DimEmployee] WHERE [ParentEmployeeKey] IS NULL OR [ParentEmployeeKey] = [EmployeeKey] UNION ALL SELECT Level + 1, e.[ParentEmployeeKey], e.[EmployeeKey],

CASE Level

WHEN 2 THEN e.[EmployeeKey] ELSE [Employee Level 01_KeyColumn]

END

AS [Employee Level 01_KeyColumn],

CASE Level

WHEN 2 THEN e.[EmployeeKey]

WHEN 3 THEN e.[EmployeeKey] ELSE [Employee Level 02_KeyColumn]

END

AS [Employee Level 02_KeyColumn],

CASE Level

WHEN 2 THEN e.[EmployeeKey]

WHEN 3 THEN e.[EmployeeKey]

WHEN 4 THEN e.[EmployeeKey] ELSE [Employee Level 03_KeyColumn]

END

AS [Employee Level 03_KeyColumn],

CASE Level

WHEN 2 THEN e.[EmployeeKey]

WHEN 3 THEN e.[EmployeeKey]

WHEN 4 THEN e.[EmployeeKey]

WHEN 5 THEN e.[EmployeeKey] ELSE [Employee Level 04_KeyColumn]

END

AS [Employee Level 04_KeyColumn],

CASE Level

WHEN 2 THEN e.[EmployeeKey]

WHEN 3 THEN e.[EmployeeKey]

WHEN 4 THEN e.[EmployeeKey]

WHEN 5 THEN e.[EmployeeKey]

WHEN 6 THEN e.[EmployeeKey] ELSE [Employee Level 05_KeyColumn]

END

AS [Employee Level 05_KeyColumn] FROM [dbo].[DimEmployee] e INNER JOIN PCStructure d ON e.[ParentEmployeeKey] = d.[Employee_KeyColumn] AND e.[ParentEmployeeKey] != e.[EmployeeKey])

select Level6Subselect.*

from PCStructure a,

(select [EmployeeKey] [Employee Level 05_KeyColumn], CASE

    WHEN MiddleName IS NULL THEN

        FirstName + ' ' + LastName

    ELSE

        FirstName + ' ' + MiddleName + '.' + ' ' + LastName

END [Employee Level 05_NameColumn]

, Level5Subselect.*

from [dbo].[DimEmployee] b,

(select [EmployeeKey] [Employee Level 04_KeyColumn], CASE

    WHEN MiddleName IS NULL THEN

        FirstName + ' ' + LastName

    ELSE

        FirstName + ' ' + MiddleName + '.' + ' ' + LastName

END [Employee Level 04_NameColumn]

, Level4Subselect.*

from [dbo].[DimEmployee] b,

(select [EmployeeKey] [Employee Level 03_KeyColumn], CASE

    WHEN MiddleName IS NULL THEN

        FirstName + ' ' + LastName

    ELSE

        FirstName + ' ' + MiddleName + '.' + ' ' + LastName

END [Employee Level 03_NameColumn]

, Level3Subselect.*

from [dbo].[DimEmployee] b,

(select [EmployeeKey] [Employee Level 02_KeyColumn], CASE

    WHEN MiddleName IS NULL THEN

        FirstName + ' ' + LastName

    ELSE

        FirstName + ' ' + MiddleName + '.' + ' ' + LastName

END [Employee Level 02_NameColumn]

, Level2Subselect.*

from [dbo].[DimEmployee] b,

(select [EmployeeKey] [Employee Level 01_KeyColumn], CASE

    WHEN MiddleName IS NULL THEN

        FirstName + ' ' + LastName

    ELSE

        FirstName + ' ' + MiddleName + '.' + ' ' + LastName

END [Employee Level 01_NameColumn],

 CurrentMemberSubselect.* from [dbo].[DimEmployee] b,

(select [EmployeeKey] [Employee_KeyColumn], CASE

    WHEN MiddleName IS NULL THEN

        FirstName + ' ' + LastName

    ELSE

        FirstName + ' ' + MiddleName + '.' + ' ' + LastName

END [Employee_NameColumn]

from [dbo].[DimEmployee] b)

CurrentMemberSubselect

) Level2Subselect

) Level3Subselect

) Level4Subselect

) Level5Subselect

) Level6Subselect

where

Level6Subselect.[Employee Level 05_KeyColumn] = a.[Employee Level 05_KeyColumn] and

Level6Subselect.[Employee Level 04_KeyColumn] = a.[Employee Level 04_KeyColumn] and

Level6Subselect.[Employee Level 03_KeyColumn] = a.[Employee Level 03_KeyColumn] and

Level6Subselect.[Employee Level 02_KeyColumn] = a.[Employee Level 02_KeyColumn] and

Level6Subselect.[Employee Level 01_KeyColumn] = a.[Employee Level 01_KeyColumn] and

Level6Subselect.[Employee_KeyColumn] = a.[Employee_KeyColumn]

GO

Looks like some pretty intimidating code right?  Well luckily some really smart developers have run into the same performance problem that I described earlier and developed a tool that can generate this view for you automatically.  If you download the tool PC Dimension Naturalizer, which luckily has lots of documentation on the site, it will automatically naturalize your table into a view for you.  Hope this article helps with your design and email me with any questions (dknight@pragmaticworks.com)! 

 
0
/5
Avg: 0/5: (0 votes)

Comments (8)

dpatel
dpatel said:

Thanks for the article Devin. I did implement parent-child hierarchy on one of my dimensions and was having some performance issues. I wish I had read this earlier :)

6/7/2010
 · 
 
by
Daniel
Daniel said:

Thanks Devin, great article.  FYI, in IE7 each graphic (all are showing) appears to have an empty graphic just below it.  Viewing in FF shows that nothing is actually missing even though it looks like there is.

I am curious if the size of the self referencing table is the only factor.  I would think that the number of levels in the hierarchy would create more of a performance problem than the overall size.  For example, if the the hierarchy was only 2 levels I would expect much better performance than if the hierarchy was 6 levels.  Is this true, or is it really mostly a matter of table size?

Although the script will work quite well overall, unless I am reading it wrong, it will reduce your flexibility.  You need to know the depth of your hierarchy before you begin.  If there is a structrual change in your organization, adding a few more levels, you are pushed back to the beginning.  I would think a ragged hierarchy would present similar problems.

6/7/2010
 · 
 
by
DevinKnight
DevinKnight said:

Daniel

 

That's strange I just noticed that.  I created the article in firefox where it shows up fine.  I'll let the site admin know. 

You're right about the size factor with performance.  Perhaps I should have worded it better.  The more levels that are generated with the self referencing table the worse the performance becomes.

You're also right about the flexibility of the script. You do need to know the depth of how many levels will occur before completing it.

6/7/2010
 · 
 
by
dpatel
dpatel said:

Thanks for the article Devin. I did implement parent-child hierarchy on one of my dimensions and was having some performance issues. I wish I had read this earlier :)

6/7/2010
 · 
 
by
briankmcdonald

As I'm trudging forward in my learning of SSAS, I enjoyed this article and look forward to implementing similar methods. Thanks for sharing.

6/7/2010
 · 
 
by
todd.meade
todd.meade said:

It feels wrong to me to conflate your data into the model.  Having to know how many levels your heirarchy has, rather than letting the data in the model tell you that means continual re-modelling as the data changes.

I think abstractly of a cube as having the same number of intersections regardless if you use a recursive description, of a naturalized/flatted model?  Can you explain why the recursive model has such a negative impact on SSAS after the cube is built?

12/23/2010
 · 
 
by
DevinKnight
DevinKnight said:

This is per technet - http://technet.microsoft.com/en-us/library/cc966527.aspx

In parent-child hierarchies, aggregations are created only for the key attribute and the top attribute (for example, the All attribute) unless it is disabled. As a result, queries returning cells at intermediate levels are calculated at query time and can be slow for large parent-child dimensions. If you are in a design scenario with a large parent-child hierarchy (more than 250,000 members), you may want to consider altering the source schema to reorganize part or all of the hierarchy into a user hierarchy with a fixed number of levels.

12/27/2010
 · 
 
by
DustinRyan
DustinRyan said:

Great article, Devin. You made building parent child dimensions very clear and easy!

12/27/2010
 · 
 
by

Most Recent Articles