Business Intelligence Blogs

View blogs by industry experts on topics such as SSAS, SSIS, SSRS, Power BI, Performance Tuning, Azure, Big Data and much more! You can also sign up to post your own business intelligence blog.

Disecting the Self Referencing Hierarchy

  • 20 October 2010
  • Author: BradSchacht
  • Number of views: 7197
  • 0 Comments

There are times when you will have a hierarchy built into a table.  This is where the table references itself for the next level.  So there is a foreign and primary key relationship to the same table.  This is illustrated several places in the AdventureWorksDW database (including 2008 and 2008R2).  The best example is in the Employee table.  There is a parent key column that references another employee key in the table.  So EmployeeKey 7 in DimEmployee has a ParentEmployeeKey of 112.  In order to find that employee, again look in the DimEmployee table and find key number 112.  Something to notice here is that the ParentEmployeeKey is NULL; that is because this is the highest employee in the table.  There is no one above employee 112.  So what if we want to build the hierarchy from the top down for this table?  How do we do it?

The answer is actually pretty simple.  Keep joining back to the same table over and over until you get down the desired number of hierarchy levels.  So in this case we just want to find the hierarchy from Employee 7 up to the top of the hierarchy.

SELECT d.EmployeeKey AS HighestLevel, dd.EmployeeKey AS SecondHighestLevel
FROM DimEmployee d
LEFT JOIN DimEmployee dd on d.EmployeeKey = dd.ParentEmployeeKey
where dd.EmployeeKey = 7

This will give us the results listed below:

HighestLevel SecondHighestLevel

----------- -----------

112         7

Notice we are doing the WHERE on the join table, not the base table in the select.  This is because we only want to see the people above employee 7.  If we wanted to see everyone that reports to employee 7 on the other hand, we just have to switch our WHERE from dd.Employee key to read d.EmployeeKey like this:

SELECT d.EmployeeKey AS Manager, dd.EmployeeKey AS Employee
FROM DimEmployee d
LEFT JOIN DimEmployee dd on d.EmployeeKey = dd.ParentEmployeeKey
where d.EmployeeKey = 7

Now we get the following:

Manager Employee

----------- -----------

7           2

7           48

7           109

7           122

7           207

7           273

7           275

7           276

 

The reason for this is now we are saying any record that has EmployeeKey 7 as the ParentKey is going to be listed; whereas before we were saying where the actual employee key is 7 and we want to see everyone above that person.  The difference is in the first query you can only have one instance where the employee key IS 7.  However, the second query you can have MANY records where the ParentKey is 7.  This is the reason we have one line in the output as opposed to 8.  I happen to know there are five levels in the hierarchy in DimEmployee.  The code will be listed below along with the results for the first query (the second one outputs a lot of records.  If you want to see the results just run it in management studio.)  you can explore the different ways to flatten your hierarchy table.  Notice the difference in the single line output of the first QUERY as opposed to the ENTIRE table being flattened in the SECOND query.  The reason why we have to use the where statement in the second query is so that we know what to roll all the employees up to.  If we want to roll everyone up to the President then we need to tell the query to only bring back results where the highest level is the President.  Likewise, if we only want to bring back employees for a particular region, we have to give that region manager's ID.  If we do not specify an employee in then every single hierarchy in the table will be brought back.  For instance, everyone that reports to the president would be returned, as well as everyone who reports to the vice president (which is already included in the president's query), and so on.  You may want to have a single query to show every hierarchy in the table, and that is fine.  Just be aware of the difference between adding a where clause and leaving it off.  The record difference between the two is drastic; as is the value of the data for that matter.

Please feel free to ask me questions as I know this can be a very confusing topic.  It is basically circular reasoning applied to query writing.  Never fun to think through.  Keep in mind that for both of these queries the highest level of the hierarchy (i.e. the president of the company) is on the left and the lowest level (i.e. employee of a retail store manager) is on the right.

SELECT d.EmployeeKey, dd.EmployeeKey, ddd.EmployeeKey, dddd.EmployeeKey, ddddd.EmployeeKey
FROM DimEmployee d
LEFT JOIN DimEmployee dd on d.EmployeeKey = dd.ParentEmployeeKey
LEFT JOIN DimEmployee ddd on dd.EmployeeKey = ddd.ParentEmployeeKey
LEFT JOIN DimEmployee dddd on ddd.EmployeeKey = dddd.ParentEmployeeKey
LEFT JOIN DimEmployee ddddd on dddd.EmployeeKey = ddddd.ParentEmployeeKey
WHERE ddddd.EmployeeKey = 246

EmployeeKey EmployeeKey EmployeeKey EmployeeKey EmployeeKey

----------- ----------- ----------- ----------- -----------

112         152         222         51          246

 

SELECT d.EmployeeKey, dd.EmployeeKey, ddd.EmployeeKey, dddd.EmployeeKey, ddddd.EmployeeKey
FROM DimEmployee d
LEFT JOIN DimEmployee dd on d.EmployeeKey = dd.ParentEmployeeKey
LEFT JOIN DimEmployee ddd on dd.EmployeeKey = ddd.ParentEmployeeKey
LEFT JOIN DimEmployee dddd on ddd.EmployeeKey = dddd.ParentEmployeeKey|
LEFT JOIN DimEmployee ddddd on dddd.EmployeeKey = ddddd.ParentEmployeeKey
WHERE d.EmployeeKey = 112

 

Print
Categories: Miscellaneous
Tags:
Rate this article:
5.0
BradSchacht

BradSchachtBradSchacht

Other posts by BradSchacht

Please login or register to post comments.