Business Intelligence Blogs

View blogs by industry experts on topics such as SSAS, SSIS, SSRS, Power BI, Performance Tuning, Azure, Big Data and much more! You can also sign up to post your own business intelligence blog.

Using LEFT JOIN: Watch out for surprises!

  • 11 December 2012
  • Author: Kathi Kellenberger
  • Number of views: 6778
  • 0 Comments

Today I’d like to talk to you a bit about LEFT OUTER JOINs and what to watch out for.

LEFT OUTER JOIN, or LEFT JOIN as I usually type it, returns all of the qualifying rows from the table on the LEFT side of the join and any rows on the RIGHT side that match. If you continue to join to more tables on the RIGHT side, you will need to continue using LEFT JOIN. For example, if I want a list of all the customers in AdventureWorks along with their SalesOrderIDs and OrderDates for any orders they may have placed, I’ll use a LEFT JOIN like this:

SELECT Cust.CustomerID, Cust.AccountNumber, OH.SalesOrderID, OH.OrderDate
FROM Sales.Customer Cust
LEFT JOIN Sales.SalesOrderHeader OH ON Cust.CustomerID = OH.CustomerID;

A total of 32,166 rows are returned. Here are the partial results:


Any rows for customers without orders will have NULL values in the SalesOrderID and OrderDate columns. Let’s say I am interested in seeing some information about the products that were ordered as well. To do so I’ll have to join to Sales.SalesOrderDetail and Production.Product.

SELECT Cust.AccountNumber, OH.SalesOrderID, OH.OrderDate,
 Prod.Name, Prod.Color
FROM Sales.Customer Cust
LEFT JOIN Sales.SalesOrderHeader OH ON Cust.CustomerID = OH.CustomerID
INNER JOIN Sales.SalesOrderDetail DET ON OH.SalesOrderID = DET.SalesOrderID
INNER JOIN Production.Product Prod ON DET.ProductID = Prod.ProductID;

The problem with this query is that any customers with no orders have dropped out of the results. NOTE: I can check this by filtering by OH.SalesOrderID IS NULL.

The reason for this is that any rows where the SalesOrderID is NULL would have to join to a row from Sales.SalesOrderDetail where SalesOrderID is also NULL. Since we can’t join NULL to NULL, the rows drop out. To avoid having the customers with no orders drop out of the results, I’ll need to continue with LEFT JOIN.

SELECT Cust.AccountNumber, OH.SalesOrderID, OH.OrderDate,
     Prod.Name, Prod.Color
FROM Sales.Customer Cust
LEFT JOIN Sales.SalesOrderHeader OH ON Cust.CustomerID = OH.CustomerID
LEFT JOIN Sales.SalesOrderDetail DET ON OH.SalesOrderID = DET.SalesOrderID
LEFT JOIN Production.Product Prod ON DET.ProductID = Prod.ProductID

This time, my customers with no orders do not drop out of the query results. The next thing I may need to do is filter the results. What if I would like to return a list of all the customers even if they have not placed an order along with their orders, but only orders of bikes?  I’ll join to the Production.ProductSubcategory and Production.ProductCategory tables and then add a WHERE clause, but I end up with another problem.

SELECT Cust.CustomerID, Cust.AccountNumber, OH.SalesOrderID, OH.OrderDate,
     Prod.Name, Prod.Color, CAT.Name AS Category
FROM Sales.Customer Cust
LEFT JOIN Sales.SalesOrderHeader OH ON Cust.CustomerID = OH.CustomerID
LEFT JOIN Sales.SalesOrderDetail DET ON OH.SalesOrderID = DET.SalesOrderID
LEFT JOIN Production.Product Prod ON DET.ProductID = Prod.ProductID
LEFT JOIN Production.ProductSubcategory  SubCAT ON Prod.ProductSubcategoryID = SubCAT.ProductSubcategoryID
LEFT JOIN Production.ProductCategory CAT ON SubCAT.ProductCategoryID = CAT.ProductCategoryID
WHERE  CAT.Name = 'Bikes';

In this case, my customers with no orders drop out of the results despite all the left joins. The reason for this is that any rows where the category name is NULL, because there was no order in the first place, cannot be compared to the value Bikes.  The solution to this dilemma is not exactly what you would think.  You may think that the answer would be to just move CAT.Name = ‘Bikes’ to the join condition like this.

SELECT Cust.CustomerID, Cust.AccountNumber, OH.SalesOrderID, OH.OrderDate,
     Prod.Name, Prod.Color, CAT.Name
FROM Sales.Customer Cust
LEFT JOIN Sales.SalesOrderHeader OH ON Cust.CustomerID = OH.CustomerID
LEFT JOIN Sales.SalesOrderDetail DET ON OH.SalesOrderID = DET.SalesOrderID
LEFT JOIN Production.Product Prod ON DET.ProductID = Prod.ProductID
LEFT JOIN Production.ProductSubcategory  SubCAT ON Prod.ProductSubcategoryID = SubCAT.ProductSubcategoryID
LEFT JOIN Production.ProductCategory CAT ON SubCAT.ProductCategoryID = CAT.ProductCategoryID
     AND CAT.Name = 'Bikes';

The problem with this is that we end up getting all the orders, not just those for Bikes. The category name is NULL, but the items are not all from the bikes category.


 
One way to solve this problem is to force the JOINS after the first LEFT JOIN  to be evaluated first by surrounding those joins with parentheses and changing those other joins back to INNER JOINS.

SELECT Cust.CustomerID, Cust.AccountNumber, OH.SalesOrderID, OH.OrderDate,
 Prod.Name, Prod.Color, CAT.Name
FROM Sales.Customer Cust
LEFT JOIN (Sales.SalesOrderHeader OH
   JOIN Sales.SalesOrderDetail DET ON OH.SalesOrderID = DET.SalesOrderID
   JOIN Production.Product Prod ON DET.ProductID = Prod.ProductID
   JOIN Production.ProductSubcategory  SubCAT ON Prod.ProductSubcategoryID = SubCAT.ProductSubcategoryID
   JOIN Production.ProductCategory CAT ON SubCAT.ProductCategoryID = CAT.ProductCategoryID
   AND CAT.Name = 'Bikes')  ON Cust.CustomerID = OH.CustomerID;

Another method is to move the tables from the right part of the query to a CTE or Common Table Expression:

;WITH Details AS (
     SELECT CustomerID, OrderDate, Prod.Name, Prod.Color, CAT.Name AS Category
     FROM Sales.SalesOrderHeader OH
     JOIN Sales.SalesOrderDetail DET ON OH.SalesOrderID = DET.SalesOrderID
     JOIN Production.Product Prod ON DET.ProductID = Prod.ProductID
     JOIN Production.ProductSubcategory  SubCAT ON Prod.ProductSubcategoryID = SubCAT.ProductSubcategoryID
     JOIN Production.ProductCategory CAT ON SubCAT.ProductCategoryID = CAT.ProductCategoryID
     WHERE CAT.Name = 'Bikes')
SELECT Cust.CustomerID, Cust.AccountNumber, OrderDate, Name, Color, Category
FROM Sales.Customer Cust
LEFT JOIN Details ON Cust.CustomerID = Details.CustomerID;

Now we get all the customers even if they have not placed an order for a bike and the only orders we see are for bikes. The performance of the queries is almost equivalent according to STATISTICS IO:

 


The lesson here is to always check your results to make sure your query returns the results you expect!

 

Print
Categories: SQL Server
Tags:
Rate this article:
No rating

Please login or register to post comments.