SQL Joins in SQL Server
This article is all about joins in SQL Server. SQL Server 2000 has been used for this article but these should work fine in newer versions of SQL Server.
For a quick revision, jump to the summary section.
The article will have a hands on approach and you are encouraged to do the exercixes as this will certainly aid in understanding and memory retention.
So first we need a new database so ahead and set one up. We will look at a simple business model with customers, orders, products etc. The tables will have just the bare essential columns needed to do our exercises.
The tables we will need are as follows:
The scripts to create these tables are as follows so you can just paste these into Query Analyser and away you .
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[Company]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[Company]
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[Customer]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[Customer]
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[OrderDetails]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[OrderDetails]
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[Orders]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[Orders]
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[Product]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[Product]
CREATE TABLE [dbo].[Company] (
[CompanyID] [int] IDENTITY (1, 1) NOT NULL ,
[CompanyName] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[Customer] (
[CustomerID] [int] IDENTITY (1, 1) NOT NULL ,
[Firstname] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL ,
[Surname] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL ,
[CompanyID] [int] NOT NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[OrderDetails] (
[OrderID] [int] NOT NULL ,
[ProductID] [int] NOT NULL ,
[Quantity] [int] NOT NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[Orders] (
[OrderID] [int] IDENTITY (1, 1) NOT NULL ,
[CustomerID] [int] NOT NULL ,
[OrderDate] [datetime] NOT NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[Product] (
[ProductID] [int] IDENTITY (1, 1) NOT NULL ,
[Description] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL ,
[Price] [money] NOT NULL
) ON [PRIMARY]
ALTER TABLE [dbo].[Company] ADD
CONSTRAINT [PK_Company] PRIMARY KEY CLUSTERED
(
[CompanyID]
) ON [PRIMARY]
ALTER TABLE [dbo].[Customer] ADD
CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED
(
[CustomerID]
) ON [PRIMARY]
ALTER TABLE [dbo].[OrderDetails] ADD
CONSTRAINT [PK_OrderDetails] PRIMARY KEY CLUSTERED
(
[OrderID],
[ProductID]
) ON [PRIMARY]
ALTER TABLE [dbo].[Orders] ADD
CONSTRAINT [PK_Orders] PRIMARY KEY CLUSTERED
(
[OrderID]
) ON [PRIMARY]
ALTER TABLE [dbo].[Product] ADD
CONSTRAINT [PK_Product] PRIMARY KEY CLUSTERED
(
[ProductID]
) ON [PRIMARY]
You can also run the following scrips which will insert some test data which we can use for our joins.
Truncate Table Company
Truncate Table Customer
Truncate Table OrderDetails
Truncate Table Orders
Truncate Table Product
SET IDENTITY_INSERT [dbo].[Company] ON
INSERT INTO [dbo].[Company] ([CompanyID], [CompanyName])
VALUES (1, 'Tescos')
INSERT INTO [dbo].[Company] ([CompanyID], [CompanyName])
VALUES (2, 'HSBC')
INSERT INTO [dbo].[Company] ([CompanyID], [CompanyName])
VALUES (3, 'EON')
INSERT INTO [dbo].[Company] ([CompanyID], [CompanyName])
VALUES (4, 'Britsih Gas')
INSERT INTO [dbo].[Company] ([CompanyID], [CompanyName])
VALUES (5, 'Wilkos')
INSERT INTO [dbo].[Company] ([CompanyID], [CompanyName])
VALUES (6, 'Morrisons')
SET IDENTITY_INSERT [dbo].[Company] OFF
--------------------------------------------------------------------------------
SET IDENTITY_INSERT [dbo].[Customer] ON
INSERT INTO [dbo].[Customer] ([CustomerID], [Firstname], [Surname], [CompanyID])
VALUES (1, 'Dave', 'Bloggs', 1)
INSERT INTO [dbo].[Customer] ([CustomerID], [Firstname], [Surname], [CompanyID])
VALUES (2, 'Sarah', 'Smith', 2)
INSERT INTO [dbo].[Customer] ([CustomerID], [Firstname], [Surname], [CompanyID])
VALUES (3, 'Bill', 'Myatt', 2)
INSERT INTO [dbo].[Customer] ([CustomerID], [Firstname], [Surname], [CompanyID])
VALUES (4, 'Tracey', 'Amour', 3)
INSERT INTO [dbo].[Customer] ([CustomerID], [Firstname], [Surname], [CompanyID])
VALUES (5, 'Tiffany', 'Green', 4)
INSERT INTO [dbo].[Customer] ([CustomerID], [Firstname], [Surname], [CompanyID])
VALUES (6, 'William', 'Burke', 5)
INSERT INTO [dbo].[Customer] ([CustomerID], [Firstname], [Surname], [CompanyID])
VALUES (7, 'Chris', 'Smith', 6)
INSERT INTO [dbo].[Customer] ([CustomerID], [Firstname], [Surname], [CompanyID])
VALUES (8, 'Mandy', 'Bloggs', 6)
INSERT INTO [dbo].[Customer] ([CustomerID], [Firstname], [Surname], [CompanyID])
VALUES (9, 'Sarah', 'Blinkley', 4)
SET IDENTITY_INSERT [dbo].[Customer] OFF
--------------------------------------------------------------------------------
INSERT INTO [dbo].[OrderDetails] ([OrderID], [ProductID], [Quantity])
VALUES (1, 1, 4)
INSERT INTO [dbo].[OrderDetails] ([OrderID], [ProductID], [Quantity])
VALUES (1, 2, 20)
INSERT INTO [dbo].[OrderDetails] ([OrderID], [ProductID], [Quantity])
VALUES (1, 4, 10)
INSERT INTO [dbo].[OrderDetails] ([OrderID], [ProductID], [Quantity])
VALUES (1, 5, 100)
INSERT INTO [dbo].[OrderDetails] ([OrderID], [ProductID], [Quantity])
VALUES (2, 2, 10)
INSERT INTO [dbo].[OrderDetails] ([OrderID], [ProductID], [Quantity])
VALUES (2, 4, 25)
INSERT INTO [dbo].[OrderDetails] ([OrderID], [ProductID], [Quantity])
VALUES (3, 1, 10)
INSERT INTO [dbo].[OrderDetails] ([OrderID], [ProductID], [Quantity])
VALUES (3, 2, 15)
INSERT INTO [dbo].[OrderDetails] ([OrderID], [ProductID], [Quantity])
VALUES (3, 3, 20)
INSERT INTO [dbo].[OrderDetails] ([OrderID], [ProductID], [Quantity])
VALUES (3, 7, 50)
INSERT INTO [dbo].[OrderDetails] ([OrderID], [ProductID], [Quantity])
VALUES (4, 1, 100)
INSERT INTO [dbo].[OrderDetails] ([OrderID], [ProductID], [Quantity])
VALUES (5, 2, 150)
--------------------------------------------------------------------------------
SET IDENTITY_INSERT [dbo].[Orders] ON
INSERT INTO [dbo].[Orders] ([OrderID], [CustomerID], [OrderDate])
VALUES (1, 1, '20090402')
INSERT INTO [dbo].[Orders] ([OrderID], [CustomerID], [OrderDate])
VALUES (2, 1, '20090403')
INSERT INTO [dbo].[Orders] ([OrderID], [CustomerID], [OrderDate])
VALUES (3, 2, '20090415')
INSERT INTO [dbo].[Orders] ([OrderID], [CustomerID], [OrderDate])
VALUES (4, 2, '20090402')
INSERT INTO [dbo].[Orders] ([OrderID], [CustomerID], [OrderDate])
VALUES (5, 3, '20090314')
INSERT INTO [dbo].[Orders] ([OrderID], [CustomerID], [OrderDate])
VALUES (6, 3, '20090302')
INSERT INTO [dbo].[Orders] ([OrderID], [CustomerID], [OrderDate])
VALUES (7, 3, '20090402')
INSERT INTO [dbo].[Orders] ([OrderID], [CustomerID], [OrderDate])
VALUES (8, 4, '20090408')
INSERT INTO [dbo].[Orders] ([OrderID], [CustomerID], [OrderDate])
VALUES (9, 4, '20090407')
INSERT INTO [dbo].[Orders] ([OrderID], [CustomerID], [OrderDate])
VALUES (10, 4, '20090412')
/***************************************/
INSERT INTO [dbo].[Orders] ([OrderID], [CustomerID], [OrderDate])
VALUES (11, 5, '20090412')
INSERT INTO [dbo].[Orders] ([OrderID], [CustomerID], [OrderDate])
VALUES (12, 5, '20090218')
INSERT INTO [dbo].[Orders] ([OrderID], [CustomerID], [OrderDate])
VALUES (13, 5, '20090117')
INSERT INTO [dbo].[Orders] ([OrderID], [CustomerID], [OrderDate])
VALUES (14, 6, '20090311')
INSERT INTO [dbo].[Orders] ([OrderID], [CustomerID], [OrderDate])
VALUES (15, 7, '20090212')
INSERT INTO [dbo].[Orders] ([OrderID], [CustomerID], [OrderDate])
VALUES (16, 7, '20090419')
INSERT INTO [dbo].[Orders] ([OrderID], [CustomerID], [OrderDate])
VALUES (19, 100, '20090422')
INSERT INTO [dbo].[Orders] ([OrderID], [CustomerID], [OrderDate])
VALUES (20, 101, '20090411')
INSERT INTO [dbo].[Orders] ([OrderID], [CustomerID], [OrderDate])
VALUES (21, 102, '20090413')
SET IDENTITY_INSERT [dbo].[Orders] OFF
--------------------------------------------------------------------------------
SET IDENTITY_INSERT [dbo].[Product] ON
INSERT INTO [dbo].[Product] ([ProductID], [Description], [Price])
VALUES (1, 'Bread', 0.5500)
INSERT INTO [dbo].[Product] ([ProductID], [Description], [Price])
VALUES (2, 'Cheese', 2.9900)
INSERT INTO [dbo].[Product] ([ProductID], [Description], [Price])
VALUES (3, 'Milk', 0.8900)
INSERT INTO [dbo].[Product] ([ProductID], [Description], [Price])
VALUES (4, 'Beer', 4.9900)
INSERT INTO [dbo].[Product] ([ProductID], [Description], [Price])
VALUES (5, 'Red Wine', 3.9900)
INSERT INTO [dbo].[Product] ([ProductID], [Description], [Price])
VALUES (6, 'Chocolate', 1.1800)
INSERT INTO [dbo].[Product] ([ProductID], [Description], [Price])
VALUES (7, 'Cake', 1.2500)
INSERT INTO [dbo].[Product] ([ProductID], [Description], [Price])
VALUES (8, 'Rice', 1.5800)
SET IDENTITY_INSERT [dbo].[Product] OFF
Ok so now we have some tables which can be summarised as follows:
Contains basic company information. In this case we just have a CompanyID which is the primary key and a CompanyName.
This contains customer details. This table has a CustomerID which is the primary key and a Firstname and Surname. It also has a CompanyID which is a foreign key and relates to the CompanyID in the Company table. so we can se therefore that each Company has many customers.
Each customer can have many orders and the orders table contains the basic order infromation. There is 1 row in the Orders table for each order.
The table has a OrderID column which is the primary key. It also has a CustomerID column which is a foreign key and relates to ther Customer table. Finally it has an OrderDate column.
Each item belonging to an order will have a row in this table and so a record in the Orders table can have many records in this table. This table has an OrderID foreign key which relates to the Orders table and it also has a ProductID foreign key which relates to the Product table. These two keys togther form a compound primary key since records in here may be for the same Order and also a product can appear in multiple records here. Together though these must be unique - you wouldn't have an order with 4 jars of coffee and 3 jars of coffee - you would just have 1 record with 7 jars of coffee.
Finally we have the product table which just has 1 row for each product. There is a ProductID which is the primary key along with a description and price.
We can also have a look at the relationships between tables with a Relationship diagram which would look something like this -Please don't implement one of these though as we have deliberatley populated our tables with some data which would break referential integrity pureley for the purpose of showing you particular join types.
So this article is really just about looking at the different types of joins we have at our disposal in SQL Server but before doing this lets take a quick look at the anatomy of a join.
A typical join might look like this:
The following scribblings will help to explain what is going on here:
So what this means is Select all columns from the Customer table and all columns from the Orders table but join them where Customer.CustomerID = Orders.CustomerID. So for each record in the Customer table we will get many records potentially joined from the Orders table. If this is the case then the data in the Customer columns will just repeat while the data in the Order columns will vary.
The Left join is a specific type of join and what this means is select all rows from table1 (ie the one we are selecting from) regardless of whether there are corresponding rows in the table2 (the table we are joining). Should it be the case that there are customers without orders then the data which will show for the Order columns will just show as Nulls. So with our current test data we will get results like this.
Note here then that we get rows for Mandy Bloggs and Sarah Blinkley even though they do not have any orders.
Also take note that almost all joins are performed with a primary key and a foreign key. This is the most practical use of joins but of course this doesn't have to be the case - you can join on any 2 columns as long as they are of the same type, its just most join applications will involve primary and foreign keys.
We can use different types of joins to affect how joins are performed. For example if in our above example we change the word Left to Inner then this will work in exactly the same way other than the inner join only shows records from the first table (Customer) where there are matching records in the second table (Orders). And so changing our query to an inner join will give us different results as follows:
See how now we don't get any rows for Mandy Bloggs and Sarah Blinkley as the do not have any corresponding orders.
Ok so we have actually looked at a couple of joins here already but lets now have a formal breakdown of what is available to us.
The key joins we need to be aware of are as follows:
This can also just be performed using the join keyword on it's own in SQL Server anyway, not sure about other database systems.
This selects all rows from table1 where there is at least 1 matching row in table2 according to the join criteria.
Example:
This can also be called a left outer join and using either left or left outer in the query will give us the same result.
This selects all records from table1 regardless of whether there are any matching rows in table2 according to the join criteria. Where there are matching rows then that data will be shown but where there are no matching rows then the columns for table2 in the join results will be filled out with null. See the example below - last 2 rows for examples of this since in our test datbase Mandy Blogs and Sarah Blinkley have no orders.
This can also be called a right outer join and using either right or right outer in the query will give us the same result.
This is basically a mirror image of a left join. So we could use a right join to achieve the same results as the left join did in our last example if we swapped the tables over so table1 became table2 and vice versa. However, doing this would be pretty pointless and unnatural. We would really only use a right join where it was appropriate to the circumstances.
However, for the sake of completeness lets look at achieving the same results as our left join but using a right join and then we will look at a more realistic scenario.
Ok so a more realistic example might be that we are fixing some problems in our database and somehow we have been left with orders which don't belong to any customers. Now I know this normally shouldn't happen if we have our referential integrity set up correctly but lets assume this has happened for whatever reason - maybe we have inherited a poorly designed database from someone else and we need to sort out the mess for example.
So in this case we want to perform a join which will return all rows from orders and corresponding rows from customer. If there are no matching rows in customer then we want to still see the rows from the orders table and have the columns from customer filled in with nulls.
Now we could achieve this with either a left join or a right join but since the natural relationship is that a customer has an order then it is more natural to think left to right for this particular scenario and that's really the key to using left and right joins - use whichever is more natural or intuitive!
So our query and results then will look like this - see how we have found 3 orders which don't have a customer so we will now be able to and fix that problem and get our database in shape!
This is a combination of a left join and a right join. So all rows from both tables will be retrieved and where there is matching data from the other table that will be shown, otherwise the columns will be padded out with nulls. This is sometimes also called a full outer join and you will get the same results whether you just put full join or full outer join.
An example of a full join is shown below and so we can see that we get the orders without customers and the customers who have no orders.
Other names for this include a cross join or a product join.
This join will take row 1 from table1 and join it with every single row in table2. It will then repreat this for every row in table1.
So in our test database if we do a cartesian join with Customer and Orders then since we have 9 Customer rows and 19 Orders rows then we will get 9 * 19 = 171 rows in the results as shown below:
Another way of writing the Syntax for a cartesian join is as follows:
This doesn't actually work in SQL Server 2000 but I wanted to include it anyway. It does work in some other database systems and may be supported in future versions of SQL Server so it's worth knowing about.
This type of join automatically joins all columns in both tables with the same name - thus eliminating the need to include the join in a WHERE statement. The natural join may be used in both inner and outer joins using the following syntax:
Natural Join - SELECT fldlist
FROM table1 NATURAL JOIN table2
Another Oracle SQL (9i only) addition is the USING clause. Rely on this new clause when tables have multiple columns with the same name and you want to limit the join to one particular column. You can modify the previous syntax by replacing the ON clause with a USING clause as follows:
SELECT fldlist
FROM table1 JOIN table2
USING (column)
So far we have just looked at joins involving two tables. In reality we often join with multiple tables. This isn't as daunting as it sounds. Consider the following example.
So what we are doing here is basically de-normalising. We perform a join on two tables and then again on another two tables and so on until we have all the information we need. I think this is quite intuitive if you have a look at it and this is quite often how you will see joins being used in real life applications.
So to break this example down then we are selecting all rows from various tables but we are not using the wildcard * to signify all columns - we are explicitly stating the columns we need. Note also that the order of the columns in the select statement can be in any order and is not related to the order in which we select tables. The column order should be whatever is logical for the needs of your application.
So we select all rows from Customer and then select all rows from Orders where these tables match by CustomerID - ie primary and foreign key. We are using an inner join here so we only select rows from Customer where thay actually have orders.
Next we select all rows from OrderDetails where OrderDetails matches with the Orders table by OrderID - primary and foreign key again. We are also using an Inner Jon here so we won't end up with any orders which don't have order details.
Next we join the Company table (just so we can get the company name) where the CompanyID in the Company table matches the CompanyID in the Customer table - primary and foreign key again. We also use an inner join again. In fact inner joins are used throughout this query and are very common since you only end up with sensible data which usually fulfills referential integrity.
Finally we join the Product table against the OrderDetails table by ProductID - primary and foreign key again of course so we can get the product details.
Summary
Inner Join
Left Join
Right Join
Full Outer Join
Cross Join
Self Join
And that's it we are all done.
So if you tackle multiple joins step by step and build them up one step at a time and test as you along you should be fine and after a while it really does become very intuitive.
Any questions or comments welcome as usual via the form below.