Tutorial Instructions
We store data in relational databases so we can organize and quickly find information we need. That's where JOINs come in. In relational databases we organize data in sets, that are called tables in SQL.
Joins are the mechanism to find information in one set (table) that matches information in a different set.
For these examples we'll look at a hypothetical database for a human resources application. This application keeps track of departments, employees and their managers and salaries.
An INNER JOIN
is the most common, and matches a column in one table to another.
In our HR database we would like to find (query) employees
and their
departments
. Let's take a look at the structure for those tables:
column | refers to |
---|---|
department_id | |
department_name | |
location_id | locations (location_id) |
For each department we record its id, name, and a location_id
that is a
reference to the locations
table, so we can know which office this
department is a part of.
As for the employees
table we have:
column | refers to |
---|---|
employee_id | |
first_name | |
last_name | |
start_date | |
job_title | |
salary | |
manager_id | employees (employee_id) |
department_id | departments (department_id) |
Each employee record/row has an employee_id
, things we need to know about
the employee such as first and last name, a manager_id
that refers to another
employee in the same table, and a department_id
which tells our system
which department each employee is a part of.
So let's say we want to query the employees and the departments they are a part of:
SELECT e.employee_id, e.first_name, e.last_name, d.department_name
FROM employees e
INNER JOIN departments d ON (e.department_id = d.department_id);
Note how we request employee_id
, first_name
and last_name
from
the table employees
and department_name
from the table departments
.
We match these two sets of information in the FROM
part of the query.
Note the INNER JOIN
and ON
keywords. In this case we're asking for
the intersection of the table employees
and the table departments
where the department_id
matches on both tables. That gives us this output:
We told the database server what information we wanted, and the database took care of finding and responding with that data, without the need for us to write a specific program to parse and retrieve the information.
In the real world things don't always match perfectly. OUTER JOIN
s deal
with those mismatched sets of data, by returning NULL
values for the
table that does not have match (as a side note, NULL
means "undefined" in SQL).
There are different kinds of OUTER JOIN
s.
In a LEFT OUTER JOIN
all rows from the left side table will be returned,
and NULL
will be returned for the non-matching right side table rows.
In our HR database, our employees
can have dependents
:
Column | refers to |
---|---|
dependent_id | |
first_name | |
last_name | |
employee_id | employees (employee_id) |
Let's say in addition to the departments, we also need the first and last names of each employee's dependents:
SELECT e.employee_id, e.first_name, e.last_name, d.department_name,
dep.first_name || ' ' || dep.last_name AS dependent
FROM employees e
INNER JOIN departments d ON (e.department_id = d.department_id)
LEFT OUTER JOIN dependents dep ON (e.employee_id = dep.employee_id);
The LEFT OUTER JOIN
is joining employees
on the left and dependents
on the right, which means we will get the records from employees (left) but
may get some NULL
values for dependents
on the right.
employee_id | first_name | last_name | department_name | dependent
-------------+-------------+------------+-----------------+-------------------
1 | John | Smith | Management | Spoiled Smith
1 | John | Smith | Management | DoNothin Smith
2 | Pointy | HairedBoss | IT | Rafael HairedBoss
2 | Pointy | HairedBoss | IT | Beta HairedBoss
3 | Dilbertzoni | DeScrewed | IT | Wanda DeScrewed
5 | Howard | TheDuck | Sales |
8 | Lea | Palpatine | IT |
6 | Luke | Fett | Sales |
4 | Vanda | Socialist | Human Resources |
7 | John | SkipWork | Human Resources |
But why? Well, because some employees may not have any dependents, and we need a list of all employees, even if they have no dependents. We see from what our PostgreSQL database returned that John Smith has two dependents: Spoiled and DoNothing Smith, and the employee Howard TheDuck has none.
Had we used an INNER JOIN
then only employees with dependents would have
been returned.
In a RIGHT OUTER JOIN
all rows from the right side table will be returned,
and NULL
will be returned for the non-matching left side table rows.
We already saw how our HR database keeps track of departments
and each department
has a location_id
referring to the locations
table, which records where
each office is located. Let’s look at the structure for the locations
table:
Column |
---|
location_id |
street_address |
postal_code |
city |
state_province |
country_id |
Now let's query our database for our departments and their locations:
SELECT d.department_name, l.city, l.state_province, l.country_id
FROM departments d
RIGHT OUTER JOIN locations l ON (d.location_id = l.location_id);
Note that we joined the departments
table on the left to the locations
table
on the right, on the location_id
column, which gives us this output:
department_name | city | state_province | country_id
------------------------+-----------+----------------+------------
Management | Effingham | Indiana | US
Information Technology | London | London | UK
Human Resources | Munich | Bavaria | DE
Sales | Effingham | Indiana | US
| Lehi | Utah
This tells us, among the list of offices and locations, that the Effingham, Indiana office has no department assigned to it, and therefore is closed.
A self join is not a different type of join in a relational database, but it's important to mention them because they often happen in real systems.
A self join is one that joins a table to itself. It is often useful when the table
contains hierarchical data. In our HR database we've already seen the employees
table. Notice that each employee has a manager_id
, him/herself being employee
.
We can therefore ask our database to give us a list of employees and their managers:
SELECT e.first_name, e.last_name, e.job_title,
format('%s %s', m.first_name, m.last_name) AS manager
FROM employees e LEFT OUTER JOIN employees m ON (e.manager_id = m.employee_id);
In this case we are joining the employees
table (aliased as e
) to itself, aliased as m
(for managers). We joined them with a LEFT OUTER JOIN
because at least one employee doesn't have a manager: the president of the company, John Smith.
first_name | last_name | job_title | manager
-------------+------------+-------------------------------+-------------------
John | Smith | President |
Pointy | HairedBoss | CTO | John Smith
Dilbertzoni | DeScrewed | Software Engineer | Pointy HairedBoss
Vanda | Socialist | VP Equity Diversity Inclusion | John Smith
Howard | TheDuck | VP Sales | John Smith
Luke | Fett | Sales Guy | Howard TheDuck
John | SkipWork | HR Henchman | Vanda Socialist
Lea | Palpatine | UI Designer | Pointy HairedBoss
We covered the most common types of relational database joins. You're now
able to explore querying data from a database. There are many more ways to query data in a relational database. You can use other operators in your joins, besides the =
operator, for example.
The relationships between different sets of data, along with the key for each table in your database, are crucial for a well-designed database and performant queries.
Loading terminal...
Loading terminal...