Back-end Engineering Articles

I write and talk about backend stuff like Ruby, Ruby On Rails, Databases, Testing, Architecture / Infrastructure / System Design, Cloud, DevOps, Backgroud Jobs, some JS stuff and more...

Github:
/danielmoralesp
Twitter:
@danielmpbp

2025-04-15

Introduction to SQL - Joins

Now that we know how to set and work with multiple tables, it is time to join all of these tables in some manner, just to have more sense about the data stored right there, or because we want to search for a particular data.

To do so, we have to know another concept called: Joins. 

Note: we’ll continue working with tables created in the last blog post, so please read it and execute it to have the right context about the current topic.

In that blog post we had this tables


We tried to figure out how a record looks like following the Foreign Keys. For instance if we followed the green path that belongs to Order with id 1 we have this:


We must first look at the table Order because it contains the foreign key, and we can follow the right path. So let's follow the green-way:
* Order with id 1 belongs to a customer with id 2, Ana.
* Order with id 1 belongs to a subscription with id 1, The Time Magazine.
* This means that Ana is the owner of the Order with id 1, and she bought a subscription to Time Magazine, and the Order was made at 01-10-2021


Let's follow the other path. Let's call it the red path.


In this case
* Order with id 2 belongs to a customer with id 1, Justin.
* Order with id 2 belongs to a subscription with id 2, The Economist.
* This means that Justin is the owner of the Order with id 2, and he bought a subscription to The Economist, and the Order was made at 02-02-2022

Current Data
Now, let’s check what's inside our current tables

Note: if you followed last blog post you have this tables created in PostgreSQL

First let’s explore the current data in Orders and Customers

Orders

newsletters=# SELECT * FROM orders;
 order_id | customer_id | subscription_id |    purchase_date    
----------+-------------+-----------------+---------------------
        1 |           3 |               2 | 2021-05-14 14:51:53
        2 |           2 |               4 | 2021-05-15 17:51:53
        3 |           3 |               4 | 2021-05-18 13:51:53
        4 |           9 |               9 | 2021-05-21 17:51:53
        5 |           7 |               5 | 2021-06-02 12:51:53
        6 |           8 |               2 | 2021-06-14 13:51:53
        7 |           5 |               8 | 2021-06-14 17:51:53
        8 |           9 |               5 | 2021-06-16 19:51:53
        9 |           4 |               4 | 2021-06-17 20:51:53
       10 |           1 |               7 | 2021-07-18 22:51:53
       11 |           5 |               4 | 2021-07-20 17:51:53
       12 |           3 |               2 | 2021-09-21 17:51:53
       13 |           3 |               5 | 2021-12-01 18:51:53
       14 |           6 |               5 | 2021-12-14 16:51:53
       15 |           1 |               2 | 2021-12-15 15:51:53
       16 |           1 |               2 | 2021-12-16 14:51:53
       17 |           3 |               6 | 2021-12-17 13:51:53
       18 |           3 |               8 | 2021-12-18 12:51:53
       19 |           4 |               9 | 2021-12-19 11:51:53
       20 |           1 |               7 | 2021-12-19 10:51:53
(20 rows)




Customers

newsletters=# SELECT * FROM customers;
 customer_id |   customer_name   |      address      
-------------+-------------------+-------------------
           1 | Allie Rahaim      | 123 Broadway
           2 | Jacquline Diddle  | 456 Park Ave.
           3 | Lizabeth Letsche  | 789 Main St.
           4 | Jessia Butman     | 1 Columbus Ave.
           5 | Inocencia Goyco   | 12 Amsterdam Ave.
           6 | Bethann Schraub   | 29 Monticello
           7 | Janay Priolo      | 81 Harrisburg
           8 | Ophelia Sturdnant | 31 Deerfield Ave.
           9 | Eryn Vilar        | 56 Morton St.
          10 | Jina Farraj       | 100 Bryan Ave.
(10 rows)




newsletters=# SELECT * FROM subscriptions;
 subscription_id |    description    | price_per_month | subscription_length 
-----------------+-------------------+-----------------+---------------------
               1 | Politics Magazine |              10 |                  12
               2 | Politics Magazine |              11 |                   6
               3 | Politics Magazine |              12 |                   3
               4 | Fashion Magazine  |              15 |                  12
               5 | Fashion Magazine  |              17 |                   6
               6 | Fashion Magazine  |              19 |                   3
               7 | Sports Magazine   |              11 |                  12
               8 | Sports Magazine   |              12 |                   6
               9 | Sports Magazine   |              13 |                   3
(9 rows)



Let’s follow the first Order

 order_id | customer_id | subscription_id |    purchase_date    
----------+-------------+-----------------+---------------------
        1 |           3 |               2 | 2021-05-14 14:51:53

This means that order 1 belongs to the customer with id 3, who is Lizabeth Letsche and lives in 789 Main St. Lizabeth has subscribed to subscription id 2 or Politics Magazine. 

We should go deeper in our data understanding, so let’s do it. Just pay attention to orders tables

 order_id | customer_id | subscription_id |    purchase_date    
----------+-------------+-----------------+---------------------
        1 |           3 |               2 | 2021-05-14 14:51:53
        2 |           2 |               4 | 2021-05-15 17:51:53
        3 |           3 |               4 | 2021-05-18 13:51:53
        4 |           9 |               9 | 2021-05-21 17:51:53
        5 |           7 |               5 | 2021-06-02 12:51:53
        6 |           8 |               2 | 2021-06-14 13:51:53
        7 |           5 |               8 | 2021-06-14 17:51:53
        8 |           9 |               5 | 2021-06-16 19:51:53
        9 |           4 |               4 | 2021-06-17 20:51:53
       10 |           1 |               7 | 2021-07-18 22:51:53
       11 |           5 |               4 | 2021-07-20 17:51:53
       12 |           3 |               2 | 2021-09-21 17:51:53
       13 |           3 |               5 | 2021-12-01 18:51:53
       14 |           6 |               5 | 2021-12-14 16:51:53
       15 |           1 |               2 | 2021-12-15 15:51:53
       16 |           1 |               2 | 2021-12-16 14:51:53
       17 |           3 |               6 | 2021-12-17 13:51:53
       18 |           3 |               8 | 2021-12-18 12:51:53
       19 |           4 |               9 | 2021-12-19 11:51:53
       20 |           1 |               7 | 2021-12-19 10:51:53


We see that the customer with id 3 has made 6 orders and seems that is likely our best customer (but we don’t know the customers name). On the other hand, the subscription id 2 has 5 orders and seems like it is our best-selling subscription (but we don’t know the newsletter name).

The critical thing to consider here is: If we only look at the orders table, we cannot tell what is really going on in each order, but if we refer to the other tables, we can get a complete picture. We call this "joining tables".



INNER JOIN
In SQL we combine tables with a statement called JOIN. If we want to combine orders with customers, we can do something like this:

newsletters=# SELECT * FROM orders
newsletters-# JOIN customers
newsletters-# ON orders.customer_id = customers.customer_id;


 order_id | customer_id | subscription_id |    purchase_date    | customer_id |   customer_name   |      address      
----------+-------------+-----------------+---------------------+-------------+-------------------+-------------------
        1 |           3 |               2 | 2021-05-14 14:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
        2 |           2 |               4 | 2021-05-15 17:51:53 |           2 | Jacquline Diddle  | 456 Park Ave.
        3 |           3 |               4 | 2021-05-18 13:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
        4 |           9 |               9 | 2021-05-21 17:51:53 |           9 | Eryn Vilar        | 56 Morton St.
        5 |           7 |               5 | 2021-06-02 12:51:53 |           7 | Janay Priolo      | 81 Harrisburg
        6 |           8 |               2 | 2021-06-14 13:51:53 |           8 | Ophelia Sturdnant | 31 Deerfield Ave.
        7 |           5 |               8 | 2021-06-14 17:51:53 |           5 | Inocencia Goyco   | 12 Amsterdam Ave.
        8 |           9 |               5 | 2021-06-16 19:51:53 |           9 | Eryn Vilar        | 56 Morton St.
        9 |           4 |               4 | 2021-06-17 20:51:53 |           4 | Jessia Butman     | 1 Columbus Ave.
       10 |           1 |               7 | 2021-07-18 22:51:53 |           1 | Allie Rahaim      | 123 Broadway
       11 |           5 |               4 | 2021-07-20 17:51:53 |           5 | Inocencia Goyco   | 12 Amsterdam Ave.
       12 |           3 |               2 | 2021-09-21 17:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
       13 |           3 |               5 | 2021-12-01 18:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
       14 |           6 |               5 | 2021-12-14 16:51:53 |           6 | Bethann Schraub   | 29 Monticello
       15 |           1 |               2 | 2021-12-15 15:51:53 |           1 | Allie Rahaim      | 123 Broadway
       16 |           1 |               2 | 2021-12-16 14:51:53 |           1 | Allie Rahaim      | 123 Broadway
       17 |           3 |               6 | 2021-12-17 13:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
       18 |           3 |               8 | 2021-12-18 12:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
       19 |           4 |               9 | 2021-12-19 11:51:53 |           4 | Jessia Butman     | 1 Columbus Ave.
       20 |           1 |               7 | 2021-12-19 10:51:53 |           1 | Allie Rahaim      | 123 Broadway
(20 rows)


Now we have a better picture of the data. If the customer with id 3 is our best customer, we can identify easier that the customer is: Lizabeth Letsche. Now let’s dissect the statement created 

newsletters=# SELECT * FROM orders
newsletters-# JOIN customers
newsletters-# ON orders.customer_id = customers.customer_id;

  • The first line selects all the columns (with the command SELECT *) of the combined tables. If we only want some of them, we must specify them. Then it specifies the first table on which we want to make the query (orders in our case)
  • The second line uses JOIN to say that you want to combine the information from orders with customers.
  • The third line tells how to combine the two tables. We want to match the customer_id of orders with the customer_id of customers

Because names may be repeated in multiple tables, we use the table_name.column_name syntax to ensure an unambiguous query. We can use it in ON or in SELECT this syntax. Now instead of selecting all columns (*) we will select order_id from orders and customer_name from customers.

newsletters=# SELECT orders.order_id, customers.customer_name
newsletters-# FROM orders
newsletters-# JOIN customers
newsletters-# ON orders.customer_id = customers.customer_id;
 order_id |   customer_name   
----------+-------------------
        1 | Lizabeth Letsche
        2 | Jacquline Diddle
        3 | Lizabeth Letsche
        4 | Eryn Vilar
        5 | Janay Priolo
        6 | Ophelia Sturdnant
        7 | Inocencia Goyco
        8 | Eryn Vilar
        9 | Jessia Butman
       10 | Allie Rahaim
       11 | Inocencia Goyco
       12 | Lizabeth Letsche
       13 | Lizabeth Letsche
       14 | Bethann Schraub
       15 | Allie Rahaim
       16 | Allie Rahaim
       17 | Lizabeth Letsche
       18 | Lizabeth Letsche
       19 | Jessia Butman
       20 | Allie Rahaim
(20 rows)


With this query it is very clear who made each order. 

Let's do something similar but with subscriptions

newsletters=# SELECT *
newsletters-# FROM orders
newsletters-# JOIN subscriptions
newsletters-# ON orders.subscription_id = subscriptions.subscription_id;
 order_id | customer_id | subscription_id |    purchase_date    | subscription_id |    description    | price_per_month | subscription_length 
----------+-------------+-----------------+---------------------+-----------------+-------------------+-----------------+---------------------
        1 |           3 |               2 | 2021-05-14 14:51:53 |               2 | Politics Magazine |              11 |                   6
        2 |           2 |               4 | 2021-05-15 17:51:53 |               4 | Fashion Magazine  |              15 |                  12
        3 |           3 |               4 | 2021-05-18 13:51:53 |               4 | Fashion Magazine  |              15 |                  12
        4 |           9 |               9 | 2021-05-21 17:51:53 |               9 | Sports Magazine   |              13 |                   3
        5 |           7 |               5 | 2021-06-02 12:51:53 |               5 | Fashion Magazine  |              17 |                   6
        6 |           8 |               2 | 2021-06-14 13:51:53 |               2 | Politics Magazine |              11 |                   6
        7 |           5 |               8 | 2021-06-14 17:51:53 |               8 | Sports Magazine   |              12 |                   6
        8 |           9 |               5 | 2021-06-16 19:51:53 |               5 | Fashion Magazine  |              17 |                   6
        9 |           4 |               4 | 2021-06-17 20:51:53 |               4 | Fashion Magazine  |              15 |                  12
       10 |           1 |               7 | 2021-07-18 22:51:53 |               7 | Sports Magazine   |              11 |                  12
       11 |           5 |               4 | 2021-07-20 17:51:53 |               4 | Fashion Magazine  |              15 |                  12
       12 |           3 |               2 | 2021-09-21 17:51:53 |               2 | Politics Magazine |              11 |                   6
       13 |           3 |               5 | 2021-12-01 18:51:53 |               5 | Fashion Magazine  |              17 |                   6
       14 |           6 |               5 | 2021-12-14 16:51:53 |               5 | Fashion Magazine  |              17 |                   6
       15 |           1 |               2 | 2021-12-15 15:51:53 |               2 | Politics Magazine |              11 |                   6
       16 |           1 |               2 | 2021-12-16 14:51:53 |               2 | Politics Magazine |              11 |                   6
       17 |           3 |               6 | 2021-12-17 13:51:53 |               6 | Fashion Magazine  |              19 |                   3
       18 |           3 |               8 | 2021-12-18 12:51:53 |               8 | Sports Magazine   |              12 |                   6
       19 |           4 |               9 | 2021-12-19 11:51:53 |               9 | Sports Magazine   |              13 |                   3
       20 |           1 |               7 | 2021-12-19 10:51:53 |               7 | Sports Magazine   |              11 |                  12
(20 rows)





Now we have a cleared map about the two combined or joined tables. 

So far we have been using the keyword JOIN to do this operation. We could call it INNER JOIN and have the exact same result. In SQL Inner Joins are the most common operation, but they have some limits. For instance: when we run a simple JOIN our result only includes rows that match our ON condition.

If we analyze in detail our customers and orders table, we can see that the customer with id 10 that belongs to Jina Farraj, doesn't have any Order. The question is: Was she included on the Inner Join table? No, because she doesn't have any Order, so he will be excluded from the joining table until she makes an order. 

Note: let’s keep an eye on customer 10 Jina Farraj and with that we can understand the different Joins

LEFT JOINS
What if we want to combine two tables and keep some rows that do not match? We can do that with a LEFT JOIN. A left join will keep all the rows of the first table, no matter if they don't match the second table. Let’s do an example


newsletters=# SELECT *
newsletters-# FROM customers
newsletters-# LEFT JOIN orders
newsletters-# ON customers.customer_id = orders.customer_id;
 customer_id |   customer_name   |      address      | order_id | customer_id | subscription_id |    purchase_date    
-------------+-------------------+-------------------+----------+-------------+-----------------+---------------------
           3 | Lizabeth Letsche  | 789 Main St.      |        1 |           3 |               2 | 2021-05-14 14:51:53
           2 | Jacquline Diddle  | 456 Park Ave.     |        2 |           2 |               4 | 2021-05-15 17:51:53
           3 | Lizabeth Letsche  | 789 Main St.      |        3 |           3 |               4 | 2021-05-18 13:51:53
           9 | Eryn Vilar        | 56 Morton St.     |        4 |           9 |               9 | 2021-05-21 17:51:53
           7 | Janay Priolo      | 81 Harrisburg     |        5 |           7 |               5 | 2021-06-02 12:51:53
           8 | Ophelia Sturdnant | 31 Deerfield Ave. |        6 |           8 |               2 | 2021-06-14 13:51:53
           5 | Inocencia Goyco   | 12 Amsterdam Ave. |        7 |           5 |               8 | 2021-06-14 17:51:53
           9 | Eryn Vilar        | 56 Morton St.     |        8 |           9 |               5 | 2021-06-16 19:51:53
           4 | Jessia Butman     | 1 Columbus Ave.   |        9 |           4 |               4 | 2021-06-17 20:51:53
           1 | Allie Rahaim      | 123 Broadway      |       10 |           1 |               7 | 2021-07-18 22:51:53
           5 | Inocencia Goyco   | 12 Amsterdam Ave. |       11 |           5 |               4 | 2021-07-20 17:51:53
           3 | Lizabeth Letsche  | 789 Main St.      |       12 |           3 |               2 | 2021-09-21 17:51:53
           3 | Lizabeth Letsche  | 789 Main St.      |       13 |           3 |               5 | 2021-12-01 18:51:53
           6 | Bethann Schraub   | 29 Monticello     |       14 |           6 |               5 | 2021-12-14 16:51:53
           1 | Allie Rahaim      | 123 Broadway      |       15 |           1 |               2 | 2021-12-15 15:51:53
           1 | Allie Rahaim      | 123 Broadway      |       16 |           1 |               2 | 2021-12-16 14:51:53
           3 | Lizabeth Letsche  | 789 Main St.      |       17 |           3 |               6 | 2021-12-17 13:51:53
           3 | Lizabeth Letsche  | 789 Main St.      |       18 |           3 |               8 | 2021-12-18 12:51:53
           4 | Jessia Butman     | 1 Columbus Ave.   |       19 |           4 |               9 | 2021-12-19 11:51:53
           1 | Allie Rahaim      | 123 Broadway      |       20 |           1 |               7 | 2021-12-19 10:51:53
          10 | Jina Farraj       | 100 Bryan Ave.    |          |             |                 | 
(21 rows)


Can you see the difference?

The customer with ID 10 now appears in the result but the data related with the order is empty or null. That is because the LEFT JOIN keeps all the rows of the first table, no matter if they don't match the second table. Now let’s explain the query executed

newsletters=# SELECT *
newsletters-# FROM customers
newsletters-# LEFT JOIN orders
newsletters-# ON customers.customer_id = orders.customer_id;

  • The first line selects all columns of both tables.
  • The second line selects customers (the left table).
  • The third line runs a LEFT JOIN on table orders (the right table).
  • The fourth line tells SQL how to run the join (looking for the matching values in column customer_id)



When should we use INNER JOIN instead of LEFT JOIN?
  • It depends on how we want to select the combined data.
  • Generally we use INNER JOIN when we want to select only the rows that match the ON condition.
  • We use LEFT JOIN when we want every row of the first table, no matter if it matches the second one.

RIGHT JOIN
It works very similarly to LEFT JOIN. The main difference is the position of the tables. If we replicate last exercise to see the customer with id 10, we can do something like this:


newsletters=# SELECT *
newsletters-# FROM orders
newsletters-# RIGHT JOIN customers
newsletters-# ON orders.customer_id = customers.customer_id;
 order_id | customer_id | subscription_id |    purchase_date    | customer_id |   customer_name   |      address      
----------+-------------+-----------------+---------------------+-------------+-------------------+-------------------
        1 |           3 |               2 | 2021-05-14 14:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
        2 |           2 |               4 | 2021-05-15 17:51:53 |           2 | Jacquline Diddle  | 456 Park Ave.
        3 |           3 |               4 | 2021-05-18 13:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
        4 |           9 |               9 | 2021-05-21 17:51:53 |           9 | Eryn Vilar        | 56 Morton St.
        5 |           7 |               5 | 2021-06-02 12:51:53 |           7 | Janay Priolo      | 81 Harrisburg
        6 |           8 |               2 | 2021-06-14 13:51:53 |           8 | Ophelia Sturdnant | 31 Deerfield Ave.
        7 |           5 |               8 | 2021-06-14 17:51:53 |           5 | Inocencia Goyco   | 12 Amsterdam Ave.
        8 |           9 |               5 | 2021-06-16 19:51:53 |           9 | Eryn Vilar        | 56 Morton St.
        9 |           4 |               4 | 2021-06-17 20:51:53 |           4 | Jessia Butman     | 1 Columbus Ave.
       10 |           1 |               7 | 2021-07-18 22:51:53 |           1 | Allie Rahaim      | 123 Broadway
       11 |           5 |               4 | 2021-07-20 17:51:53 |           5 | Inocencia Goyco   | 12 Amsterdam Ave.
       12 |           3 |               2 | 2021-09-21 17:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
       13 |           3 |               5 | 2021-12-01 18:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
       14 |           6 |               5 | 2021-12-14 16:51:53 |           6 | Bethann Schraub   | 29 Monticello
       15 |           1 |               2 | 2021-12-15 15:51:53 |           1 | Allie Rahaim      | 123 Broadway
       16 |           1 |               2 | 2021-12-16 14:51:53 |           1 | Allie Rahaim      | 123 Broadway
       17 |           3 |               6 | 2021-12-17 13:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
       18 |           3 |               8 | 2021-12-18 12:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
       19 |           4 |               9 | 2021-12-19 11:51:53 |           4 | Jessia Butman     | 1 Columbus Ave.
       20 |           1 |               7 | 2021-12-19 10:51:53 |           1 | Allie Rahaim      | 123 Broadway
          |             |                 |                     |          10 | Jina Farraj       | 100 Bryan Ave.
(21 rows)




If you pay attention to the last row, the empty values are in the first column (that belongs to the orders) and then we have the customer info. That is because the RIGHT JOIN keeps all the rows of the second table, no matter if they don't match the first table.


With this we have the most common usage of Join Tables and you can test by yourself the different results with different data

I hope you learned a lot

Thanks for reading
Daniel Morales