Questions
What is the difference between `UNION` and `UNION ALL` in PostgreSQL?
The Scenario
You are a data analyst at an e-commerce company. You are trying to write a query to get a list of all the customers from two different tables: customers_a and customers_b.
You are not sure whether to use UNION or UNION ALL to combine the results of the two queries.
The Challenge
Explain the difference between UNION and UNION ALL in PostgreSQL. What are the pros and cons of each approach, and which one would you choose for this use case?
A junior engineer might think that they are interchangeable. They might not be aware of the difference in performance or the fact that `UNION` removes duplicate rows.
A senior engineer would be able to provide a detailed explanation of the differences between `UNION` and `UNION ALL`. They would also be able to explain the trade-offs between each approach and would have a clear recommendation for which one to use in this use case.
Step 1: Understand the Key Differences
| Feature | UNION | UNION ALL |
|---|---|---|
| Duplicates | Removes duplicate rows from the result set. | Includes all duplicate rows in the result set. |
| Performance | Slower than UNION ALL, because it has to sort the result set to remove the duplicates. | Faster than UNION, because it does not have to remove duplicates. |
| Use Cases | When you want to combine the results of two queries and remove any duplicate rows. | When you want to combine the results of two queries and keep all the duplicate rows. |
Step 2: Choose the Right Tool for the Job
For our use case, we want to get a list of all the customers from two different tables. If there is a possibility that the same customer could be in both tables, and we want to see the customer only once in the result set, we should use UNION.
If we want to see all the customers from both tables, even if there are duplicates, we should use UNION ALL.
Step 3: Code Examples
Here are some code examples that show the difference between the two approaches:
UNION:
SELECT customer_name FROM customers_a
UNION
SELECT customer_name FROM customers_b;UNION ALL:
SELECT customer_name FROM customers_a
UNION ALL
SELECT customer_name FROM customers_b; Practice Question
You want to get a list of all the unique cities from two different tables: `customers` and `suppliers`. Which of the following would you use?