Intro - What is Data Combining
Introduction
In the real world, data is rarely stored in a single table. Instead, data is distributed across multiple tables to avoid redundancy, improve organization, and maintain data integrity.
Data combining is the process of merging data from two or more tables into a single result set.
In this tutorial, you will learn:
- Why data is split across multiple tables
- What data combining means in SQL
- When you need to combine data
- The concept of relationships between tables
Why Is Data Split Across Tables?
Consider an e-commerce system. You could store everything in one giant table:
| order_id | customer_name | customer_city | product_name | price | seller_name | seller_city |
|---|---|---|---|---|---|---|
| 001 | John | São Paulo | Laptop | 1000 | TechStore | Rio |
| 002 | John | São Paulo | Mouse | 25 | TechStore | Rio |
| 003 | Maria | Rio | Laptop | 1000 | TechStore | Rio |
Problems with a Single Table
-
Data Redundancy: John's information is repeated in rows 1 and 2. TechStore's information is repeated in all rows.
-
Update Anomalies: If John moves to a new city, you need to update every row where he appears.
-
Insert Anomalies: You cannot add a new customer without them placing an order.
-
Storage Waste: Repeating the same data wastes storage space.
The Solution: Multiple Related Tables
Instead, we split data into separate tables:
customers table:
| customer_id | customer_name | customer_city |
|---|---|---|
| C1 | John | São Paulo |
| C2 | Maria | Rio |
products table:
| product_id | product_name | price |
|---|---|---|
| P1 | Laptop | 1000 |
| P2 | Mouse | 25 |
orders table:
| order_id | customer_id | product_id |
|---|---|---|
| 001 | C1 | P1 |
| 002 | C1 | P2 |
| 003 | C2 | P1 |
Now each piece of information exists in only one place!
What is Data Combining?
Data combining is the process of reconstructing related data from multiple tables back into a unified view.
When you want to see "which customer ordered which product," you need to combine:
- The customers table (for customer details)
- The orders table (for order records)
- The products table (for product details)
The Goal
From three separate tables, create a result like:
| customer_name | product_name | order_date |
|---|---|---|
| John | Laptop | 2024-01-15 |
| John | Mouse | 2024-01-16 |
| Maria | Laptop | 2024-01-17 |
This is exactly what SQL JOINs accomplish.

Keys: The Bridge Between Tables
Tables are connected through keys - columns that establish relationships.
Primary Key
A primary key uniquely identifies each row in a table.
customers table
┌─────────────┬───────────────┬───────────────┐
│ customer_id │ customer_name │ customer_city │
│ (PK) │ │ │
├─────────────┼───────────────┼───────────────┤
│ C1 │ John │ São Paulo │
│ C2 │ Maria │ Rio │
└─────────────┴───────────────┴───────────────┘
Foreign Key
A foreign key references a primary key in another table, creating a relationship.
orders table
┌──────────┬─────────────┬────────────┐
│ order_id │ customer_id │ product_id │
│ (PK) │ (FK) │ (FK) │
├──────────┼─────────────┼────────────┤
│ 001 │ C1 │ P1 │
│ 002 │ C1 │ P2 │
│ 003 │ C2 │ P1 │
└──────────┴─────────────┴────────────┘
The customer_id in orders references customer_id in customers. This is how we know which customer placed which order.
Types of Table Relationships
Tables can be related in different ways:
One-to-One (1:1)
One row in Table A relates to exactly one row in Table B.
Example: Each customer has one profile. Each profile belongs to one customer.
customers ──────── profiles
1 1
One-to-Many (1:N)
One row in Table A relates to many rows in Table B.
Example: One customer can place many orders. Each order belongs to one customer.
customers ──────── orders
1 N
Many-to-Many (M:N)
Many rows in Table A relate to many rows in Table B (through a junction table).
Example: Many products can be in many orders. Solved with an order_items table.
orders ──────── order_items ──────── products
N N N N
Our E-Commerce Database Relationships
In our Brazilian E-Commerce dataset, tables are connected like this:
Key Relationships in Our Data
| From Table | To Table | Relationship | Join Column |
|---|---|---|---|
| customers | orders | One-to-Many | customer_id |
| orders | order_items | One-to-Many | order_id |
| products | order_items | One-to-Many | product_id |
| sellers | order_items | One-to-Many | seller_id |
| orders | order_payments | One-to-Many | order_id |

When Do You Need to Combine Data?
You need to combine data whenever your question involves information from multiple tables.
Examples
Question: "What city are my top customers from?"
- Need: Customer cities (customers table) + Order counts (orders table)
- Combination required!
Question: "Which products generated the most revenue?"
- Need: Product names (products table) + Order values (order_items table)
- Combination required!
Question: "How many customers are in each state?"
- Need: Only customer data (customers table)
- No combination needed - single table query
Rule of Thumb
If your question involves:
- Columns from different tables → You need JOINs
- Columns from the same table → Simple SELECT is enough
Preview: How SQL Combines Data
SQL provides the JOIN clause to combine tables. Here's a sneak peek:
SELECT
c.customer_city,
o.order_id
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;
What This Does
- Takes all rows from the
customerstable - Matches them with rows from the
orderstable - Uses
customer_idto find matching rows - Returns combined results
We'll explore different types of JOINs in the upcoming tutorials:
- INNER JOIN: Only matching rows from both tables
- LEFT JOIN: All rows from left table + matches
- RIGHT JOIN: All rows from right table + matches
- FULL JOIN: All rows from both tables
- And more!
Practice
Before learning JOINs, let's explore the tables we'll be combining.
Exercise 1: Explore the Customers Table
Run this query to see the structure of the customers table:
-- View sample customer data
SELECT
customer_id,
customer_city,
customer_state
FROM olist_customers_dataset
LIMIT 5;Exercise 2: Explore the Orders Table
Now look at the orders table - notice it has customer_id that connects to customers:
-- View sample order data
-- Notice the customer_id column that links to customers table
SELECT
order_id,
customer_id,
order_status
FROM olist_orders_dataset
LIMIT 5;Exercise 3: Explore the Order Items Table
The order_items table connects orders to products and sellers:
-- View sample order items
-- Notice multiple foreign keys: order_id, product_id, seller_id
SELECT
order_id,
product_id,
seller_id,
price
FROM olist_order_items_dataset
LIMIT 5;Summary
Key Takeaways
-
Data is distributed across multiple tables to avoid redundancy and maintain integrity
-
Data combining merges related data from multiple tables into one result
-
Primary keys uniquely identify rows in a table
-
Foreign keys create relationships between tables by referencing primary keys
-
Table relationships can be:
- One-to-One (1:1)
- One-to-Many (1:N)
- Many-to-Many (M:N)
- You need JOINs when your query requires data from multiple tables
What's Next?
In the next tutorial, we'll introduce SQL JOINs - the primary mechanism for combining data from multiple tables.