Skip to main content

Intro - What is Data Combining

Introduction

In the real world, data is rarely stored in a single table. Instead, data is distributed across multiple tables to avoid redundancy, improve organization, and maintain data integrity.

Data combining is the process of merging data from two or more tables into a single result set.

In this tutorial, you will learn:

  • Why data is split across multiple tables
  • What data combining means in SQL
  • When you need to combine data
  • The concept of relationships between tables

Why Is Data Split Across Tables?

Consider an e-commerce system. You could store everything in one giant table:

order_idcustomer_namecustomer_cityproduct_namepriceseller_nameseller_city
001JohnSão PauloLaptop1000TechStoreRio
002JohnSão PauloMouse25TechStoreRio
003MariaRioLaptop1000TechStoreRio

Problems with a Single Table

  1. Data Redundancy: John's information is repeated in rows 1 and 2. TechStore's information is repeated in all rows.

  2. Update Anomalies: If John moves to a new city, you need to update every row where he appears.

  3. Insert Anomalies: You cannot add a new customer without them placing an order.

  4. Storage Waste: Repeating the same data wastes storage space.

The Solution: Multiple Related Tables

Instead, we split data into separate tables:

customers table:

customer_idcustomer_namecustomer_city
C1JohnSão Paulo
C2MariaRio

products table:

product_idproduct_nameprice
P1Laptop1000
P2Mouse25

orders table:

order_idcustomer_idproduct_id
001C1P1
002C1P2
003C2P1

Now each piece of information exists in only one place!

What is Data Combining?

Data combining is the process of reconstructing related data from multiple tables back into a unified view.

When you want to see "which customer ordered which product," you need to combine:

  • The customers table (for customer details)
  • The orders table (for order records)
  • The products table (for product details)

The Goal

From three separate tables, create a result like:

customer_nameproduct_nameorder_date
JohnLaptop2024-01-15
JohnMouse2024-01-16
MariaLaptop2024-01-17

This is exactly what SQL JOINs accomplish.

Overview of data combining methods: JOINs (horizontal combining via keys) vs SET operators (vertical stacking via UNION)
Overview of data combining methods: JOINs (horizontal combining via keys) vs SET operators (vertical stacking via UNION)

Keys: The Bridge Between Tables

Tables are connected through keys - columns that establish relationships.

Primary Key

A primary key uniquely identifies each row in a table.

customers table
┌─────────────┬───────────────┬───────────────┐
│ customer_id │ customer_name │ customer_city │
│ (PK)        │               │               │
├─────────────┼───────────────┼───────────────┤
│ C1          │ John          │ São Paulo     │
│ C2          │ Maria         │ Rio           │
└─────────────┴───────────────┴───────────────┘

Foreign Key

A foreign key references a primary key in another table, creating a relationship.

orders table
┌──────────┬─────────────┬────────────┐
│ order_id │ customer_id │ product_id │
│ (PK)     │ (FK)        │ (FK)       │
├──────────┼─────────────┼────────────┤
│ 001      │ C1          │ P1         │
│ 002      │ C1          │ P2         │
│ 003      │ C2          │ P1         │
└──────────┴─────────────┴────────────┘

The customer_id in orders references customer_id in customers. This is how we know which customer placed which order.

Types of Table Relationships

Tables can be related in different ways:

One-to-One (1:1)

One row in Table A relates to exactly one row in Table B.

Example: Each customer has one profile. Each profile belongs to one customer.

customers ──────── profiles
   1         1

One-to-Many (1:N)

One row in Table A relates to many rows in Table B.

Example: One customer can place many orders. Each order belongs to one customer.

customers ──────── orders
   1         N

Many-to-Many (M:N)

Many rows in Table A relate to many rows in Table B (through a junction table).

Example: Many products can be in many orders. Solved with an order_items table.

orders ──────── order_items ──────── products
   N         N            N         N

Our E-Commerce Database Relationships

In our Brazilian E-Commerce dataset, tables are connected like this:

Key Relationships in Our Data

From TableTo TableRelationshipJoin Column
customersordersOne-to-Manycustomer_id
ordersorder_itemsOne-to-Manyorder_id
productsorder_itemsOne-to-Manyproduct_id
sellersorder_itemsOne-to-Manyseller_id
ordersorder_paymentsOne-to-Manyorder_id
er-diagram relationship
er-diagram relationship

When Do You Need to Combine Data?

You need to combine data whenever your question involves information from multiple tables.

Examples

Question: "What city are my top customers from?"

  • Need: Customer cities (customers table) + Order counts (orders table)
  • Combination required!

Question: "Which products generated the most revenue?"

  • Need: Product names (products table) + Order values (order_items table)
  • Combination required!

Question: "How many customers are in each state?"

  • Need: Only customer data (customers table)
  • No combination needed - single table query

Rule of Thumb

If your question involves:

  • Columns from different tables → You need JOINs
  • Columns from the same table → Simple SELECT is enough

Preview: How SQL Combines Data

SQL provides the JOIN clause to combine tables. Here's a sneak peek:

SELECT 
    c.customer_city,
    o.order_id
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;

What This Does

  1. Takes all rows from the customers table
  2. Matches them with rows from the orders table
  3. Uses customer_id to find matching rows
  4. Returns combined results

We'll explore different types of JOINs in the upcoming tutorials:

  • INNER JOIN: Only matching rows from both tables
  • LEFT JOIN: All rows from left table + matches
  • RIGHT JOIN: All rows from right table + matches
  • FULL JOIN: All rows from both tables
  • And more!

Practice

Before learning JOINs, let's explore the tables we'll be combining.

Exercise 1: Explore the Customers Table

Run this query to see the structure of the customers table:

-- View sample customer data

SELECT 
    customer_id,
    customer_city,
    customer_state
FROM olist_customers_dataset
LIMIT 5;

Exercise 2: Explore the Orders Table

Now look at the orders table - notice it has customer_id that connects to customers:

-- View sample order data
-- Notice the customer_id column that links to customers table

SELECT 
    order_id,
    customer_id,
    order_status
FROM olist_orders_dataset
LIMIT 5;

Exercise 3: Explore the Order Items Table

The order_items table connects orders to products and sellers:

-- View sample order items
-- Notice multiple foreign keys: order_id, product_id, seller_id

SELECT 
    order_id,
    product_id,
    seller_id,
    price
FROM olist_order_items_dataset
LIMIT 5;

Summary

Key Takeaways

  1. Data is distributed across multiple tables to avoid redundancy and maintain integrity

  2. Data combining merges related data from multiple tables into one result

  3. Primary keys uniquely identify rows in a table

  4. Foreign keys create relationships between tables by referencing primary keys

  5. Table relationships can be:

  • One-to-One (1:1)
  • One-to-Many (1:N)
  • Many-to-Many (M:N)
  1. You need JOINs when your query requires data from multiple tables

What's Next?

In the next tutorial, we'll introduce SQL JOINs - the primary mechanism for combining data from multiple tables.