Intro - What is Data Aggregation
Introduction
Welcome to the Window Functions module! Before we dive into window functions, we need to understand the foundation they build upon: data aggregation.
In this lesson, you will learn:
- What data aggregation means in SQL
- Why aggregation is essential for data analysis
- The limitations of traditional aggregation
- How window functions solve these limitations
What is Data Aggregation?
Data aggregation is the process of combining multiple rows of data into a single summary value. Instead of looking at individual records, you summarize them to find patterns, trends, and insights.
Real-World Analogy
Imagine you have a basket of fruits:
- 5 apples
- 3 oranges
- 2 bananas
Instead of listing each fruit individually, you might say:
- Total count: 10 fruits
- Average per type: 3.33 fruits
- Most common: Apples (5)
This summarization IS aggregation!
In SQL Terms
Data aggregation answers questions like:
- How many orders did we receive last month?
- What is the average order value per customer?
- What was the total revenue per product category?
- Which city has the most customers?

Why is Aggregation Important?
Raw data can contain millions of rows. Aggregation helps us:
1. Reduce Data Volume
Instead of viewing 100,000 order records, you can see:
- 12 monthly totals
- 50 product category summaries
- 27 state-level breakdowns
2. Find Patterns
Aggregated data reveals trends that individual rows hide:
- Sales increasing 15% month-over-month
- Weekends have 40% more orders
- São Paulo customers spend 25% more on average
3. Support Decision Making
Business decisions rely on aggregated metrics:
- Marketing: Which campaigns bring the most customers?
- Operations: Which warehouses handle the most volume?
- Finance: What is the average payment value per method?
4. Enable Comparisons
You can compare:
- This month vs last month
- Region A vs Region B
- Product X vs Product Y
Traditional Aggregation with GROUP BY
The traditional way to aggregate data in SQL is using GROUP BY. It collapses multiple rows into summary rows.
Example: Count Customers Per State
SELECT
customer_state,
COUNT(*) AS customer_count
FROM olist_customers_dataset
GROUP BY customer_state
ORDER BY customer_count DESC;How GROUP BY Works
Before Aggregation (Individual Rows):
| customer_id | customer_state |
|---|---|
| C001 | SP |
| C002 | SP |
| C003 | RJ |
| C004 | SP |
| C005 | RJ |
After Aggregation (Grouped Rows):
| customer_state | customer_count |
|---|---|
| SP | 3 |
| RJ | 2 |
Notice how the 5 individual rows became 2 summary rows. The individual customer IDs are gone—they were collapsed into counts.
The Limitation of GROUP BY
GROUP BY is powerful, but it has a major limitation: you lose the individual row details.
The Problem
What if you need both:
- The individual row data (each order's details)
- AND the aggregated value (total orders for that customer)
Example Scenario
You want to see each order along with how many total orders that customer has made:
| order_id | customer_id | order_date | total_customer_orders |
|---|---|---|---|
| O001 | C001 | 2023-01-15 | 3 |
| O002 | C001 | 2023-02-20 | 3 |
| O003 | C001 | 2023-03-10 | 3 |
| O004 | C002 | 2023-01-18 | 2 |
| O005 | C002 | 2023-04-05 | 2 |
With GROUP BY, you cannot achieve this! You either get:
- Individual orders (no aggregate)
- Aggregated customer counts (no individual orders)
But not both at the same time!
Enter Window Functions
Window functions solve this limitation. They perform calculations across a set of rows (a "window") while keeping all the individual rows intact.
The Key Difference
| Approach | Rows Preserved? | Aggregate Available? |
|---|---|---|
| GROUP BY | ❌ Rows collapsed | ✅ Yes |
| Window Functions | ✅ All rows kept | ✅ Yes |
Preview: Window Function Syntax
SELECT
order_id,
customer_id,
order_purchase_timestamp,
COUNT(*) OVER (PARTITION BY customer_id) AS total_customer_orders
FROM olist_orders_dataset
LIMIT 10;Notice the OVER keyword—this is what makes it a window function instead of a regular aggregate. The query returns every row, but each row also shows the count of orders for that customer.
Types of Aggregation Operations
SQL provides several types of aggregation operations:
1. Counting
- How many rows/records?
COUNT(*)
2. Summing
- What is the total value?
SUM(column)
3. Averaging
- What is the mean value?
AVG(column)
4. Finding Extremes
- What is the minimum/maximum?
MIN(column),MAX(column)
5. Ranking
- What is the position/rank?
ROW_NUMBER(),RANK(),DENSE_RANK()
6. Distribution Analysis
- What percentile is this value?
NTILE(),PERCENT_RANK(),CUME_DIST()
All of these can be used with either GROUP BY (traditional) or Window Functions (modern approach).
Practice
Let's explore the difference between traditional aggregation and seeing individual rows. Run the queries below to understand why window functions are needed.
Exercise 1: Traditional GROUP BY Aggregation
This query counts orders per status using GROUP BY. Notice that you only see summary rows, not individual orders:
-- Traditional aggregation with GROUP BY
-- Individual order details are lost
SELECT
order_status,
COUNT(*) AS order_count
FROM olist_orders_dataset
GROUP BY order_status
ORDER BY order_count DESC;Exercise 2: Window Function Aggregation
This query uses a window function to show each order AND its status count. Notice that all individual rows are preserved:
-- Window function aggregation
-- Individual orders ARE preserved!
SELECT
order_id,
order_status,
order_purchase_timestamp,
COUNT(*) OVER (PARTITION BY order_status) AS status_count
FROM olist_orders_dataset
LIMIT 20;Summary
Key Takeaways
- Data aggregation combines multiple rows into summary values
- GROUP BY is the traditional approach—it collapses rows
- Window functions perform aggregation while preserving all rows
- The
OVERkeyword distinguishes window functions from regular aggregates
When to Use Each
| Use Case | Approach |
|---|---|
| Need only summary data | GROUP BY |
| Need individual rows + aggregates | Window Functions |
| Creating reports with totals | GROUP BY |
| Calculating running totals | Window Functions |
| Simple counts/sums | Either works |
| Comparing each row to group total | Window Functions |
Coming Up Next
In the next lesson, we'll review the aggregate functions (COUNT, SUM, AVG, MIN, MAX) that form the foundation of both GROUP BY and window function operations.