Skip to main content

Intro - What is Data Aggregation

Introduction

Welcome to the Window Functions module! Before we dive into window functions, we need to understand the foundation they build upon: data aggregation.

In this lesson, you will learn:

  • What data aggregation means in SQL
  • Why aggregation is essential for data analysis
  • The limitations of traditional aggregation
  • How window functions solve these limitations

What is Data Aggregation?

Data aggregation is the process of combining multiple rows of data into a single summary value. Instead of looking at individual records, you summarize them to find patterns, trends, and insights.

Real-World Analogy

Imagine you have a basket of fruits:

  • 5 apples
  • 3 oranges
  • 2 bananas

Instead of listing each fruit individually, you might say:

  • Total count: 10 fruits
  • Average per type: 3.33 fruits
  • Most common: Apples (5)

This summarization IS aggregation!

In SQL Terms

Data aggregation answers questions like:

  • How many orders did we receive last month?
  • What is the average order value per customer?
  • What was the total revenue per product category?
  • Which city has the most customers?
Data aggregation concept: individual rows collapsed into summary values using SUM, COUNT, AVG, MIN, MAX
Data aggregation concept: individual rows collapsed into summary values using SUM, COUNT, AVG, MIN, MAX

Why is Aggregation Important?

Raw data can contain millions of rows. Aggregation helps us:

1. Reduce Data Volume

Instead of viewing 100,000 order records, you can see:

  • 12 monthly totals
  • 50 product category summaries
  • 27 state-level breakdowns

2. Find Patterns

Aggregated data reveals trends that individual rows hide:

  • Sales increasing 15% month-over-month
  • Weekends have 40% more orders
  • São Paulo customers spend 25% more on average

3. Support Decision Making

Business decisions rely on aggregated metrics:

  • Marketing: Which campaigns bring the most customers?
  • Operations: Which warehouses handle the most volume?
  • Finance: What is the average payment value per method?

4. Enable Comparisons

You can compare:

  • This month vs last month
  • Region A vs Region B
  • Product X vs Product Y

Traditional Aggregation with GROUP BY

The traditional way to aggregate data in SQL is using GROUP BY. It collapses multiple rows into summary rows.

Example: Count Customers Per State

SELECT 
    customer_state,
    COUNT(*) AS customer_count
FROM olist_customers_dataset
GROUP BY customer_state
ORDER BY customer_count DESC;

How GROUP BY Works

Before Aggregation (Individual Rows):

customer_idcustomer_state
C001SP
C002SP
C003RJ
C004SP
C005RJ

After Aggregation (Grouped Rows):

customer_statecustomer_count
SP3
RJ2

Notice how the 5 individual rows became 2 summary rows. The individual customer IDs are gone—they were collapsed into counts.

The Limitation of GROUP BY

GROUP BY is powerful, but it has a major limitation: you lose the individual row details.

The Problem

What if you need both:

  • The individual row data (each order's details)
  • AND the aggregated value (total orders for that customer)

Example Scenario

You want to see each order along with how many total orders that customer has made:

order_idcustomer_idorder_datetotal_customer_orders
O001C0012023-01-153
O002C0012023-02-203
O003C0012023-03-103
O004C0022023-01-182
O005C0022023-04-052

With GROUP BY, you cannot achieve this! You either get:

  • Individual orders (no aggregate)
  • Aggregated customer counts (no individual orders)

But not both at the same time!

Enter Window Functions

Window functions solve this limitation. They perform calculations across a set of rows (a "window") while keeping all the individual rows intact.

The Key Difference

ApproachRows Preserved?Aggregate Available?
GROUP BY❌ Rows collapsed✅ Yes
Window Functions✅ All rows kept✅ Yes

Preview: Window Function Syntax

SELECT 
    order_id,
    customer_id,
    order_purchase_timestamp,
    COUNT(*) OVER (PARTITION BY customer_id) AS total_customer_orders
FROM olist_orders_dataset
LIMIT 10;

Notice the OVER keyword—this is what makes it a window function instead of a regular aggregate. The query returns every row, but each row also shows the count of orders for that customer.

Types of Aggregation Operations

SQL provides several types of aggregation operations:

1. Counting

  • How many rows/records?
  • COUNT(*)

2. Summing

  • What is the total value?
  • SUM(column)

3. Averaging

  • What is the mean value?
  • AVG(column)

4. Finding Extremes

  • What is the minimum/maximum?
  • MIN(column), MAX(column)

5. Ranking

  • What is the position/rank?
  • ROW_NUMBER(), RANK(), DENSE_RANK()

6. Distribution Analysis

  • What percentile is this value?
  • NTILE(), PERCENT_RANK(), CUME_DIST()

All of these can be used with either GROUP BY (traditional) or Window Functions (modern approach).

Practice

Let's explore the difference between traditional aggregation and seeing individual rows. Run the queries below to understand why window functions are needed.

Exercise 1: Traditional GROUP BY Aggregation

This query counts orders per status using GROUP BY. Notice that you only see summary rows, not individual orders:

-- Traditional aggregation with GROUP BY
-- Individual order details are lost

SELECT 
    order_status,
    COUNT(*) AS order_count
FROM olist_orders_dataset
GROUP BY order_status
ORDER BY order_count DESC;

Exercise 2: Window Function Aggregation

This query uses a window function to show each order AND its status count. Notice that all individual rows are preserved:

-- Window function aggregation
-- Individual orders ARE preserved!

SELECT 
    order_id,
    order_status,
    order_purchase_timestamp,
    COUNT(*) OVER (PARTITION BY order_status) AS status_count
FROM olist_orders_dataset
LIMIT 20;

Summary

Key Takeaways

  1. Data aggregation combines multiple rows into summary values
  2. GROUP BY is the traditional approach—it collapses rows
  3. Window functions perform aggregation while preserving all rows
  4. The OVER keyword distinguishes window functions from regular aggregates

When to Use Each

Use CaseApproach
Need only summary dataGROUP BY
Need individual rows + aggregatesWindow Functions
Creating reports with totalsGROUP BY
Calculating running totalsWindow Functions
Simple counts/sumsEither works
Comparing each row to group totalWindow Functions

Coming Up Next

In the next lesson, we'll review the aggregate functions (COUNT, SUM, AVG, MIN, MAX) that form the foundation of both GROUP BY and window function operations.