Skip to main content

Intro - What is Data Filtering

Introduction

Imagine you have a massive spreadsheet with thousands of rows of customer orders. You don't want to see every single row - you want to find specific information, like orders from a particular city or purchases above a certain amount.

Data filtering is the process of narrowing down your results to show only the rows that meet specific criteria. It's one of the most fundamental and frequently used operations in SQL.

In this lesson, you will learn:

  • What data filtering means and why it's essential
  • How the WHERE clause enables filtering
  • Real-world scenarios where filtering is critical
  • Overview of filtering operators available in SQL

What is Data Filtering?

Data filtering is the technique of selecting specific rows from a database table based on conditions you define. Instead of retrieving all data, you retrieve only the data that matters for your analysis or application.

Analogy: The Coffee Filter

Think of data filtering like a coffee filter:

  • All the data = Coffee grounds + water mixture
  • The filter condition = The filter paper
  • Filtered results = Clean coffee in your cup

Just as a coffee filter separates what you want (coffee) from what you don't (grounds), SQL filtering separates relevant rows from irrelevant ones.

Without Filtering vs With Filtering

Without filtering:

SELECT * FROM olist_orders_dataset;

→ Returns ALL orders (potentially hundreds of thousands of rows)

With filtering:

SELECT * FROM olist_orders_dataset
WHERE order_status = 'delivered';

→ Returns only delivered orders

Funnel diagram showing how SQL WHERE clause filters rows from a full table down to matching results
Funnel diagram showing how SQL WHERE clause filters rows from a full table down to matching results

Why is Data Filtering Important?

Filtering data is crucial for several reasons:

1. Performance

Databases can contain millions or billions of rows. Retrieving all of them is:

  • Slow - Takes a long time to execute
  • Resource-intensive - Uses more memory and bandwidth
  • Impractical - No one can analyze millions of rows manually

2. Relevance

Business questions are usually specific:

  • "How many orders were placed last month?"
  • "Which customers are from São Paulo?"
  • "What products cost more than $100?"

Filtering gives you precisely the data needed to answer these questions.

3. Security and Privacy

Sometimes you should only show certain data:

  • A manager sees only their department's data
  • Users see only their own orders
  • Sensitive information is excluded from reports

4. Data Quality

Filtering helps exclude invalid or unwanted data:

  • Exclude canceled orders from sales reports
  • Remove test data from production analysis
  • Focus on specific date ranges for trends

The WHERE Clause - Your Filtering Tool

In SQL, the WHERE clause is the primary mechanism for filtering data. It specifies conditions that rows must meet to be included in the results.

Basic Syntax

SELECT column1, column2, ...
FROM table_name
WHERE condition;

How It Works

  1. The database reads rows from the table
  2. For each row, it evaluates the WHERE condition
  3. If the condition is TRUE → the row is included in results
  4. If the condition is FALSE or NULL → the row is excluded

Simple Example

SELECT customer_id, customer_city, customer_state
FROM olist_customers_dataset
WHERE customer_state = 'SP';

This query:

  • Looks at each customer
  • Checks if customer_state equals 'SP'
  • Returns only customers from São Paulo state

Types of Filtering Operators

SQL provides various operators for different filtering needs. Here's an overview:

Comparison Operators

Compare values for equality or relative size:

  • = Equal to
  • != or <> Not equal to
  • >, <, >=, <= Greater/less than

Logical Operators

Combine multiple conditions:

  • AND - All conditions must be true
  • OR - At least one condition must be true
  • NOT - Reverses a condition

Range Operators

Filter within a range of values:

  • BETWEEN - Values within a range (inclusive)

List Operators

Check if a value matches any item in a list:

  • IN - Matches any value in a set
  • NOT IN - Doesn't match any value in a set

Pattern Matching

Search for text patterns:

  • LIKE - Matches patterns using wildcards
  • % - Matches any sequence of characters
  • _ - Matches any single character

NULL Handling

Deal with missing or unknown values:

  • IS NULL - Value is missing
  • IS NOT NULL - Value exists

In the following lessons, we'll explore each of these operators in detail.

Real-World Filtering Scenarios

Let's look at practical examples where filtering is essential:

Scenario 1: Monthly Sales Report

Question: "What were our sales in January 2018?"

SELECT *
FROM olist_orders_dataset
WHERE order_purchase_timestamp >= '2018-01-01'
  AND order_purchase_timestamp < '2018-02-01';

Scenario 2: High-Value Transactions

Question: "Which orders have payment value above $500?"

SELECT *
FROM olist_order_payments_dataset
WHERE payment_value > 500;

Scenario 3: Customer Segmentation

Question: "Find customers in major cities (São Paulo, Rio, Belo Horizonte)."

SELECT *
FROM olist_customers_dataset
WHERE customer_city IN ('sao paulo', 'rio de janeiro', 'belo horizonte');

Scenario 4: Product Search

Question: "Find products in categories that contain 'electronics'."

SELECT *
FROM olist_products_dataset
WHERE product_category_name LIKE '%eletronico%';

Scenario 5: Data Quality Check

Question: "Find orders without delivery date (not yet delivered)."

SELECT *
FROM olist_orders_dataset
WHERE order_delivered_customer_date IS NULL;

Practice

Let's practice basic filtering to get familiar with the WHERE clause.

Exercise 1: Filter by State

Find all customers from Rio de Janeiro state (RJ):

-- Find customers from RJ state

SELECT customer_id, customer_city, customer_state
FROM olist_customers_dataset
WHERE customer_state = 'RJ'
LIMIT 10;

Exercise 2: Filter by Status

Find all orders that have been delivered:

-- Find delivered orders

SELECT order_id, customer_id, order_status
FROM olist_orders_dataset
WHERE order_status = 'delivered'
LIMIT 10;

Exercise 3: Filter by Numeric Value

Find order items with a price greater than $200:

-- Find expensive items (price > 200)

SELECT order_id, product_id, price
FROM olist_order_items_dataset
WHERE price > 200
LIMIT 10;

Summary

You now understand what data filtering is and why it's essential in SQL!

Key Takeaways

Data filtering narrows results to show only relevant rows

✅ The WHERE clause is SQL's primary filtering mechanism

✅ Filtering improves performance, relevance, and security

✅ SQL provides many filtering operators:

  • Comparison: =, !=, >, <, >=, <=
  • Logical: AND, OR, NOT
  • Range: BETWEEN
  • List: IN, NOT IN
  • Pattern: LIKE
  • NULL: IS NULL, IS NOT NULL

What's Next?

In the upcoming lessons, we'll dive deep into each filtering operator:

  1. Comparison Operators - Equal, not equal, greater/less than
  2. AND Operator - Combining conditions that must all be true
  3. OR Operator - Combining conditions where any can be true
  4. NOT Operator - Reversing conditions
  5. BETWEEN Operator - Filtering value ranges
  6. IN Operator - Matching values from a list
  7. LIKE Operator - Pattern matching in text