Intro - What is Data Filtering
Introduction
Imagine you have a massive spreadsheet with thousands of rows of customer orders. You don't want to see every single row - you want to find specific information, like orders from a particular city or purchases above a certain amount.
Data filtering is the process of narrowing down your results to show only the rows that meet specific criteria. It's one of the most fundamental and frequently used operations in SQL.
In this lesson, you will learn:
- What data filtering means and why it's essential
- How the WHERE clause enables filtering
- Real-world scenarios where filtering is critical
- Overview of filtering operators available in SQL
What is Data Filtering?
Data filtering is the technique of selecting specific rows from a database table based on conditions you define. Instead of retrieving all data, you retrieve only the data that matters for your analysis or application.
Analogy: The Coffee Filter
Think of data filtering like a coffee filter:
- All the data = Coffee grounds + water mixture
- The filter condition = The filter paper
- Filtered results = Clean coffee in your cup
Just as a coffee filter separates what you want (coffee) from what you don't (grounds), SQL filtering separates relevant rows from irrelevant ones.
Without Filtering vs With Filtering
Without filtering:
SELECT * FROM olist_orders_dataset;
→ Returns ALL orders (potentially hundreds of thousands of rows)
With filtering:
SELECT * FROM olist_orders_dataset
WHERE order_status = 'delivered';
→ Returns only delivered orders

Why is Data Filtering Important?
Filtering data is crucial for several reasons:
1. Performance
Databases can contain millions or billions of rows. Retrieving all of them is:
- Slow - Takes a long time to execute
- Resource-intensive - Uses more memory and bandwidth
- Impractical - No one can analyze millions of rows manually
2. Relevance
Business questions are usually specific:
- "How many orders were placed last month?"
- "Which customers are from São Paulo?"
- "What products cost more than $100?"
Filtering gives you precisely the data needed to answer these questions.
3. Security and Privacy
Sometimes you should only show certain data:
- A manager sees only their department's data
- Users see only their own orders
- Sensitive information is excluded from reports
4. Data Quality
Filtering helps exclude invalid or unwanted data:
- Exclude canceled orders from sales reports
- Remove test data from production analysis
- Focus on specific date ranges for trends
The WHERE Clause - Your Filtering Tool
In SQL, the WHERE clause is the primary mechanism for filtering data. It specifies conditions that rows must meet to be included in the results.
Basic Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition;
How It Works
- The database reads rows from the table
- For each row, it evaluates the WHERE condition
- If the condition is TRUE → the row is included in results
- If the condition is FALSE or NULL → the row is excluded
Simple Example
SELECT customer_id, customer_city, customer_state
FROM olist_customers_dataset
WHERE customer_state = 'SP';
This query:
- Looks at each customer
- Checks if
customer_stateequals 'SP' - Returns only customers from São Paulo state
Types of Filtering Operators
SQL provides various operators for different filtering needs. Here's an overview:
Comparison Operators
Compare values for equality or relative size:
=Equal to!=or<>Not equal to>,<,>=,<=Greater/less than
Logical Operators
Combine multiple conditions:
AND- All conditions must be trueOR- At least one condition must be trueNOT- Reverses a condition
Range Operators
Filter within a range of values:
BETWEEN- Values within a range (inclusive)
List Operators
Check if a value matches any item in a list:
IN- Matches any value in a setNOT IN- Doesn't match any value in a set
Pattern Matching
Search for text patterns:
LIKE- Matches patterns using wildcards%- Matches any sequence of characters_- Matches any single character
NULL Handling
Deal with missing or unknown values:
IS NULL- Value is missingIS NOT NULL- Value exists
In the following lessons, we'll explore each of these operators in detail.
Real-World Filtering Scenarios
Let's look at practical examples where filtering is essential:
Scenario 1: Monthly Sales Report
Question: "What were our sales in January 2018?"
SELECT *
FROM olist_orders_dataset
WHERE order_purchase_timestamp >= '2018-01-01'
AND order_purchase_timestamp < '2018-02-01';
Scenario 2: High-Value Transactions
Question: "Which orders have payment value above $500?"
SELECT *
FROM olist_order_payments_dataset
WHERE payment_value > 500;
Scenario 3: Customer Segmentation
Question: "Find customers in major cities (São Paulo, Rio, Belo Horizonte)."
SELECT *
FROM olist_customers_dataset
WHERE customer_city IN ('sao paulo', 'rio de janeiro', 'belo horizonte');
Scenario 4: Product Search
Question: "Find products in categories that contain 'electronics'."
SELECT *
FROM olist_products_dataset
WHERE product_category_name LIKE '%eletronico%';
Scenario 5: Data Quality Check
Question: "Find orders without delivery date (not yet delivered)."
SELECT *
FROM olist_orders_dataset
WHERE order_delivered_customer_date IS NULL;
Practice
Let's practice basic filtering to get familiar with the WHERE clause.
Exercise 1: Filter by State
Find all customers from Rio de Janeiro state (RJ):
-- Find customers from RJ state
SELECT customer_id, customer_city, customer_state
FROM olist_customers_dataset
WHERE customer_state = 'RJ'
LIMIT 10;Exercise 2: Filter by Status
Find all orders that have been delivered:
-- Find delivered orders
SELECT order_id, customer_id, order_status
FROM olist_orders_dataset
WHERE order_status = 'delivered'
LIMIT 10;Exercise 3: Filter by Numeric Value
Find order items with a price greater than $200:
-- Find expensive items (price > 200)
SELECT order_id, product_id, price
FROM olist_order_items_dataset
WHERE price > 200
LIMIT 10;Summary
You now understand what data filtering is and why it's essential in SQL!
Key Takeaways
✅ Data filtering narrows results to show only relevant rows
✅ The WHERE clause is SQL's primary filtering mechanism
✅ Filtering improves performance, relevance, and security
✅ SQL provides many filtering operators:
- Comparison:
=,!=,>,<,>=,<= - Logical:
AND,OR,NOT - Range:
BETWEEN - List:
IN,NOT IN - Pattern:
LIKE - NULL:
IS NULL,IS NOT NULL
What's Next?
In the upcoming lessons, we'll dive deep into each filtering operator:
- Comparison Operators - Equal, not equal, greater/less than
- AND Operator - Combining conditions that must all be true
- OR Operator - Combining conditions where any can be true
- NOT Operator - Reversing conditions
- BETWEEN Operator - Filtering value ranges
- IN Operator - Matching values from a list
- LIKE Operator - Pattern matching in text