Skip to main content

Introduction to NULLs

Introduction

NULL is a special marker in SQL that represents a missing, unknown, or inapplicable value. It is not the same as zero, an empty string, or any other default value. Understanding NULL is fundamental to writing correct SQL queries.

In this tutorial, you will learn:

  • What NULL means in SQL
  • How NULL differs from other values
  • Why NULL is important in databases
  • How SQL handles NULL in operations

What is NULL?

NULL represents the absence of a value. When a field in a database has no value, SQL stores NULL in that field.

Key Characteristics of NULL

CharacteristicDescription
Not a valueNULL is not equal to 0, empty string, or false
UnknownNULL means "we don't know"
PropagatesOperations with NULL usually return NULL
Special comparisonCannot use = to compare with NULL

Common Scenarios Where NULL Appears

  1. Optional data - Customer middle name that wasn't provided
  2. Unknown information - A delivery date for an order not yet shipped
  3. Not applicable - Commission rate for non-sales employees
  4. Data import issues - Missing values from source systems
Diagram showing NULL as the absence of a value, distinct from zero, empty string, and false, with database cell examples
Diagram showing NULL as the absence of a value, distinct from zero, empty string, and false, with database cell examples

Seeing NULL in Real Data

Let's look at real data that contains NULL values. Orders that haven't been delivered will have NULL in their delivery date columns:

-- View orders with potential NULL values
SELECT 
    order_id,
    order_status,
    order_purchase_timestamp,
    order_approved_at,
    order_delivered_carrier_date,
    order_delivered_customer_date
FROM olist_orders_dataset
WHERE order_status != 'delivered'
LIMIT 15;

Notice how some columns display NULL or nothing for undelivered orders. These are true NULL values, not empty strings.

NULL is Not Zero or Empty String

A common misconception is that NULL equals 0 or an empty string. Let's demonstrate the difference:

-- Demonstrating NULL vs Zero vs Empty String
SELECT 
    NULL AS null_value,
    0 AS zero_value,
    '' AS empty_string,
    -- Comparison results
    CASE WHEN NULL = 0 THEN 'Equal' ELSE 'Not Equal' END AS null_vs_zero,
    CASE WHEN NULL = '' THEN 'Equal' ELSE 'Not Equal' END AS null_vs_empty,
    CASE WHEN 0 = '' THEN 'Equal' ELSE 'Not Equal' END AS zero_vs_empty;

Important: Comparisons with NULL always result in UNKNOWN (which is treated as false in WHERE clauses). This is why NULL = 0 and NULL = '' both return 'Not Equal'.

Three-Valued Logic

SQL uses three-valued logic: TRUE, FALSE, and UNKNOWN. When NULL is involved in a comparison, the result is UNKNOWN.

Truth Table with NULL

ExpressionResult
TRUE AND NULLUNKNOWN
FALSE AND NULLFALSE
TRUE OR NULLTRUE
FALSE OR NULLUNKNOWN
NOT NULLUNKNOWN
NULL = NULLUNKNOWN
NULL <> NULLUNKNOWN
-- Three-valued logic demonstration
SELECT 
    -- NULL comparisons
    CASE WHEN NULL = NULL THEN 'TRUE' WHEN NOT (NULL = NULL) THEN 'FALSE' ELSE 'UNKNOWN' END AS null_equals_null,
    CASE WHEN NULL <> NULL THEN 'TRUE' WHEN NOT (NULL <> NULL) THEN 'FALSE' ELSE 'UNKNOWN' END AS null_not_equals_null,
    -- Boolean logic with NULL
    CASE WHEN (1=1) AND NULL THEN 'TRUE' WHEN NOT ((1=1) AND NULL) THEN 'FALSE' ELSE 'UNKNOWN' END AS true_and_null,
    CASE WHEN (1=0) AND NULL THEN 'TRUE' WHEN NOT ((1=0) AND NULL) THEN 'FALSE' ELSE 'UNKNOWN' END AS false_and_null,
    CASE WHEN (1=1) OR NULL THEN 'TRUE' WHEN NOT ((1=1) OR NULL) THEN 'FALSE' ELSE 'UNKNOWN' END AS true_or_null;

NULL in WHERE Clause

Because comparisons with NULL return UNKNOWN, regular comparison operators don't work as expected. The WHERE clause only returns rows where the condition is TRUE, not UNKNOWN.

-- This will NOT find NULL values!
SELECT 
    order_id,
    order_status,
    order_delivered_customer_date
FROM olist_orders_dataset
WHERE order_delivered_customer_date = NULL  -- Wrong way!
LIMIT 10;

The query above returns no rows! This is because column = NULL evaluates to UNKNOWN, not TRUE.

-- Correct way to find NULL values
SELECT 
    order_id,
    order_status,
    order_delivered_customer_date
FROM olist_orders_dataset
WHERE order_delivered_customer_date IS NULL  -- Correct way!
LIMIT 10;

NULL Propagation in Expressions

When NULL is used in expressions, it typically propagates (the result becomes NULL). This behavior is sometimes called "NULL contagion."

-- NULL propagation in calculations
SELECT 
    -- Arithmetic
    5 + NULL AS addition,
    10 * NULL AS multiplication,
    100 / NULL AS division,
    -- String operations
    'Hello' || NULL AS concatenation,
    -- Comparisons
    CASE WHEN 5 > NULL THEN 'Yes' ELSE 'No or Unknown' END AS greater_than_null;

Exception: Some aggregate functions like COUNT, SUM, and AVG handle NULL specially (they typically ignore NULL values).

Counting NULLs in Data

To understand your data quality, it's useful to count how many NULL values exist in a column:

-- Count NULL values in order date columns
SELECT 
    COUNT(*) AS total_orders,
    COUNT(order_purchase_timestamp) AS has_purchase_date,
    COUNT(*) - COUNT(order_purchase_timestamp) AS missing_purchase_date,
    COUNT(order_approved_at) AS has_approval_date,
    COUNT(*) - COUNT(order_approved_at) AS missing_approval_date,
    COUNT(order_delivered_customer_date) AS has_delivery_date,
    COUNT(*) - COUNT(order_delivered_customer_date) AS missing_delivery_date
FROM olist_orders_dataset;

Note: COUNT(*) counts all rows, while COUNT(column) counts only non-NULL values. The difference gives you the NULL count.

Why NULL Matters

Understanding NULL is crucial because:

  1. Query correctness - Using = NULL instead of IS NULL returns wrong results
  2. Calculation accuracy - NULL in math operations propagates and can cause unexpected results
  3. Join behavior - NULL values don't match in JOINs
  4. Aggregation - Some aggregates ignore NULLs, which may or may not be desired
  5. Data quality - NULL counts help identify data issues
-- See NULL distribution by order status
SELECT 
    order_status,
    COUNT(*) AS total_orders,
    SUM(CASE WHEN order_delivered_customer_date IS NULL THEN 1 ELSE 0 END) AS null_delivery_dates,
    ROUND(
        SUM(CASE WHEN order_delivered_customer_date IS NULL THEN 1.0 ELSE 0 END) / COUNT(*) * 100,
        1
    ) AS null_percentage
FROM olist_orders_dataset
GROUP BY order_status
ORDER BY total_orders DESC;

Summary

Key Takeaways

  1. NULL represents missing or unknown values

    • Not the same as zero or empty string
    • Represents "we don't know"
  2. Three-valued logic

    • SQL uses TRUE, FALSE, and UNKNOWN
    • Comparisons with NULL return UNKNOWN
  3. Checking for NULL

    WHERE column IS NULL     -- Find NULLs
    WHERE column IS NOT NULL -- Exclude NULLs
    -- NOT: WHERE column = NULL (won't work!)
    
  4. NULL propagation

    • Most operations with NULL return NULL
    • 5 + NULL = NULL
    • 'text' || NULL = NULL
  5. Counting NULLs

    COUNT(*) - COUNT(column) AS null_count
    

What's Next?

In the next lesson, we'll learn about COALESCE vs ISNULL - functions that help us handle NULL values by providing default values.