Skip to main content

Intro - What is Data Transformation

Introduction

Data rarely comes in the exact format you need. Names might be in lowercase when you need uppercase. Dates might be stored as text. Numbers might need rounding. This is where data transformation comes in.

In this tutorial, you will learn:

  • What data transformation means in SQL
  • Why we need to transform data
  • Categories of SQL transformation functions
  • Real-world examples of data transformation

What is Data Transformation?

Data transformation is the process of converting data from one format or structure to another. In SQL, this means using built-in functions to modify, clean, or reformat data as part of your queries.

Common Transformation Goals

GoalExample
Standardize formatConvert 'john' to 'JOHN'
Clean dataRemove extra spaces from ' hello '
Combine dataJoin 'John' + 'Doe' into 'John Doe'
Extract partsGet 'John' from 'John Doe'
Convert typesChange '2024-01-15' text to a date
Calculate valuesRound 3.14159 to 3.14
Overview diagram showing SQL data transformation categories: string functions, date functions, numeric functions, and type conversion
Overview diagram showing SQL data transformation categories: string functions, date functions, numeric functions, and type conversion

Why Transform Data?

Real-world data is messy. Here's why transformation is essential:

1. Data Consistency

User input varies wildly. One customer enters 'São Paulo', another enters 'SAO PAULO', and a third types 'sao paulo'. Transformation helps standardize these variations.

2. Reporting Requirements

Reports often need specific formats:

  • Dates as 'January 15, 2024' instead of '2024-01-15'
  • Names in 'LASTNAME, Firstname' format
  • Prices with exactly 2 decimal places

3. Data Integration

When combining data from different sources, you often need to transform values to match:

  • Different date formats
  • Different naming conventions
  • Different units of measurement

4. Application Logic

Sometimes business rules require transformed data:

  • Email addresses in lowercase for matching
  • Phone numbers without special characters
  • Product codes in specific formats

Categories of SQL Functions

SQL provides several categories of built-in functions for data transformation:

1. String Functions

Manipulate text data:

FunctionPurpose
UPPER(), LOWER()Change case
TRIM()Remove whitespace
CONCAT() or ||Combine strings
SUBSTRING()Extract part of a string
REPLACE()Replace text
LENGTH()Count characters
LEFT(), RIGHT()Extract from start/end

2. Number Functions

Work with numeric data:

FunctionPurpose
ROUND()Round to decimal places
ABS()Get absolute value
CEILING()Round up
FLOOR()Round down
MOD()Get remainder

3. Date & Time Functions

Handle temporal data:

FunctionPurpose
DATE()Extract date
strftime()Format dates
julianday()Calculate date differences

4. NULL Handling Functions

Deal with missing values:

FunctionPurpose
COALESCE()Return first non-NULL value
IFNULL()Replace NULL with default
NULLIF()Return NULL if values match

Transformation in Action

Let's see a real example of data transformation using our e-commerce database.

The Problem

Customer cities are stored in lowercase, but for a report we need:

  • City names in uppercase
  • Combined with state code
  • Clean format: 'SAO PAULO (SP)'
-- Transform customer location data for reporting
SELECT 
    customer_id,
    customer_city,
    customer_state,
    UPPER(customer_city) || ' (' || customer_state || ')' AS formatted_location
FROM olist_customers_dataset
LIMIT 10;

This query demonstrates:

  1. UPPER() - Converts city name to uppercase
  2. String concatenation (||) - Combines multiple values
  3. Literal strings - Adds parentheses around state code

The result transforms 'sao paulo' + 'SP' into 'SAO PAULO (SP)'.

Where Transformation Happens

You can apply transformation functions in different parts of a SQL query:

In SELECT (Output Formatting)

SELECT UPPER(customer_city) AS city
FROM customers;

Transforms the displayed output.

In WHERE (Filtering)

SELECT *
FROM customers
WHERE UPPER(customer_city) = 'SAO PAULO';

Standardizes values for comparison.

In ORDER BY (Sorting)

SELECT *
FROM customers
ORDER BY LENGTH(customer_city);

Sorts by transformed values.

In GROUP BY (Aggregation)

SELECT UPPER(customer_city), COUNT(*)
FROM customers
GROUP BY UPPER(customer_city);

Groups by transformed values to consolidate variations.

Practice: Basic Transformation

Let's practice with a simple transformation. We'll create a formatted customer location string.

-- Create a formatted location for each customer
-- Format: "City, STATE" (e.g., "Sao Paulo, SP")

SELECT 
    customer_id,
    customer_city || ', ' || customer_state AS location
FROM olist_customers_dataset
LIMIT 10;

What's Next

Now that you understand what data transformation is and why it matters, we'll dive deep into each category of functions:

  1. SQL Functions Overview - Understanding how functions work
  2. String Functions - CONCAT, UPPER, LOWER, TRIM, REPLACE, and more
  3. Number Functions - ROUND, ABS, and mathematical operations
  4. Date Functions - Working with dates and times

Each function will be covered with practical examples using our e-commerce database.

Key Takeaways

  • Data transformation converts data from one format to another
  • SQL provides built-in functions for transformation
  • Functions can be used in SELECT, WHERE, ORDER BY, and GROUP BY
  • Main categories: String, Number, Date/Time, NULL handling
  • Transformation is essential for data consistency and reporting
  • In SQLite, use || for string concatenation instead of CONCAT()