Intro - What is Data Transformation
Introduction
Data rarely comes in the exact format you need. Names might be in lowercase when you need uppercase. Dates might be stored as text. Numbers might need rounding. This is where data transformation comes in.
In this tutorial, you will learn:
- What data transformation means in SQL
- Why we need to transform data
- Categories of SQL transformation functions
- Real-world examples of data transformation
What is Data Transformation?
Data transformation is the process of converting data from one format or structure to another. In SQL, this means using built-in functions to modify, clean, or reformat data as part of your queries.
Common Transformation Goals
| Goal | Example |
|---|---|
| Standardize format | Convert 'john' to 'JOHN' |
| Clean data | Remove extra spaces from ' hello ' |
| Combine data | Join 'John' + 'Doe' into 'John Doe' |
| Extract parts | Get 'John' from 'John Doe' |
| Convert types | Change '2024-01-15' text to a date |
| Calculate values | Round 3.14159 to 3.14 |

Why Transform Data?
Real-world data is messy. Here's why transformation is essential:
1. Data Consistency
User input varies wildly. One customer enters 'São Paulo', another enters 'SAO PAULO', and a third types 'sao paulo'. Transformation helps standardize these variations.
2. Reporting Requirements
Reports often need specific formats:
- Dates as 'January 15, 2024' instead of '2024-01-15'
- Names in 'LASTNAME, Firstname' format
- Prices with exactly 2 decimal places
3. Data Integration
When combining data from different sources, you often need to transform values to match:
- Different date formats
- Different naming conventions
- Different units of measurement
4. Application Logic
Sometimes business rules require transformed data:
- Email addresses in lowercase for matching
- Phone numbers without special characters
- Product codes in specific formats
Categories of SQL Functions
SQL provides several categories of built-in functions for data transformation:
1. String Functions
Manipulate text data:
| Function | Purpose |
|---|---|
UPPER(), LOWER() | Change case |
TRIM() | Remove whitespace |
CONCAT() or || | Combine strings |
SUBSTRING() | Extract part of a string |
REPLACE() | Replace text |
LENGTH() | Count characters |
LEFT(), RIGHT() | Extract from start/end |
2. Number Functions
Work with numeric data:
| Function | Purpose |
|---|---|
ROUND() | Round to decimal places |
ABS() | Get absolute value |
CEILING() | Round up |
FLOOR() | Round down |
MOD() | Get remainder |
3. Date & Time Functions
Handle temporal data:
| Function | Purpose |
|---|---|
DATE() | Extract date |
strftime() | Format dates |
julianday() | Calculate date differences |
4. NULL Handling Functions
Deal with missing values:
| Function | Purpose |
|---|---|
COALESCE() | Return first non-NULL value |
IFNULL() | Replace NULL with default |
NULLIF() | Return NULL if values match |
Transformation in Action
Let's see a real example of data transformation using our e-commerce database.
The Problem
Customer cities are stored in lowercase, but for a report we need:
- City names in uppercase
- Combined with state code
- Clean format: 'SAO PAULO (SP)'
-- Transform customer location data for reporting
SELECT
customer_id,
customer_city,
customer_state,
UPPER(customer_city) || ' (' || customer_state || ')' AS formatted_location
FROM olist_customers_dataset
LIMIT 10;This query demonstrates:
- UPPER() - Converts city name to uppercase
- String concatenation (||) - Combines multiple values
- Literal strings - Adds parentheses around state code
The result transforms 'sao paulo' + 'SP' into 'SAO PAULO (SP)'.
Where Transformation Happens
You can apply transformation functions in different parts of a SQL query:
In SELECT (Output Formatting)
SELECT UPPER(customer_city) AS city
FROM customers;
Transforms the displayed output.
In WHERE (Filtering)
SELECT *
FROM customers
WHERE UPPER(customer_city) = 'SAO PAULO';
Standardizes values for comparison.
In ORDER BY (Sorting)
SELECT *
FROM customers
ORDER BY LENGTH(customer_city);
Sorts by transformed values.
In GROUP BY (Aggregation)
SELECT UPPER(customer_city), COUNT(*)
FROM customers
GROUP BY UPPER(customer_city);
Groups by transformed values to consolidate variations.
Practice: Basic Transformation
Let's practice with a simple transformation. We'll create a formatted customer location string.
-- Create a formatted location for each customer
-- Format: "City, STATE" (e.g., "Sao Paulo, SP")
SELECT
customer_id,
customer_city || ', ' || customer_state AS location
FROM olist_customers_dataset
LIMIT 10;What's Next
Now that you understand what data transformation is and why it matters, we'll dive deep into each category of functions:
- SQL Functions Overview - Understanding how functions work
- String Functions - CONCAT, UPPER, LOWER, TRIM, REPLACE, and more
- Number Functions - ROUND, ABS, and mathematical operations
- Date Functions - Working with dates and times
Each function will be covered with practical examples using our e-commerce database.
Key Takeaways
- Data transformation converts data from one format to another
- SQL provides built-in functions for transformation
- Functions can be used in SELECT, WHERE, ORDER BY, and GROUP BY
- Main categories: String, Number, Date/Time, NULL handling
- Transformation is essential for data consistency and reporting
- In SQLite, use
||for string concatenation instead of CONCAT()