Project Resources
Overview
Before diving into the Exploratory Data Analysis (EDA) project, you need to make sure all the tools and resources are ready. Everything required for this project is free, so there are no cost barriers.
This lesson walks through each resource, explains what it does, and why it matters for the project.
SQL Server Express
SQL Server Express is a free edition of Microsoft SQL Server. It is the database engine that stores and manages all of our data. Every table, every query, and every analytical result lives inside SQL Server.
SQL Server Express supports all the core features we need for EDA:
- Creating databases, schemas, and tables
- Running complex SELECT queries with JOINs, aggregations, and window functions
- Views and stored procedures
- Importing data from CSV files
The Express edition is limited to 10 GB per database, but that is more than sufficient for our dataset and analysis.
Installation Steps
- Go to the official Microsoft SQL Server downloads page.
- Download SQL Server 2022 Express (or the latest version available).
- Run the installer and choose the Basic installation type.
- Follow the on-screen prompts — the default settings work well.
- Note the server name after installation (typically
localhost\SQLEXPRESS).
Once installed, SQL Server runs as a background service. You interact with it through SQL Server Management Studio (SSMS).
SQL Server Management Studio (SSMS)
SQL Server Management Studio (SSMS) is a free graphical tool from Microsoft that connects to SQL Server and lets you interact with databases visually. This is where you will write and execute all your EDA queries.
Think of it this way: SQL Server is the engine, and SSMS is the control panel.
Key Features
- Query Editor: Write and run SQL statements with syntax highlighting and auto-complete.
- Object Explorer: Browse databases, tables, views, and schemas in a tree structure.
- Results Grid: View query output in a clean table format.
- Execution Plans: Visualize how SQL Server processes your queries — useful for optimization.
- Import/Export Wizard: Load data from CSV or other external sources.
Connecting to SQL Server
When you open SSMS, it asks you to connect to a server:
Server type: Database Engine
Server name: localhost\SQLEXPRESS
Authentication: Windows AuthenticationAfter connecting, expand Databases in Object Explorer to see all databases on your server. Our DataWarehouseAnalytics database will appear here once we set it up.
Project Repository on GitHub
The project files — including SQL scripts, CSV datasets, and documentation — are hosted on GitHub. You will download these to set up the database and follow along with each lesson.
The repository contains:
- SQL Scripts: All exploration and analysis scripts organized by topic (database exploration, dimensions, dates, measures, magnitude, ranking).
- Dataset Files (CSV): Pre-built Gold layer tables (
dim_customers,dim_products,fact_sales) ready for import. - Initialization Script: A setup script (
00_init_database.sql) that creates the database, schema, tables, and loads the data.
Downloading the Files
- Navigate to the project repository on GitHub.
- Click the green Code button.
- Select Download ZIP.
- Extract the ZIP to a folder on your computer.
Alternatively, if you are familiar with Git, you can clone the repository using:
git clone <repository-url>
Inside the extracted folder, you will find a scripts/ directory with all the SQL files and a datasets/ directory with the CSV files.
Setting Up the Database
The initialization script (00_init_database.sql) does everything you need to get started:
- Creates the database called
DataWarehouseAnalytics - Creates the
goldschema inside the database - Creates three tables:
gold.dim_customers,gold.dim_products,gold.fact_sales - Loads CSV data into each table using
BULK INSERT
Before running the script, update the file paths in the BULK INSERT statements to point to where you extracted the CSV files on your machine.
-- Example: Update the file path to match your local folder
BULK INSERT gold.dim_customers
FROM 'C:\your-path\datasets\csv-files\gold.dim_customers.csv'
WITH (
FIRSTROW = 2,
FIELDTERMINATOR = ',',
TABLOCK
);After running the initialization script, verify the setup by running a quick count on each table:
SELECT 'dim_customers' AS table_name, COUNT(*) AS row_count FROM gold.dim_customers
UNION ALL
SELECT 'dim_products', COUNT(*) FROM gold.dim_products
UNION ALL
SELECT 'fact_sales', COUNT(*) FROM gold.fact_sales;If all three tables return row counts greater than zero, your database is ready for exploration.
Summary
Here is a quick checklist of everything you need before starting the EDA project:
| Resource | Purpose |
|---|---|
| SQL Server Express | Database engine to store and query data |
| SSMS | Graphical interface for writing and running SQL |
| Project Repository | SQL scripts and CSV dataset files |
| Initialization Script | Creates the database, tables, and loads data |
Once all resources are in place, you are ready to start exploring the data.