Introduction to Hashing

Description

Hashing is a technique that maps data of arbitrary size to fixed-size values using a hash function. The resulting value (called a hash code or hash value) is used as an index into a hash table — a data structure that enables near-instant lookups, insertions, and deletions.

The three core components of hashing are:

Key — The input data you want to store or look up (e.g., a string, number, or object)
Hash Function — A function that converts the key into an index within a fixed range
Hash Table — An array-like structure where data is stored at positions determined by the hash function

Hashing solves a fundamental problem: how can we check if an element exists in a collection in O(1) time, rather than O(n) for linear search or O(log n) for binary search?

In this lesson, we will understand how hashing works, explore what makes a good hash function, learn how to handle collisions (when two different keys produce the same hash), and see the distinction between a Hash Set (stores only keys) and a Hash Map (stores key-value pairs).

Given an array of integers and a series of queries, implement a hash-based structure to answer "does element X exist in the array?" in O(1) average time.

Examples

Example 1

Input: keys = ["apple", "banana", "cherry"], table_size = 7

Hash Computation:

hash("apple") = (97+112+112+108+101) % 7 = 530 % 7 = 5
hash("banana") = (98+97+110+97+110+97) % 7 = 609 % 7 = lemen
hash("cherry") = (99+104+101+114+114+121) % 7 = 653 % 7 = 2

Hash Table:

Index	0	1	2	3	4	5	6
Value	—	—	cherry	banana	—	apple	—

Explanation: Each key is transformed by the hash function into an index. The key is stored at that index in the table. To look up "apple", we compute hash("apple")=5 and directly access index 5 — no searching needed.

Example 2

Input: arr = [4, 7, 1, 9, 3], query = 7

Without Hashing (Linear Search):
Compare 7 with arr[0]=4 (no), arr[1]=7 (YES!) → Found at index 1. Took 2 comparisons.
Worst case: O(n) comparisons.

With Hashing (Hash Set):

Build hash set by inserting all elements: {4, 7, 1, 9, 3}
Query: Is 7 in the set? → YES (single O(1) lookup)

Explanation: With a hash set, every lookup is O(1) regardless of the collection size. For repeated queries, this is dramatically faster than scanning the array each time.

Example 3 — Collision

Input: keys = [15, 25, 35], table_size = 10

Hash Function: hash(key) = key % 10

hash(15) = 15 % 10 = 5
hash(25) = 25 % 10 = 5 ← Same index as 15! COLLISION
hash(35) = 35 % 10 = 5 ← Another collision!

Chaining Resolution: At index 5, store a linked list: 15 → 25 → 35

Explanation: When multiple keys hash to the same index, this is called a collision. One common resolution is chaining — each slot holds a list of all elements that map to it. Lookups check the list at the computed index.

Constraints

Hash table size is typically a prime number for better distribution
A good hash function should distribute keys uniformly across the table
Average time complexity: O(1) for insert, delete, and search
Worst case (all keys collide): O(n) — but extremely rare with a good hash function
Space complexity: O(n) for storing n elements

Editorial

Brute Force

Intuition

Before understanding why hashing is powerful, let's first see the problem it solves. Suppose you have a collection of n elements and you repeatedly need to check whether a given element exists in it.

Without hashing, the simplest approach is linear search: scan through every element until you find a match or reach the end.

Imagine you have an unsorted bookshelf with 1000 books. Someone asks, "Do you have War and Peace?" You'd have to check each book one by one from left to right. If the book is the very last one — or doesn't exist at all — you'd check all 1000 books.

For a single query, linear search is acceptable. But what if you receive 1000 queries? That's 1000 × 1000 = 1,000,000 comparisons in the worst case.

Step-by-Step Explanation

Let's trace a simple existence check with arr = [4, 7, 1, 9, 3] and queries = [7, 5]:

Query 1: Does 7 exist?

Compare 7 with arr[0]=4 → not equal
Compare 7 with arr[1]=7 → MATCH! Return true.
Took 2 comparisons.

Query 2: Does 5 exist?

Compare 5 with arr[0]=4 → not equal
Compare 5 with arr[1]=7 → not equal
Compare 5 with arr[2]=1 → not equal
Compare 5 with arr[3]=9 → not equal
Compare 5 with arr[4]=3 → not equal
Reached end. Return false.
Took 5 comparisons (worst case).

Total for 2 queries: 7 comparisons. For q queries on n elements: O(q × n) total.

Linear Search — Scanning for Element Existence — Watch the pointer scan element by element to find whether the query exists. Every element must potentially be checked.

Algorithm

Given array arr of n elements and query element x
For each index i from 0 to n-1:
- If arr[i] == x, return true (element found)
If loop finishes without finding x, return false
For q queries, repeat steps 2-3 for each query

Code

#include <vector>
using namespace std;

// Linear search for each query — O(q * n)
bool linearSearch(vector<int>& arr, int x) {
    for (int i = 0; i < arr.size(); i++) {
        if (arr[i] == x) return true;
    }
    return false;
}

vector<bool> answerQueries(vector<int>& arr, vector<int>& queries) {
    vector<bool> results;
    for (int q : queries) {
        results.push_back(linearSearch(arr, q));
    }
    return results;
}

# Linear search for each query — O(q * n)
def linear_search(arr, x):
    for i in range(len(arr)):
        if arr[i] == x:
            return True
    return False

def answer_queries(arr, queries):
    results = []
    for q in queries:
        results.append(linear_search(arr, q))
    return results

import java.util.*;

public class Solution {
    // Linear search for each query — O(q * n)
    static boolean linearSearch(int[] arr, int x) {
        for (int i = 0; i < arr.length; i++) {
            if (arr[i] == x) return true;
        }
        return false;
    }
    
    static List<Boolean> answerQueries(int[] arr, int[] queries) {
        List<Boolean> results = new ArrayList<>();
        for (int q : queries) {
            results.add(linearSearch(arr, q));
        }
        return results;
    }
}

Complexity Analysis

Time Complexity: O(q × n)

For each of the q queries, we scan through the entire array of n elements in the worst case. This gives O(q × n) total time.

Space Complexity: O(1)

Linear search uses no extra data structures — just a loop variable. Space is O(1) excluding the output.

Why This Approach Is Not Efficient

Linear search takes O(n) per query. If we have q queries, the total time is O(q × n). With n = 10^5 and q = 10^5, that's 10^10 operations — impossibly slow.

The root problem: we're not remembering anything between queries. Each query starts from scratch, scanning the array from the beginning. There's no structure that lets us jump directly to where an element should be.

Hashing solves this by building a precomputed lookup structure in O(n) time, after which each query is answered in O(1) average time. Total: O(n + q) instead of O(n × q).

The key insight: a hash function converts the element directly into an address (index). Instead of searching for an element, we compute where it should be. This transforms search from a comparison-based operation into an arithmetic operation.

Optimal Approach - Hash Table

Intuition

A hash table is like a well-organized library with a special catalogue system. Instead of walking through every shelf to find a book, you consult the catalogue: it tells you exactly which shelf and position your book is on. The catalogue is the hash function, and the shelves form the hash table.

How it works:

Hash Function: Takes a key and produces an index. For integers, a simple hash function is h(key) = key % table_size. For strings, we might sum the ASCII values and take modulo.
Insertion: To insert element x, compute index = h(x), then store x at table[index].
Lookup: To check if x exists, compute index = h(x), then check if x is at table[index]. This is O(1)!
Collision Handling: Sometimes two different keys produce the same index. For example, with table_size=10, both 15 and 25 hash to index 5. This is a collision. Two common solutions:
- Chaining: Each slot holds a linked list of all elements that hash to that index
- Open Addressing: If a slot is occupied, probe the next slot (linear probing, quadratic probing, or double hashing)

Hash Set vs Hash Map:

Hash Set: Stores only keys. Used to check existence: "Is X in the set?" (like Python's set, C++ unordered_set, Java HashSet)
Hash Map: Stores key-value pairs. Used to associate values with keys: "What is the value for key X?" (like Python's dict, C++ unordered_map, Java HashMap)

Step-by-Step Explanation

Let's build a hash table of size 7 and insert elements [4, 7, 1, 9, 3], then query for 7 and 5.

Hash Function: h(key) = key % 7

Step 1 — Insert 4: h(4) = 4 % 7 = 4. Store 4 at index 4.
Table: [_, _, _, _, 4, _, _]

Step 2 — Insert 7: h(7) = 7 % 7 = 0. Store 7 at index 0.
Table: [7, _, _, _, 4, _, _]

Step 3 — Insert 1: h(1) = 1 % 7 = 1. Store 1 at index 1.
Table: [7, 1, _, _, 4, _, _]

Step 4 — Insert 9: h(9) = 9 % 7 = 2. Store 9 at index 2.
Table: [7, 1, 9, _, 4, _, _]

Step 5 — Insert 3: h(3) = 3 % 7 = 3. Store 3 at index 3.
Table: [7, 1, 9, 3, 4, _, _]

Step 6 — Query: Does 7 exist?
h(7) = 0. Check table[0] = 7. MATCH! Return true. ✓ (1 operation)

Step 7 — Query: Does 5 exist?
h(5) = 5. Check table[5] = empty. Return false. ✗ (1 operation)

Both queries took O(1) time — no scanning needed!

Hash Table Construction and Lookup — Watch how elements are hashed to compute their index and inserted into the table. Then see how queries are answered in O(1) by computing the hash and checking the slot directly.

Algorithm

Building the Hash Table:

Choose a table size m (preferably prime for better distribution)
Define hash function: h(key) = key % m
For each element in the input array:
- Compute index = h(element)
- Insert element at table[index]
- If a collision occurs, use chaining (linked list at each slot)

Answering Queries:

For query element x:
- Compute index = h(x)
- Check if x exists at table[index] (or in the chain at that index)
- Return true/false

Collision Handling — Chaining:

Each slot in the table holds a linked list (or dynamic array)
On collision, append the new element to the list at that slot
On lookup, traverse the list at the computed slot
Average chain length = n/m (called the load factor)

Collision Handling — Open Addressing (Linear Probing):

If slot h(key) is occupied, try slot h(key)+1, then h(key)+2, etc.
Wrap around to the beginning if needed
On lookup, follow the same probing sequence until finding the key or an empty slot

Code

#include <vector>
#include <unordered_set>
using namespace std;

// Using built-in hash set for O(1) lookups
vector<bool> answerQueries(vector<int>& arr, vector<int>& queries) {
    // Build hash set — O(n)
    unordered_set<int> hashSet(arr.begin(), arr.end());
    
    // Answer each query in O(1)
    vector<bool> results;
    for (int q : queries) {
        results.push_back(hashSet.count(q) > 0);
    }
    return results;
}

// --- Manual hash table with chaining ---
class HashTable {
    int size;
    vector<vector<int>> table; // Each slot is a chain (list)
    
    int hashFunction(int key) {
        return ((key % size) + size) % size; // Handle negatives
    }
    
public:
    HashTable(int sz) : size(sz), table(sz) {}
    
    void insert(int key) {
        int index = hashFunction(key);
        table[index].push_back(key);
    }
    
    bool search(int key) {
        int index = hashFunction(key);
        for (int val : table[index]) {
            if (val == key) return true;
        }
        return false;
    }
};

# Using built-in hash set for O(1) lookups
def answer_queries(arr, queries):
    # Build hash set — O(n)
    hash_set = set(arr)
    
    # Answer each query in O(1)
    return [q in hash_set for q in queries]


# --- Manual hash table with chaining ---
class HashTable:
    def __init__(self, size):
        self.size = size
        self.table = [[] for _ in range(size)]  # Each slot is a chain
    
    def _hash(self, key):
        return key % self.size
    
    def insert(self, key):
        index = self._hash(key)
        self.table[index].append(key)
    
    def search(self, key):
        index = self._hash(key)
        return key in self.table[index]


# Example usage:
ht = HashTable(7)
for num in [4, 7, 1, 9, 3]:
    ht.insert(num)

print(ht.search(7))   # True  — O(1)
print(ht.search(5))   # False — O(1)

import java.util.*;

public class Solution {
    // Using built-in HashSet for O(1) lookups
    static List<Boolean> answerQueries(int[] arr, int[] queries) {
        // Build hash set — O(n)
        Set<Integer> hashSet = new HashSet<>();
        for (int num : arr) hashSet.add(num);
        
        // Answer each query in O(1)
        List<Boolean> results = new ArrayList<>();
        for (int q : queries) {
            results.add(hashSet.contains(q));
        }
        return results;
    }
    
    // --- Manual hash table with chaining ---
    static class HashTable {
        int size;
        List<List<Integer>> table;
        
        HashTable(int sz) {
            size = sz;
            table = new ArrayList<>();
            for (int i = 0; i < sz; i++) {
                table.add(new ArrayList<>());
            }
        }
        
        int hashFunction(int key) {
            return ((key % size) + size) % size;
        }
        
        void insert(int key) {
            int index = hashFunction(key);
            table.get(index).add(key);
        }
        
        boolean search(int key) {
            int index = hashFunction(key);
            return table.get(index).contains(key);
        }
    }
}

Complexity Analysis

Time Complexity:

Build: O(n) — Insert each of n elements in O(1) amortized time
Query: O(1) average per query, O(q) for q queries
Total: O(n + q)

Worst case (all elements hash to the same slot): O(n) per query due to long chain. This is extremely rare with a good hash function and appropriately sized table.

Space Complexity: O(n)

The hash table stores all n elements. With chaining, each element occupies one node in a chain. The table itself has m slots (where m is the table size, usually proportional to n).

Load Factor: α = n/m. When α approaches 1, collisions increase. Most implementations automatically resize (rehash) when α exceeds a threshold (e.g., 0.75 in Java's HashMap).

Why this is optimal: We must read all n input elements (Ω(n) lower bound). We must answer each of q queries (Ω(q) lower bound). Our O(n + q) solution matches both bounds. No algorithm can do better for this problem.

Key Takeaways

When to Use Hashing

Fast lookups: Check if an element exists in O(1)
Counting frequencies: Count occurrences using a hash map
Finding pairs/groups: Two Sum, grouping anagrams, etc.
Deduplication: Remove duplicates using a hash set
Caching: Memoization with hash maps

When NOT to Use Hashing

Ordered data needed: Hash tables don't maintain insertion or sorted order (use balanced BST or sorted array instead)
Range queries: "Find all elements between 10 and 20" — hash tables can't do this efficiently (use segment trees or sorted structures)
Worst-case guarantees: If O(1) worst-case is required, hashing (which has O(n) worst case) may not be suitable

Hash Set vs Hash Map — Quick Reference

Feature	Hash Set	Hash Map
Stores	Keys only	Key-value pairs
Use case	Existence checking	Associating values
C++	`unordered_set`	`unordered_map`
Python	`set()`	`dict()` / `{}`
Java	`HashSet`	`HashMap`

Properties of a Good Hash Function

Deterministic: Same input always produces the same output
Uniform distribution: Spreads keys evenly across the table
Efficient: Computes quickly (O(1) for fixed-size keys)
Minimizes collisions: Different keys should produce different indices as much as possible

Language Essentials & Syntax

EASY

Pattern-Based Logic Building

EASY

Standard Libraries & Collections

EASY

Essential Mathematics

EASY

MEDIUM

Recursion Fundamentals

EASY

Hashing Essentials

EASY

MEDIUM

Comparison-Based Sorts

EASY

Divide & Conquer Sorts

EASY

MEDIUM

Array Fundamentals

EASY

MEDIUM

Intermediate Array Problems

EASY

MEDIUM

Advanced Array Challenges

EASY

MEDIUM

HARD

Searching in Sorted Arrays

EASY

MEDIUM

Search Space Reduction

EASY

MEDIUM

HARD

Matrix Binary Search

EASY

MEDIUM

HARD

Core String Operations

EASY

MEDIUM

Intermediate String Challenges

EASY

MEDIUM

Singly Linked List Basics

EASY

MEDIUM

Doubly Linked List Basics

EASY

MEDIUM

Linked List Techniques

EASY

MEDIUM

Doubly Linked List Techniques

MEDIUM

HARD

Advanced Linked List Challenges

MEDIUM

HARD

Building Recursive Thinking

EASY

MEDIUM

Generating Subsequences

EASY

MEDIUM

Exhaustive Search & Constraint Solving

MEDIUM

HARD

Bitwise Operations Fundamentals

EASY

MEDIUM

Bit Manipulation Applications

EASY

MEDIUM

Number Theory & Primes

EASY

MEDIUM

HARD