Getting Ready: Concurrent HashMap

Why This Problem Matters

Hash maps are the most frequently used data structure in software engineering — caches, routing tables, session stores, connection pools, configuration registries, and symbol tables all rely on O(1) key-value lookups. But the moment two threads access a shared hash map simultaneously, everything breaks.

Consider a web server that caches user sessions in a HashMap. Thread A inserts a new session while Thread B iterates over sessions to expire stale ones. Without synchronization, Thread B might see a partially constructed entry — one where the key is written but the value pointer still points to garbage memory. Worse, if the map triggers a resize during Thread A's insert, the internal bucket array is being rebuilt while Thread B reads from it, leading to infinite loops, lost entries, or segmentation faults. This isn't hypothetical: Java's HashMap documentation explicitly warns that concurrent structural modification produces undefined behavior.

The naive fix — wrapping every operation in a single global lock (synchronized(map) in Java or with lock: in Python) — works for correctness but kills performance. Every thread contends for the same lock, serializing all operations. On a 16-core server handling 50,000 requests per second, a globally locked map becomes a bottleneck where only one core does useful work while 15 cores spin waiting.

This is exactly why Java introduced ConcurrentHashMap in JDK 1.5 (2004), redesigned it completely in JDK 8 (2014), and why languages like Go provide sync.Map, C++ added concurrent_unordered_map in TBB, and Python's concurrent.futures module builds on thread-safe collections. The fundamental challenge — high-throughput concurrent access to a shared hash table — appears in every backend system at scale.

In this design problem, we'll build a Concurrent HashMap from scratch using two different strategies:

Segment Locking — divide the hash table into independently lockable segments (the approach Java used in JDK 5-7)
CAS Operations — use lock-free compare-and-swap to achieve concurrency without any locks (the approach Java uses in JDK 8+)

This problem is a favorite in concurrency interviews at companies like Google, Amazon, Oracle, and any organization building high-performance server infrastructure. It tests your understanding of fine-grained locking, lock-free data structures, memory visibility, and the fundamental trade-offs between correctness, performance, and complexity.

What You'll Learn

In this design problem, you will:

Understand why standard hash maps are thread-unsafe — identify the specific race conditions in put, get, remove, and resize operations
Design a segment-locked hash map where the table is divided into independently lockable stripes, allowing true parallel access to different segments
Design a CAS-based hash map using atomic compare-and-swap operations to achieve lock-free reads and fine-grained writes
Implement both approaches in Python and Java — seeing how threading.Lock and ReentrantLock map to segment locking, and how AtomicReference and ctypes enable CAS
Handle concurrent resizing (rehashing) — the hardest part of any concurrent hash map, where the internal array must grow while other threads continue reading and writing
Benchmark both strategies against a globally locked baseline and a lock-free baseline to understand real-world performance characteristics
Create a UML class diagram showing the structural relationships between ConcurrentHashMap, Segment, HashEntry, and the CAS node types
Apply concurrency patterns from Fundamentals S14 and S15 — specifically the Striped Lock pattern and lock-free programming with CAS

Prerequisites

Before starting this problem, make sure you're comfortable with:

Hash Table Internals — how hash functions map keys to bucket indices, what hash collisions are, how chaining (linked lists per bucket) and open addressing resolve them, and when/why tables resize (load factor). If hashCode() % capacity doesn't feel intuitive, review hash table basics first.
Threads and Mutual Exclusion — creating threads, acquiring/releasing locks, the concept of critical sections and race conditions (Fundamentals S14). Every operation on our concurrent map must reason about "what if another thread is doing this at the same time?"
Condition Variables and Signaling — how wait() and notify() coordinate threads without busy-waiting (Fundamentals S14). Needed for blocking operations during resize.
CAS (Compare-And-Swap) — the hardware primitive that enables lock-free programming. CAS(address, expected, new) atomically sets address to new only if it currently equals expected. This is the building block of our lock-free implementation (Fundamentals S14).
Thread-Safe Blocking Queue (DP33) — if you completed it, you already understand producer-consumer synchronization and lock-based thread coordination. The Concurrent HashMap extends these concepts to a more complex data structure.
Thread Pool (DP34) — understanding how multiple worker threads interact with shared state gives you the mental model for reasoning about concurrent map access patterns.

Problem Scope

We will design: A thread-safe hash map supporting concurrent put(key, value), get(key), remove(key), and size() operations. We'll build two complete implementations — one using segment locking (coarse-grained stripes) and one using CAS operations (lock-free reads, fine-grained CAS writes). Both handle concurrent resizing.

We will NOT design: Distributed hash maps (consistent hashing, partitioning across nodes), persistent/disk-backed maps, ordered maps (TreeMap equivalents), or cache eviction policies (LRU, LFU). These are system-level concerns beyond the scope of this object-oriented design exercise.

Key simplifications:

Separate chaining for collision resolution (linked list per bucket) — simpler than open addressing for concurrent access
Fixed load factor threshold (0.75) to trigger resizing
No iterators — concurrent iteration adds significant complexity (weakly consistent iterators are a separate design challenge)
No putIfAbsent / computeIfAbsent compound operations — though we'll discuss them as extensions

Why two implementation tutorials? Segment locking and CAS represent fundamentally different concurrency paradigms. Segment locking is easier to understand and reason about — it's traditional lock-based programming applied at a finer granularity. CAS is harder but offers better performance under high contention. Seeing both gives you the vocabulary to discuss concurrency trade-offs in any interview.

How Real Concurrent Hash Maps Work

Before we design our own, let's understand the evolution of real concurrent hash maps — this history directly motivates our two implementation strategies.

Java's ConcurrentHashMap — Two Eras:

JDK 5-7 (Segment Locking): The original ConcurrentHashMap by Doug Lea divided the table into 16 segments, each independently lockable. A put("alice", 42) call hashes "alice" to determine which segment owns that key, then locks only that segment. Meanwhile, a put("bob", 99) that hashes to a different segment proceeds in parallel with zero contention. Reads (get) were lock-free using volatile fields — they never blocked. The number of segments (called the concurrency level) was configurable at construction.

JDK 8+ (CAS + Synchronized): The redesign eliminated segments entirely. Each bucket in the array is an independent entry point. Reads use volatile array reads — completely lock-free. Writes use CAS to atomically set empty buckets and synchronized on the first node of a non-empty bucket. Resizing uses a clever cooperative strategy: when one thread starts resizing, other threads that access the table help transfer buckets instead of blocking. This reduced memory overhead (no segment objects) and improved scalability beyond 16-way parallelism.

Python's Concurrency Story:

Python's GIL (Global Interpreter Lock) makes standard dict operations atomic at the bytecode level — a single dict[key] = value won't corrupt the internal structure. But compound operations (check-then-act like if key not in d: d[key] = value) are NOT atomic. For I/O-bound servers using threads, a thread-safe dict with explicit locking is still necessary. Python's concurrent.futures and frameworks like Django use locked dictionaries extensively.

Go's sync.Map:

Go provides sync.Map optimized for two patterns: (1) keys are written once and read many times, (2) disjoint sets of keys are written by different goroutines. Internally, it uses a read-only atomic map for fast reads and a dirty map for writes, promoting dirty to read-only after a threshold.

The common thread: Every production implementation faces the same core tension — reads should be as fast as possible (ideally lock-free), writes need synchronization, and resizing must not block the entire map. Our two tutorials systematically explore both sides of this trade-off.