Design HashSet
Description
Design a HashSet data structure from scratch without using any built-in hash table libraries.
A HashSet is a collection that stores unique elements and supports three core operations:
- add(key): Insert the value
keyinto the set. If the key already exists, the set remains unchanged. - contains(key): Check whether the value
keyexists in the set. Returntrueif it does,falseotherwise. - remove(key): Delete the value
keyfrom the set. If the key does not exist, do nothing.
Your task is to implement the MyHashSet class with these three methods, handling all operations correctly and efficiently.
Examples
Example 1
Input:
["MyHashSet", "add", "add", "contains", "contains", "add", "contains", "remove", "contains"]
[[], [1], [2], [1], [3], [2], [2], [2], [2]]
Output:
[null, null, null, true, false, null, true, null, false]
Explanation:
MyHashSet myHashSet = new MyHashSet();— Create an empty set.myHashSet.add(1);— The set becomes {1}.myHashSet.add(2);— The set becomes {1, 2}.myHashSet.contains(1);— Returnstruebecause 1 is in the set.myHashSet.contains(3);— Returnsfalsebecause 3 is not in the set.myHashSet.add(2);— The set remains {1, 2} since 2 already exists.myHashSet.contains(2);— Returnstruebecause 2 is in the set.myHashSet.remove(2);— The set becomes {1} after removing 2.myHashSet.contains(2);— Returnsfalsebecause 2 was removed.
Constraints
- 0 ≤ key ≤ 10^6
- At most 10^4 calls will be made to
add,remove, andcontains.
Editorial
Brute Force
Intuition
The simplest way to design a HashSet is to completely avoid hashing altogether. Since the problem tells us that keys range from 0 to 1,000,000, we can create a giant boolean array of size 1,000,001 where each index represents a possible key.
Think of it like a wall of light switches in a hallway, one switch for every possible number from 0 to 1,000,000. To add a number, flip its switch ON. To remove it, flip it OFF. To check if it exists, just look at whether the switch is ON or OFF.
This approach is called direct addressing — we use the key itself as the index into our storage array. It works perfectly when the key range is known and bounded, but it wastes enormous memory if only a few keys are actually stored.
Step-by-Step Explanation
Let's trace through the operations from Example 1:
Step 1: Initialize a boolean array data of size 1,000,001, all set to false.
- data = [false, false, false, false, ...] (1,000,001 slots)
Step 2: add(1) — Set data[1] = true.
- data[1] is now
true. The set logically contains {1}.
Step 3: add(2) — Set data[2] = true.
- data[2] is now
true. The set logically contains {1, 2}.
Step 4: contains(1) — Return data[1].
- data[1] is
true, so returntrue. Key 1 exists.
Step 5: contains(3) — Return data[3].
- data[3] is
false, so returnfalse. Key 3 does not exist.
Step 6: add(2) — Set data[2] = true.
- data[2] was already
true, so nothing effectively changes. Duplicates are naturally handled.
Step 7: contains(2) — Return data[2].
- data[2] is
true, so returntrue.
Step 8: remove(2) — Set data[2] = false.
- data[2] is now
false. The set logically contains {1}.
Step 9: contains(2) — Return data[2].
- data[2] is
false, so returnfalse. Key 2 was removed.
Direct Addressing — Boolean Array Operations — Watch how each add, contains, and remove operation directly accesses the boolean array at the key's index, achieving O(1) time for every operation.
Algorithm
- Initialize: Create a boolean array
dataof size 1,000,001, filled withfalse. - add(key): Set
data[key] = true. - remove(key): Set
data[key] = false. - contains(key): Return
data[key].
Code
class MyHashSet {
private:
vector<bool> data;
public:
MyHashSet() {
data.resize(1000001, false);
}
void add(int key) {
data[key] = true;
}
void remove(int key) {
data[key] = false;
}
bool contains(int key) {
return data[key];
}
};class MyHashSet:
def __init__(self):
self.data = [False] * 1000001
def add(self, key: int) -> None:
self.data[key] = True
def remove(self, key: int) -> None:
self.data[key] = False
def contains(self, key: int) -> bool:
return self.data[key]class MyHashSet {
private boolean[] data;
public MyHashSet() {
data = new boolean[1000001];
}
public void add(int key) {
data[key] = true;
}
public void remove(int key) {
data[key] = false;
}
public boolean contains(int key) {
return data[key];
}
}Complexity Analysis
Time Complexity: O(1) per operation
Each of the three operations — add, remove, and contains — performs a single array access at a known index. Array access by index is a constant-time operation, so every call runs in O(1).
Space Complexity: O(N) where N = 10^6 + 1
We allocate a boolean array of fixed size 1,000,001 regardless of how many keys are actually stored. Even if we only add 5 keys, we still consume memory for over a million slots. This is extremely wasteful when the actual number of stored keys is much smaller than the key range.
Why This Approach Is Not Efficient
While the boolean array gives us O(1) time for every operation, it has a critical flaw: space waste.
We allocate an array of 1,000,001 booleans no matter what. If the problem only requires storing 10 keys, we are wasting 99.999% of the allocated memory. In real-world systems, the key range might be billions (e.g., IP addresses, user IDs), making a direct-addressing array completely impractical.
Moreover, this approach only works because the problem guarantees keys are non-negative integers within a fixed range. It cannot handle string keys, negative numbers, or keys beyond the pre-allocated range.
The fundamental insight: a real hash set uses a hash function to map keys to a much smaller array of buckets, and handles the inevitable collisions using techniques like chaining (linked lists within each bucket). This lets us use O(K) space where K is the number of stored keys, rather than O(N) where N is the entire key range.
Optimal Approach - Hash Function with Bucket Chaining
Intuition
A proper hash set works like a library with numbered shelves. Instead of having a separate shelf for every possible book (which would require billions of shelves), we use a formula to decide which shelf each book goes on. For example, shelf number = book_id modulo total_shelves. Multiple books might end up on the same shelf — that is called a collision — but we simply stack them on the same shelf and search through the small stack when needed.
Here is the design:
- Choose a bucket count: Pick a number of buckets (e.g., 1000). A prime number is often chosen to reduce clustering, but any reasonable size works.
- Hash function:
hash(key) = key % bucket_count. This maps any key to a bucket index between 0 and bucket_count - 1. - Collision resolution via chaining: Each bucket holds a list of keys that hash to the same index. When adding, we check the list first to avoid duplicates. When searching or removing, we scan the list.
With a good bucket count, each bucket's list stays very short on average, so operations are effectively O(1) amortized while using memory proportional to the number of stored keys.

Step-by-Step Explanation
Let's trace through the Example 1 operations using bucket_count = 5 and hash(key) = key % 5:
Step 1: Initialize 5 empty buckets: [[], [], [], [], []].
Step 2: add(1) — hash(1) = 1 % 5 = 1. Bucket 1 is empty, so append 1.
- Buckets: [[], [1], [], [], []]
Step 3: add(2) — hash(2) = 2 % 5 = 2. Bucket 2 is empty, so append 2.
- Buckets: [[], [1], [2], [], []]
Step 4: contains(1) — hash(1) = 1. Scan Bucket 1: [1]. Found 1! Return true.
Step 5: contains(3) — hash(3) = 3. Scan Bucket 3: []. Empty bucket! Return false.
Step 6: add(2) — hash(2) = 2. Scan Bucket 2: [2]. Key 2 already present, do nothing.
- Buckets remain: [[], [1], [2], [], []]
Step 7: contains(2) — hash(2) = 2. Scan Bucket 2: [2]. Found 2! Return true.
Step 8: remove(2) — hash(2) = 2. Scan Bucket 2: [2]. Found 2, remove it.
- Buckets: [[], [1], [], [], []]
Step 9: contains(2) — hash(2) = 2. Scan Bucket 2: []. Empty! Return false.
Hash Set with Bucket Chaining — Full Operation Trace — Watch how the hash function maps keys to buckets using modulo, and how chaining handles collisions by storing multiple keys in the same bucket's list.
Algorithm
- Initialize: Create an array of
bucket_countempty lists (buckets). Choose a reasonable bucket count (e.g., 769 — a prime number). - Hash function: For any key, compute
bucket_index = key % bucket_count. - add(key):
- Compute the bucket index.
- Scan the bucket's list. If key is already present, do nothing.
- Otherwise, append key to the list.
- remove(key):
- Compute the bucket index.
- Scan the bucket's list. If key is found, remove it.
- If not found, do nothing.
- contains(key):
- Compute the bucket index.
- Scan the bucket's list. Return
trueif key is found,falseotherwise.
Code
class MyHashSet {
private:
static const int BUCKET_COUNT = 769;
vector<list<int>> buckets;
int hash(int key) {
return key % BUCKET_COUNT;
}
public:
MyHashSet() {
buckets.resize(BUCKET_COUNT);
}
void add(int key) {
int idx = hash(key);
for (int val : buckets[idx]) {
if (val == key) return;
}
buckets[idx].push_back(key);
}
void remove(int key) {
int idx = hash(key);
buckets[idx].remove(key);
}
bool contains(int key) {
int idx = hash(key);
for (int val : buckets[idx]) {
if (val == key) return true;
}
return false;
}
};class MyHashSet:
def __init__(self):
self.bucket_count = 769
self.buckets = [[] for _ in range(self.bucket_count)]
def _hash(self, key: int) -> int:
return key % self.bucket_count
def add(self, key: int) -> None:
idx = self._hash(key)
if key not in self.buckets[idx]:
self.buckets[idx].append(key)
def remove(self, key: int) -> None:
idx = self._hash(key)
if key in self.buckets[idx]:
self.buckets[idx].remove(key)
def contains(self, key: int) -> bool:
idx = self._hash(key)
return key in self.buckets[idx]class MyHashSet {
private static final int BUCKET_COUNT = 769;
private List<List<Integer>> buckets;
public MyHashSet() {
buckets = new ArrayList<>();
for (int i = 0; i < BUCKET_COUNT; i++) {
buckets.add(new LinkedList<>());
}
}
private int hash(int key) {
return key % BUCKET_COUNT;
}
public void add(int key) {
int idx = hash(key);
if (!buckets.get(idx).contains(key)) {
buckets.get(idx).add(key);
}
}
public void remove(int key) {
int idx = hash(key);
buckets.get(idx).remove(Integer.valueOf(key));
}
public boolean contains(int key) {
int idx = hash(key);
return buckets.get(idx).contains(key);
}
}Complexity Analysis
Time Complexity: O(N / K) per operation on average, where N is the number of stored keys and K is the bucket count.
With a well-chosen bucket count (e.g., 769) and a reasonable number of keys (up to 10^4), the average chain length per bucket is about 10^4 / 769 ≈ 13 elements. Scanning a chain of 13 elements is effectively constant. In the best case (uniform distribution), this approaches O(1). In the worst case (all keys hash to the same bucket), it degrades to O(N), but this is extremely unlikely with a prime bucket count.
Space Complexity: O(K + N) where K is the bucket count and N is the number of stored keys.
We allocate K empty bucket lists upfront, and each of the N inserted keys occupies one slot in one bucket's list. This is much more memory-efficient than the boolean array approach when N is much smaller than the key range (10^6).