Union of Two Arrays
Description
This problem is about the Disjoint Set Union (DSU) data structure, also known as Union-Find.
You are given N nodes numbered from 1 to N. Initially, every node belongs to its own separate set (no nodes are connected). You need to implement two operations:
-
union_(x, y): Merge the set containing node
xwith the set containing nodey. After this operation,xandy(and all nodes in their respective sets) belong to the same set. -
isConnected(x, y): Determine whether nodes
xandybelong to the same set. Return1if they are in the same set, and0otherwise.
You are given two arrays par[] and rank1[] initialized as par[i] = i and rank1[i] = 1 for all i. Use these arrays to efficiently manage the disjoint sets.
Examples
Example 1
Input: N = 5, Queries: Union(1,3), isConnected(1,2), Union(1,5), isConnected(3,5)
Output: 0 1
Explanation:
- Initially, each node is its own set: {1}, {2}, {3}, {4}, {5}.
- Union(1,3): Merge sets containing 1 and 3. Now: {1,3}, {2}, {4}, {5}.
- isConnected(1,2): Nodes 1 and 2 are in different sets. Return 0.
- Union(1,5): Merge sets containing 1 and 5. Node 1 is in set {1,3}, so now: {1,3,5}, {2}, {4}.
- isConnected(3,5): Both 3 and 5 are in set {1,3,5}. Return 1.
Constraints
- 1 ≤ N ≤ 10^5
- 1 ≤ Q ≤ 10^5 (number of union/isConnected queries)
- 1 ≤ x, y ≤ N
Expected Complexity:
- Time: O(N + Q) (nearly O(α(N)) per operation with optimizations)
- Space: O(1) auxiliary (the par[] and rank1[] arrays are provided)
Editorial
Brute Force - Naive Union-Find
Intuition
The core idea behind Union-Find is to represent each set as a tree, where every node has a parent pointer. The root of a tree is the representative (or "leader") of that set. Two nodes belong to the same set if and only if they have the same root.
Initially, each node is its own root — it points to itself: par[i] = i. This means every node starts in a set containing just itself.
Find operation: To determine which set a node belongs to, we follow parent pointers upward from that node until we reach a node that is its own parent — that's the root. The root uniquely identifies the set.
Union operation: To merge two sets, we find the root of each set. If the roots are different, we make one root point to the other, effectively merging the two trees into one.
isConnected: Simply check if find(x) == find(y) — if both nodes have the same root, they are in the same set.
The simplest version of this is the naive approach: during union, we always attach the root of the second node under the root of the first node, regardless of tree size. This works correctly but can create tall, skinny trees (essentially linked lists), making the find operation slow — up to O(n) per call in the worst case.
Step-by-Step Explanation
Let's trace through the example: N = 5, queries: Union(1,3), isConnected(1,2), Union(1,5), isConnected(3,5).
Initialization: par = [_, 1, 2, 3, 4, 5] (1-indexed). Every node is its own parent.
Sets: {1}, {2}, {3}, {4}, {5}
Query 1: Union(1, 3)
- Find root of 1: par[1] = 1 → root is 1.
- Find root of 3: par[3] = 3 → root is 3.
- Roots differ (1 ≠ 3), so merge: set par[3] = 1. Now node 3's parent is 1.
- Sets: {1, 3}, {2}, {4}, {5}
Query 2: isConnected(1, 2)
- Find root of 1: par[1] = 1 → root is 1.
- Find root of 2: par[2] = 2 → root is 2.
- Roots differ (1 ≠ 2). Return 0.
Query 3: Union(1, 5)
- Find root of 1: par[1] = 1 → root is 1.
- Find root of 5: par[5] = 5 → root is 5.
- Roots differ (1 ≠ 5), so merge: set par[5] = 1.
- Sets: {1, 3, 5}, {2}, {4}
Query 4: isConnected(3, 5)
- Find root of 3: par[3] = 1, par[1] = 1 → root is 1.
- Find root of 5: par[5] = 1, par[1] = 1 → root is 1.
- Roots are the same (1 == 1). Return 1.
Naive Union-Find — Building the Parent Array — Watch how the parent array evolves as union operations merge sets, and how find operations traverse parent chains to locate roots.
Algorithm
Find(x):
- While par[x] ≠ x, set x = par[x]
- Return x (the root)
Union(x, y):
- Find root_x = Find(x)
- Find root_y = Find(y)
- If root_x ≠ root_y, set par[root_y] = root_x
isConnected(x, y):
- Return 1 if Find(x) == Find(y), else return 0
Code
// Naive Union-Find (without optimizations)
int find(int par[], int x) {
while (par[x] != x) {
x = par[x];
}
return x;
}
void union_(int par[], int rank1[], int x, int y) {
int root_x = find(par, x);
int root_y = find(par, y);
if (root_x != root_y) {
par[root_y] = root_x;
}
}
int isConnected(int par[], int rank1[], int x, int y) {
return find(par, x) == find(par, y) ? 1 : 0;
}# Naive Union-Find (without optimizations)
def find(par, x):
while par[x] != x:
x = par[x]
return x
def union_(par, rank1, x, y):
root_x = find(par, x)
root_y = find(par, y)
if root_x != root_y:
par[root_y] = root_x
def isConnected(par, rank1, x, y):
return 1 if find(par, x) == find(par, y) else 0// Naive Union-Find (without optimizations)
int find(int[] par, int x) {
while (par[x] != x) {
x = par[x];
}
return x;
}
void union_(int[] par, int[] rank1, int x, int y) {
int rootX = find(par, x);
int rootY = find(par, y);
if (rootX != rootY) {
par[rootY] = rootX;
}
}
int isConnected(int[] par, int[] rank1, int x, int y) {
return find(par, x) == find(par, y) ? 1 : 0;
}Complexity Analysis
Time Complexity: O(n) per find operation in the worst case, O(n × Q) total
In the worst case, the tree degenerates into a straight chain (like a linked list). For example, if we union nodes 1→2, then 2→3, then 3→4, the tree becomes a chain of depth n. Each find operation must traverse the entire chain from a leaf to the root — O(n) per call. With Q queries, total time is O(n × Q).
For n and Q up to 10^5, this could mean 10^10 operations — far too slow.
Space Complexity: O(1) auxiliary
We only use the provided par[] and rank1[] arrays. No additional data structures are created.
Why This Approach Is Not Efficient
The naive approach has a critical flaw: tree height is unbounded. When we always attach one root under another without considering tree sizes, we can create arbitrarily tall trees.
Consider this sequence of unions: Union(1,2), Union(2,3), Union(3,4), Union(4,5). If each time the second root is attached under the first:
- After Union(1,2): 2→1
- After Union(2,3): To union 2 and 3, we find root of 2 (which is 1), find root of 3 (which is 3), and set par[3] = 1.
- But in a degenerate case, we could build a chain: 5→4→3→2→1
Finding the root of node 5 now requires traversing 4 edges — the entire depth of the tree. With n nodes, this is O(n) per find.
Two optimizations fix this:
- Union by Rank: Always attach the shorter tree under the taller tree. This limits the maximum height to O(log n).
- Path Compression: During find, make every node on the path point directly to the root. This flattens the tree for future queries.
Together, these optimizations reduce the amortized time per operation to O(α(n)), where α is the inverse Ackermann function — effectively constant for all practical inputs.
Optimal Approach - Union by Rank with Path Compression
Intuition
The optimal approach adds two powerful optimizations to the basic Union-Find structure:
1. Union by Rank:
Each tree maintains a rank — an upper bound on its height. When merging two trees, we attach the tree with smaller rank under the tree with larger rank. This ensures the resulting tree stays as flat as possible.
- If both trees have the same rank, we pick one as the new root and increment its rank by 1.
- If ranks differ, we attach the smaller-rank tree under the larger-rank tree. The rank of the result does not change.
This guarantees the tree height is at most O(log n), improving find from O(n) to O(log n) per call.
2. Path Compression:
Every time we call find(x), after locating the root, we update the parent of every node on the path from x to the root to point directly to the root. This dramatically flattens the tree, so future find calls on these nodes are O(1).
With both optimizations combined, the amortized time per union or find operation becomes O(α(n)), where α(n) is the inverse Ackermann function. For all practical values of n (up to 10^80), α(n) ≤ 4. This is effectively constant time per operation.
The rank array is already provided as rank1[] initialized to 1. We will treat rank as the tree size (union by size variant) which achieves the same bounds.
Step-by-Step Explanation
Let's trace the same example with Union by Rank + Path Compression:
N = 5, queries: Union(1,3), isConnected(1,2), Union(1,5), isConnected(3,5)
Initialization:
- par = [_, 1, 2, 3, 4, 5]
- rank1 = [_, 1, 1, 1, 1, 1]
Query 1: Union(1, 3)
- find(1): par[1]=1 → root=1 (already at root, no path to compress).
- find(3): par[3]=3 → root=3.
- root_x=1, root_y=3. Ranks: rank1[1]=1, rank1[3]=1. Equal ranks!
- Attach 3 under 1: par[3]=1, rank1[1]=2.
- par = [, 1, 2, 1, 4, 5], rank1 = [, 2, 1, 1, 1, 1]
Query 2: isConnected(1, 2)
- find(1): root=1. find(2): root=2.
- 1 ≠ 2 → Return 0.
Query 3: Union(1, 5)
- find(1): root=1. find(5): root=5.
- rank1[1]=2 > rank1[5]=1, so attach 5 under 1: par[5]=1.
- rank stays the same since rank of root 1 is already larger.
- par = [, 1, 2, 1, 4, 1], rank1 = [, 2, 1, 1, 1, 1]
Query 4: isConnected(3, 5)
- find(3): par[3]=1, par[1]=1 → root=1. Path compression: par[3] already points to 1, no change needed.
- find(5): par[5]=1, par[1]=1 → root=1.
- 1 == 1 → Return 1.
Output: 0 1
Union by Rank + Path Compression — Keeping Trees Flat — Watch how union by rank prevents tall trees by attaching shorter trees under taller ones, and how path compression flattens the structure during find operations.
Algorithm
Find(par, x) with Path Compression:
- If par[x] ≠ x:
- Recursively set par[x] = Find(par, par[x])
- Return par[x]
(This makes every node on the find path point directly to the root.)
Union(par, rank1, x, y) with Union by Rank:
- root_x = Find(par, x)
- root_y = Find(par, y)
- If root_x == root_y, do nothing (already in same set)
- If rank1[root_x] < rank1[root_y], set par[root_x] = root_y
- Else if rank1[root_x] > rank1[root_y], set par[root_y] = root_x
- Else (equal ranks): set par[root_y] = root_x, increment rank1[root_x] by 1
isConnected(par, rank1, x, y):
- Return 1 if Find(par, x) == Find(par, y), else return 0
Code
// Union by Rank + Path Compression
int find(int par[], int x) {
if (par[x] != x) {
par[x] = find(par, par[x]); // path compression
}
return par[x];
}
void union_(int par[], int rank1[], int x, int y) {
int root_x = find(par, x);
int root_y = find(par, y);
if (root_x == root_y) return;
// union by rank
if (rank1[root_x] < rank1[root_y]) {
par[root_x] = root_y;
} else if (rank1[root_x] > rank1[root_y]) {
par[root_y] = root_x;
} else {
par[root_y] = root_x;
rank1[root_x]++;
}
}
int isConnected(int par[], int rank1[], int x, int y) {
return find(par, x) == find(par, y) ? 1 : 0;
}# Union by Rank + Path Compression
def find(par, x):
if par[x] != x:
par[x] = find(par, par[x]) # path compression
return par[x]
def union_(par, rank1, x, y):
root_x = find(par, x)
root_y = find(par, y)
if root_x == root_y:
return
# union by rank
if rank1[root_x] < rank1[root_y]:
par[root_x] = root_y
elif rank1[root_x] > rank1[root_y]:
par[root_y] = root_x
else:
par[root_y] = root_x
rank1[root_x] += 1
def isConnected(par, rank1, x, y):
return 1 if find(par, x) == find(par, y) else 0// Union by Rank + Path Compression
int find(int[] par, int x) {
if (par[x] != x) {
par[x] = find(par, par[x]); // path compression
}
return par[x];
}
void union_(int[] par, int[] rank1, int x, int y) {
int rootX = find(par, x);
int rootY = find(par, y);
if (rootX == rootY) return;
// union by rank
if (rank1[rootX] < rank1[rootY]) {
par[rootX] = rootY;
} else if (rank1[rootX] > rank1[rootY]) {
par[rootY] = rootX;
} else {
par[rootY] = rootX;
rank1[rootX]++;
}
}
int isConnected(int[] par, int[] rank1, int x, int y) {
return find(par, x) == find(par, y) ? 1 : 0;
}Complexity Analysis
Time Complexity: O((N + Q) × α(N)) ≈ O(N + Q)
With both Union by Rank and Path Compression, each union and find operation runs in amortized O(α(N)) time, where α is the inverse Ackermann function. The inverse Ackermann function grows incredibly slowly — for any practical input size (up to 10^80), α(N) ≤ 4.
For N and Q up to 10^5, the total time is effectively O(N + Q) — a massive improvement over the naive approach's O(N × Q).
- Union by Rank alone gives O(log N) per operation (tree height bounded by log N).
- Path Compression alone gives amortized O(log N) per operation.
- Both together give amortized O(α(N)) — nearly constant.
Space Complexity: O(1) auxiliary
We use only the provided par[] and rank1[] arrays. The recursive path compression uses O(log N) stack space in the worst case, but this can be converted to an iterative two-pass approach for strict O(1) space.