Skip to main content

Find the Duplicate Number

MEDIUMProblemSolveExternal Links

Description

Given an array of integers nums containing n + 1 integers where each integer is in the range [1, n] inclusive, there is only one repeated number in nums. Return this repeated number.

You must solve the problem without modifying the array nums and using only constant extra space O(1).

The key insight behind this problem is the Pigeonhole Principle: if you have n+1 items to place into n containers, at least one container must hold more than one item. Here, n+1 numbers are drawn from only n possible values (1 through n), so at least one value must repeat.

Examples

Example 1

Input: nums = [1, 3, 4, 2, 2]

Output: 2

Explanation: The value 2 appears at indices 3 and 4. Every other value (1, 3, 4) appears exactly once. So 2 is the duplicate.

Example 2

Input: nums = [3, 1, 3, 4, 2]

Output: 3

Explanation: The value 3 appears at indices 0 and 2. All other values appear once.

Example 3

Input: nums = [3, 3, 3, 3, 3]

Output: 3

Explanation: The duplicate can appear more than twice. Here, 3 appears five times. The answer is still 3.

Constraints

  • 1 ≤ n ≤ 10^5
  • nums.length == n + 1
  • 1 ≤ nums[i] ≤ n
  • All integers in nums appear only once except for precisely one integer which appears two or more times
  • You must not modify the array
  • You must use only constant extra space O(1)

Editorial

Brute Force - Check All Pairs

Intuition

The most straightforward idea: if there's a duplicate, it means some value appears at two (or more) different positions. So we can check every possible pair of positions and see if the values match.

Think of it like a classroom roll call where one student's name was written twice on the attendance sheet. The brute force way to find the duplicate is to pick each name and compare it against every name that comes after it. Eventually, you'll find two entries that match.

This approach respects both constraints — we don't modify the array and we use O(1) extra space. But it's slow.

Step-by-Step Explanation

Let's trace with nums = [1, 3, 4, 2, 2]:

Step 1: Fix i=0, element is 1. Compare against all elements after it.

Step 2: Compare (i=0, j=1): nums[0]=1, nums[1]=3. 1 ≠ 3.

Step 3: Compare (i=0, j=4): nums[0]=1, nums[4]=2. 1 ≠ 2. All pairs with i=0 checked — no match found. (j=2 and j=3 also don't match.)

Step 4: Fix i=1, element is 3. Compare (i=1, j=2): 3 ≠ 4.

Step 5: Compare (i=1, j=4): 3 ≠ 2. All pairs with i=1 checked — no match.

Step 6: Fix i=2, compare (i=2, j=3): 4 ≠ 2. And (i=2, j=4): 4 ≠ 2. No match.

Step 7: Fix i=3, element is 2. Compare (i=3, j=4): nums[3]=2, nums[4]=2. 2 == 2! MATCH!

Step 8: Return 2.

Brute Force — Checking All Pairs for Duplicates — Nested loops compare every pair. Watch how many comparisons are needed before finding the duplicate at the end.

Algorithm

  1. For each index i from 0 to n-1:
    • For each index j from i+1 to n:
      • If nums[i] == nums[j], return nums[i]
  2. Return -1 (unreachable given constraints)

Code

class Solution {
public:
    int findDuplicate(vector<int>& nums) {
        int n = nums.size();
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (nums[i] == nums[j]) {
                    return nums[i];
                }
            }
        }
        return -1;
    }
};
class Solution:
    def findDuplicate(self, nums: list[int]) -> int:
        n = len(nums)
        for i in range(n):
            for j in range(i + 1, n):
                if nums[i] == nums[j]:
                    return nums[i]
        return -1
class Solution {
    public int findDuplicate(int[] nums) {
        int n = nums.length;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (nums[i] == nums[j]) {
                    return nums[i];
                }
            }
        }
        return -1;
    }
}

Complexity Analysis

Time Complexity: O(n²)

The outer loop runs n+1 times, and for each iteration the inner loop runs up to n times. Total pair checks: (n+1)*n/2, which simplifies to O(n²).

Space Complexity: O(1)

We only use a few integer variables (i, j). No additional data structures. This satisfies the constant-space constraint.

Why This Approach Is Not Efficient

With n up to 10^5, the O(n²) approach performs roughly 5 × 10^9 operations — far too slow for typical time limits of 1-2 seconds.

A natural thought is to use a HashSet: scan the array once, and for each element, check if it's already in the set. If yes, that's the duplicate. This gives O(n) time but uses O(n) space — violating the constant-space constraint.

So we need something smarter. The key insight: instead of searching for the duplicate among elements, we can binary search on the value range [1, n]. For a candidate value mid, we count how many elements are ≤ mid. By the Pigeonhole Principle, if the count exceeds mid, the duplicate must be in the range [1, mid]. Otherwise, it's in [mid+1, n]. This gives us O(n log n) time with O(1) space.

Better Approach - Binary Search on Value Range

Intuition

Instead of searching for the duplicate by comparing elements directly, we ask a different question: "Is the duplicate in the range [1, mid] or [mid+1, n]?"

Here's the reasoning. In an array of n+1 numbers drawn from [1, n] with no duplicates, how many numbers should be ≤ mid? Exactly mid numbers (the values 1, 2, ..., mid). But if the duplicate is ≤ mid, then the count of numbers ≤ mid will be greater than mid — there's one extra.

This is binary search, but not on the array indices — it's on the answer space [1, n]. At each step, we count elements in the lower half vs upper half to decide which half contains the duplicate. We keep halving until only one value remains.

Think of it like a detective narrowing down suspects. Instead of interrogating everyone individually, you split the group in half and count — if one half has too many people, the imposter is hiding there.

Step-by-Step Explanation

Let's trace with nums = [1, 3, 4, 2, 2], n = 4:

Step 1: Search range is [1, 4]. Compute mid = (1+4)/2 = 2.

Step 2: Count elements ≤ 2 in the array: 1, 2, 2 → count = 3.

Step 3: Is count (3) > mid (2)? YES. By Pigeonhole, the duplicate must be in [1, 2]. Set high = 2.

Step 4: Search range is now [1, 2]. Compute mid = (1+2)/2 = 1.

Step 5: Count elements ≤ 1: only 1 → count = 1.

Step 6: Is count (1) > mid (1)? NO. The duplicate is NOT in [1, 1]. Set low = 2.

Step 7: low == high == 2. The search range has collapsed to a single value.

Step 8: Return 2. The duplicate is 2.

Binary Search on Value Range — Pigeonhole Counting — Watch how we binary search on the answer range [1, n], counting elements at each step to narrow down which half contains the duplicate.

Algorithm

  1. Set low = 1, high = n (the value range)
  2. While low < high:
    a. Compute mid = low + (high - low) / 2
    b. Count how many elements in nums are ≤ mid
    c. If count > mid, the duplicate is in [low, mid]: set high = mid
    d. Else, the duplicate is in [mid+1, high]: set low = mid + 1
  3. Return low (or high — they're equal)

Code

class Solution {
public:
    int findDuplicate(vector<int>& nums) {
        int low = 1, high = nums.size() - 1;
        while (low < high) {
            int mid = low + (high - low) / 2;
            int count = 0;
            for (int num : nums) {
                if (num <= mid) count++;
            }
            if (count > mid) {
                high = mid;
            } else {
                low = mid + 1;
            }
        }
        return low;
    }
};
class Solution:
    def findDuplicate(self, nums: list[int]) -> int:
        low, high = 1, len(nums) - 1
        while low < high:
            mid = low + (high - low) // 2
            count = sum(1 for num in nums if num <= mid)
            if count > mid:
                high = mid
            else:
                low = mid + 1
        return low
class Solution {
    public int findDuplicate(int[] nums) {
        int low = 1, high = nums.length - 1;
        while (low < high) {
            int mid = low + (high - low) / 2;
            int count = 0;
            for (int num : nums) {
                if (num <= mid) count++;
            }
            if (count > mid) {
                high = mid;
            } else {
                low = mid + 1;
            }
        }
        return low;
    }
}

Complexity Analysis

Time Complexity: O(n log n)

The binary search runs O(log n) iterations (the value range [1, n] is halved each time). Each iteration scans the entire array of n+1 elements to count values ≤ mid. Total: O(n) per iteration × O(log n) iterations = O(n log n).

Space Complexity: O(1)

We use only a few integer variables (low, high, mid, count). No additional data structures. This satisfies the constant-space constraint.

Why This Approach Is Not Efficient

The binary search approach is a significant improvement over brute force — O(n log n) vs O(n²) — and it respects both constraints (no modification, O(1) space). But can we do even better?

The inefficiency lies in the repeated full-array scans. Each binary search iteration re-scans all n+1 elements to count values ≤ mid. We're doing O(log n) complete passes over the data.

The breakthrough insight comes from a completely different angle: treat the array as a linked list. Since every element is in range [1, n] and the array has n+1 slots (indexed 0 to n), we can interpret nums[i] as a "pointer" from position i to position nums[i]. Because there's a duplicate value (say value d), two different positions both point to position d — creating a cycle in this implicit linked list.

Finding the duplicate becomes equivalent to finding the entrance to a cycle in a linked list — a classic problem solvable with Floyd's Tortoise and Hare algorithm in O(n) time and O(1) space!

Optimal Approach - Floyd's Tortoise and Hare (Cycle Detection)

Intuition

This is one of the most elegant algorithm applications in competitive programming. The idea rests on a simple observation:

Array as a linked list: If we define f(x) = nums[x], then following the chain 0 → f(0) → f(f(0)) → ... creates a sequence that eventually enters a cycle. Why? Because there are n+1 positions but only n distinct target values (1 through n). The duplicate value d has two or more arrows pointing to it (from two or more positions), which creates the cycle's entrance.

For nums = [1, 3, 4, 2, 2]:

0 → 1 → 3 → 2 → 4 → 2 → 4 → 2 → ...

The cycle is 2 → 4 → 2 → 4 → ... and the entrance is at value 2 — the duplicate!

Floyd's algorithm finds this entrance in two phases:

  1. Phase 1 (Detect cycle): A slow pointer (one step at a time) and a fast pointer (two steps) both start at position 0. If there's a cycle, they'll eventually meet inside it.
  2. Phase 2 (Find entrance): Start a new pointer at position 0, keep the slow pointer where it met fast. Move both one step at a time. Where they meet is the cycle entrance — the duplicate number.

The mathematical proof of why phase 2 works relies on the fact that the distance from the start to the cycle entrance equals the distance from the meeting point to the cycle entrance (modulo cycle length).

Step-by-Step Explanation

Let's trace with nums = [1, 3, 4, 2, 2]:

First, visualize the implicit linked list:

Index: 0 → 1 → 3 → 2 → 4 ↩ (back to 2)
        ↓   ↓   ↓   ↓   ↓
Value:  1   3   2   4   2

Chain: 0 → 1 → 3 → 2 → 4 → 2 → 4 → ... (cycle: 2 ↔ 4)

Phase 1: Find meeting point

Step 1: Initialize. slow = nums[0] = 1, fast = nums[nums[0]] = nums[1] = 3.

Step 2: slow ≠ fast (1 ≠ 3). Move: slow = nums[1] = 3, fast = nums[nums[3]] = nums[2] = 4.

Step 3: slow ≠ fast (3 ≠ 4). Move: slow = nums[3] = 2, fast = nums[nums[4]] = nums[2] = 4.

Step 4: slow ≠ fast (2 ≠ 4). Move: slow = nums[2] = 4, fast = nums[nums[4]] = nums[2] = 4.

Step 5: slow == fast == 4. Meeting point found at value 4! Both are inside the cycle.

Phase 2: Find cycle entrance

Step 6: Start slow2 = 0. Keep slow at 4. Both move one step at a time.

Step 7: slow2 = nums[0] = 1, slow = nums[4] = 2. Not equal (1 ≠ 2).

Step 8: slow2 = nums[1] = 3, slow = nums[2] = 4. Not equal (3 ≠ 4).

Step 9: slow2 = nums[3] = 2, slow = nums[4] = 2. slow2 == slow == 2!

Step 10: The cycle entrance is at value 2. Return 2 — that's the duplicate!

Floyd's Cycle Detection — Array as Linked List — The array forms a linked list: node i points to nums[i]. A duplicate creates a cycle. Watch slow/fast pointers detect the cycle, then a second phase finds the entrance — the duplicate number.

Algorithm

  1. Phase 1 — Find meeting point:

    • Initialize slow = nums[0], fast = nums[nums[0]]
    • While slow ≠ fast:
      • slow = nums[slow] (one step)
      • fast = nums[nums[fast]] (two steps)
    • When they meet, a cycle is confirmed
  2. Phase 2 — Find cycle entrance:

    • Initialize slow2 = 0
    • While slow2 ≠ slow:
      • slow2 = nums[slow2] (one step)
      • slow = nums[slow] (one step)
    • When they meet, the meeting value is the duplicate
  3. Return slow2 (or slow — they're equal)

Code

class Solution {
public:
    int findDuplicate(vector<int>& nums) {
        // Phase 1: Find intersection point
        int slow = nums[0];
        int fast = nums[nums[0]];
        while (slow != fast) {
            slow = nums[slow];
            fast = nums[nums[fast]];
        }

        // Phase 2: Find entrance to cycle
        int slow2 = 0;
        while (slow2 != slow) {
            slow2 = nums[slow2];
            slow = nums[slow];
        }
        return slow2;
    }
};
class Solution:
    def findDuplicate(self, nums: list[int]) -> int:
        # Phase 1: Find intersection point
        slow = nums[0]
        fast = nums[nums[0]]
        while slow != fast:
            slow = nums[slow]
            fast = nums[nums[fast]]

        # Phase 2: Find entrance to cycle
        slow2 = 0
        while slow2 != slow:
            slow2 = nums[slow2]
            slow = nums[slow]
        return slow2
class Solution {
    public int findDuplicate(int[] nums) {
        // Phase 1: Find intersection point
        int slow = nums[0];
        int fast = nums[nums[0]];
        while (slow != fast) {
            slow = nums[slow];
            fast = nums[nums[fast]];
        }

        // Phase 2: Find entrance to cycle
        int slow2 = 0;
        while (slow2 != slow) {
            slow2 = nums[slow2];
            slow = nums[slow];
        }
        return slow2;
    }
}

Complexity Analysis

Time Complexity: O(n)

Phase 1: The slow pointer moves at most n+1 steps before the meeting (it can visit each position at most once before entering the cycle, then at most one full cycle). The fast pointer moves twice as many steps. Total phase 1: O(n).

Phase 2: Both pointers move at most n steps before converging at the cycle entrance. Total phase 2: O(n).

Overall: O(n) + O(n) = O(n).

Space Complexity: O(1)

We use only three integer variables (slow, fast, slow2). No arrays, no hash sets, no modifications to the input. This is truly constant space.

Comparison of all three approaches:

ApproachTimeSpaceModifies Array?
Brute Force (All Pairs)O(n²)O(1)No
Binary Search on ValueO(n log n)O(1)No
Floyd's Cycle DetectionO(n)O(1)No