Find the Duplicate Number

Given an array nums containing n+ 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.

Example 1:

Input: [1,3,4,2,2]
Output: 2

Example 2:

Input: [3,1,3,4,2]
Output: 3

Note:

  1. You must not modify the array (assume the array is read only).

  2. You must use only constant, O(1) extra space.

  3. Your runtime complexity should be less than O(_n_2).

  4. There is only one duplicate number in the array, but it could be repeated more than once.

Solution

Note

The first two approaches mentioned do not satisfy the constraints given in the prompt, but they are solutions that you might be likely to come up with during a technical interview. As an interviewer, I personally would_not_expect someone to come up with the cycle detection solution unless they have heard it before.

Proof

Proving that at least one duplicate must exist innumsis simple application of thepigeonhole principle. Here, each number innumsis a "pigeon" and each distinct number that can appear innumsis a "pigeonhole". Because there aren+1n+1numbers arenndistinct possible numbers, the pigeonhole principle implies that at least one of the numbers is duplicated.

Approach #1 Sorting [Accepted]

Intuition

If the numbers are sorted, then any duplicate numbers will be adjacent in the sorted array.

Algorithm

Given the intuition, the algorithm follows fairly simply. First, we sort the array, and then we compare each element to the previous element. Because there is exactly one duplicated element in the array, we know that the array is of at least length 2, and we can return the duplicate element as soon as we find it.

Complexity Analysis

  • Time complexity :O(nlgn)O(nlgn)

    Thesortinvocation costsO(nlgn)O(nlgn)time in Python and Java, so it dominates the subsequent linear scan.

  • Space complexity :O(1)O(1)(orO(n)O(n))

    Here, we sortnumsin place, so the memory footprint is constant. If we cannot modify the input array, then we must allocate linear space for a copy ofnumsand sort that instead.

class Solution {
    public int findDuplicate(int[] nums) {
        Arrays.sort(nums);
        for (int i = 1; i < nums.length; i++) {
            if (nums[i] == nums[i-1]) {
                return nums[i];
            }
        }

        return -1;
    }
}

Approach #2 Set [Accepted]

Intuition

If we store each element as we iterate over the array, we can simply check each element as we iterate over the array.

Algorithm

In order to achieve linear time complexity, we need to be able to insert elements into a data structure (and look them up) in constant time. ASetsatisfies these constraints nicely, so we iterate over the array and insert each element intoseen. Before inserting it, we check whether it is already there. If it is, then we found our duplicate, so we return it.

Complexity Analysis

  • Time complexity :O(n)O(n)

    Setin both Python and Java rely on underlying hash tables, so insertion and lookup have amortized constant time complexities. The algorithm is therefore linear, as it consists of aforloop that performs constant worknntimes.

  • Space complexity :O(n)O(n)

    In the worst case, the duplicate element appears twice, with one of its appearances at array indexn−1n−1. In this case,seenwill containn−1n−1distinct values, and will therefore occupyO(n)O(n)space.

class Solution {
    public int findDuplicate(int[] nums) {
        Set<Integer> seen = new HashSet<Integer>();
        for (int num : nums) {
            if (seen.contains(num)) {
                return num;
            }
            seen.add(num);
        }

        return -1;
    }
}

Approach #3 Floyd's Tortoise and Hare (Cycle Detection) [Accepted]

Intuition

If we interpretnumssuch that for each pair of index ii and value vi v​i​​, the "next" valuevjv​j​​is at indexviv​i​​, we can reduce this problem to cycle detection. See the solution to Linked List Cycle II for more details.

Algorithm

First off, we can easily show that the constraints of the problem imply that a cycle_must_exist. Because each number innumsis between11andnn, it will necessarily point to an index that exists. Therefore, the list can be traversed infinitely, which implies that there is a cycle. Additionally, because 00 cannot appear as a value innums,nums[0]cannot be part of the cycle. Therefore, traversing the array in this manner fromnums[0]is equivalent to traversing a cyclic linked list. Given this, the problem can be solved just like Linked List Cycle II.

To see the algorithm in action, check out the animation below:

1 / 25

Complexity Analysis

class Solution {
    public int findDuplicate(int[] nums) {
        // Find the intersection point of the two runners.
        int tortoise = nums[0];
        int hare = nums[0];
        do {
            tortoise = nums[tortoise];
            hare = nums[nums[hare]];
        } while (tortoise != hare);

        // Find the "entrance" to the cycle.
        int ptr1 = nums[0];
        int ptr2 = tortoise;
        while (ptr1 != ptr2) {
            ptr1 = nums[ptr1];
            ptr2 = nums[ptr2];
        }

        return ptr1;
    }
}

Last updated