Linked List – remove duplicates algorithm in C#/Java

Tags: , , , ,



I am studying Data Structures and Algorithms in C#/Java. After encountering a solution to the problem of Linked List duplicate removal, I have been struggling to understand it.

The solution is the one proposed by the renowned book Cracking the coding Interview (5th edition, page 208).

void RemoveDuplicates_HashSet(Node n)
{
    HashSet<object> set = new HashSet<object>();

    Node previous = null;
    while (n != null)
    {
        if (set.Contains(n.Data))       // Condition 1
            previous.Next = n.Next;
        else                            // Condition 2
        {
            set.Add(n.Data);
            previous = n;
        }
            
        n = n.Next;
    }
}

Running the code with the following linked list A->B->A->B:

// Creating test Singly LinkedList
Node n = new Node("A");
n.Next = new Node("B");
n.Next.Next = new Node("A");
n.Next.Next.Next = new Node("B");

RemoveDuplicates_HashSet(n);

Works perfectly fine: the value of n after the method is A->B.

By following the code with a debugger, I can see that what happens in the method loop is the following:

| Pass | HashSet | n          | previous   | Comment                  |
| ---- | ------- | ---------- | ---------- | ------------------------ |
| –    | –       | A->B->A->B | null       |                          |
| 1    | A       | B->A->B    | A->B->A->B | Condition 2 is triggered |
| 2    | A,B     | A->B       | B->A->B    | Condition 2 is triggered |
| 3    | A,B     | B          | B->B       | Condition 1 is triggered |
| 4    | A,B     | null       | B          | Condition 1 is triggered |

I fail to understand how this actually results in several ways:

  1. Where/how exactly are duplicates dropped from n? I understand that HashSet contains only unique elements, and it will therefore detect if an element was encountered already, however I still cannot see how the algorithm works in its entirety.

  2. How is it that the values pointed to by n are updated to be A->B? Where is it that, given that essentially the loop is simply iterating over the Linked List doing n = n.Next, n is actually updated with the final value A->B? I understand that the list is passed by reference, but I cannot see how it is actually modified.

Answer

@Slaw’s comment pointed me in what I believe to be the right direction.

This code: if (set.Contains(n.Data)) previous.Next = n.Next checks if the element has already been encountered and, if it has, removes n from the linked list. It removes the node by assigning n.Next to previous.Next (which means previous.Next no longer points to n).

I’ve therefore tried to exhaustively diagram what happens in the algorithm.

Diagram of the algorithm.



Source: stackoverflow