I am studying Data Structures and Algorithms in C#/Java. After encountering a solution to the problem of Linked List duplicate removal, I have been struggling to understand it.
The solution is the one proposed by the renowned book Cracking the coding Interview (5th edition, page 208).
void RemoveDuplicates_HashSet(Node n) { HashSet<object> set = new HashSet<object>(); Node previous = null; while (n != null) { if (set.Contains(n.Data)) // Condition 1 previous.Next = n.Next; else // Condition 2 { set.Add(n.Data); previous = n; } n = n.Next; } }
Running the code with the following linked list A->B->A->B
:
// Creating test Singly LinkedList Node n = new Node("A"); n.Next = new Node("B"); n.Next.Next = new Node("A"); n.Next.Next.Next = new Node("B"); RemoveDuplicates_HashSet(n);
Works perfectly fine: the value of n
after the method is A->B
.
By following the code with a debugger, I can see that what happens in the method loop is the following:
| Pass | HashSet | n | previous | Comment | | ---- | ------- | ---------- | ---------- | ------------------------ | | – | – | A->B->A->B | null | | | 1 | A | B->A->B | A->B->A->B | Condition 2 is triggered | | 2 | A,B | A->B | B->A->B | Condition 2 is triggered | | 3 | A,B | B | B->B | Condition 1 is triggered | | 4 | A,B | null | B | Condition 1 is triggered |
I fail to understand how this actually results in several ways:
Where/how exactly are duplicates dropped from
n
? I understand that HashSet contains only unique elements, and it will therefore detect if an element was encountered already, however I still cannot see how the algorithm works in its entirety.How is it that the values pointed to by
n
are updated to beA->B
? Where is it that, given that essentially the loop is simply iterating over the Linked List doingn = n.Next
,n
is actually updated with the final valueA->B
? I understand that the list is passed by reference, but I cannot see how it is actually modified.
Advertisement
Answer
@Slaw’s comment pointed me in what I believe to be the right direction.
Key to question 2) – how the change is actually propagated from the method to the original list – was to understand that C#/Java objects are passed by reference, but the reference itself is passed by value: while the object can be modified by the method, the reference cannot be modified.
Once the method ends (termination of the method context), then
pointer will point again to the start of the Linked List. Continuously advancing then
pointer within the method (by doingn = n.Next
) does not affect the “external pointer” (the originaln
reference outside of the method).As per question 1) – the inner mechanism of the algorithm – the key is to understand that the modifying
previous.Next
does indeed modify the list pointed to byn
. Quoting @Slaw’s comment:
This code:
if (set.Contains(n.Data)) previous.Next = n.Next
checks if the element has already been encountered and, if it has, removes n from the linked list. It removes the node by assigning n.Next to previous.Next (which means previous.Next no longer points to n).
I’ve therefore tried to exhaustively diagram what happens in the algorithm.