Saturday, January 11, 2014

Detect and identify cycle in linked list

Tortoise-Hare Algorithm illustrated -

Detection:
The classic Tortoise-Hare algorithm (Floyd Cycle Detection) requires two pointers, one fast (Hare or H) and a slower (Tortoise or T). Start both of them from the same (head) node and run the Hare twice as fast as the Tortoise. Clearly, they will meet at some point if there is a cycle. So far so good, very clear and intuitive.

Identification:
Let's assume that the singly linked list is represented as x = {x(0), x(1), ..., x(z)}. It has a cycle of length n starting at node x(m). T and H meet at node x(k), which is i distance away from start of the cycle. [For simplicity, I am eliminating obvious assumptions like, i <=n, k>=m, etc).



As H is twice as fast as T, by the time T crosses k nodes, H will traverse 2k nodes. Hence, we can write the following equations,

For T: k = m + i
For H: 2k = m + n + i
=> 2(m + i) = m + n + i
=> m =n - i

Based on this observation, T is restarted from the head or x(0), and both T and H are moved at the same speed of 1 node per time.
You will see that they ALWAYS meet at start of the cycle. This is because, by the time T crosses m nodes to reach x(m), H will now also cross m nodes. But H is moving in the cycle. So, it will cross, remaining (n-i) nodes of cycle (as it was at node x(k), where k = m + i) and reach the beginning of the cycle, x(m). But hey! (n-i) is actually m! So, H has just crossed m nodes, and will stop at x(m), where T has just reached :)
Thus, we know where the cycle starts.
The reason I wrote this blog is because the usual descriptions of this algorithm just tells you to restart T from head and move them at the same speed without this little description (which can be intuitive for many). But I felt the need of explaining the reason.

The rest is simple. Keep either T or H fixed at x(m), move the other till it reaches the fixed pointer again. The number of nodes traversed is length of the cycle.

No comments:

Post a Comment