Cyclic Linked Lists and Racing Pointers

It's common for programmers to be asked small coding questions in interviews. Usually, these questions involve implementing quick, small algorithms - for example, you might be asked to write a function that reverses its (string) input.

Today, I heard an interesting question to do with linked lists. A linked list is a data structure consisting of a number of nodes, each of which has (in this case) a single link to another node. Mathematicians: think 'directed graph'.

Here's an example of a simple linked list:

head ----> +------+
           | Node |
           |------|
           | +data|
           | +next| ----> +------+
           +------+       | Node |
                          |------|
                          | +data|
                          | +next| ----> NULL
                          +------+

To traverse the list, we'd start at 'head', a pointer to the first node. We'd then grab successive nodes, by using the 'next' property of whatever node we're up to. Finally, we'd reach NULL, the end of our list.

The advantage of a linked list is that it's quick (O(1)) to add items to (and remove them from) one end of the list, which makes the linked list very helpful for implementing a stack. For those interested, retrieving a particular node from a linked list is a O(n) task, because the list must be traversed until the node is found.
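To make this concrete, here's a minimal sketch in Python (the names 'push' and 'traverse' are my own, and None stands in for the NULL of the diagrams):

```python
class Node:
    """A singly linked list node: some data plus a link to the next node."""
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def push(head, data):
    """O(1): the new node becomes the head, pointing at the old head."""
    return Node(data, head)

def traverse(head):
    """Walk the list from head until we reach None (the end of the list)."""
    items = []
    node = head
    while node is not None:
        items.append(node.data)
        node = node.next
    return items

head = None
for x in (3, 2, 1):
    head = push(head, x)
print(traverse(head))  # -> [1, 2, 3]
```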

Anyway, there's a class of linked list we can call cyclic. A perfectly cyclic list is one where the nth node's 'next' pointer points to the same place as 'head':

              +----------------------+
              v                      |
head ----> +------+                  |
           | Node |                  |
           |------|                  |
           | +data|                  |
           | +next| ----> +------+   |
           +------+       | Node |   |
                          |------|   |
                          | +data|   |
                          | +next| --+
                          +------+

But, consider that a cyclic list need not be perfectly cyclic. Instead, the cycle could involve any subset of the list. Here's an example of a very small cycle:

          +-----------+
          v           |
-> [ ] -> [ ] -> [ ] -+

And a larger one:

   +-------------------------+
   v                         |
-> [ ] -> [ ] -> [ ] -> [ ] -+

The question was: write a (good) algorithm that checks if a linked list is cyclic. The algorithm should be optimised for memory usage.

At first, that doesn't seem too hard. Here's a naïve solution:

Solution 1: For each node, if the node's next pointer is NULL, then the list is not cyclic.

This idea is perfectly correct, and we'll use it in all our solutions; if we ever come across a NULL pointer, our job is done.

But there's a major problem with this idea: on its own, it can't be turned into an algorithm that detects cyclic lists. To see why, consider that in a cyclic list we never come across a NULL, and we don't know when we've reached the end of the list and started looping back round. That's fatal: if we can't tell when we've finished traversing the list, this decision method will never finish.

If our linked list is perfectly cyclic then it's not a worry. We simply need to check whether we've come back to 'head' - if we have, we've visited every node without coming across a NULL, and the list is (perfectly) cyclic. But consider a non-perfect cyclic linked list, like the one I gave above. In this case, we can only tell that we've looped back upon ourselves by checking against every node we've visited. And that brings us to:

Solution 2: For each node, store the node's location. As we traverse the list looking for a NULL pointer, compare the current node with the list of previously visited locations. If the current node is there, we've finished traversing the list, and it is cyclic.
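Here's one way that might look in Python (a sketch; the Node class and the function name are mine). A plain list of visited nodes is used deliberately, to mirror the pairwise comparisons in the cost analysis that follows:

```python
class Node:
    def __init__(self, data, next=None):
        self.data, self.next = data, next

def is_cyclic_visited(head):
    """Solution 2: remember every node seen so far (O(n) extra memory)."""
    visited = []
    node = head
    while node is not None:
        for seen in visited:      # compare against every earlier node
            if node is seen:
                return True       # we've looped back on ourselves
        visited.append(node)
        node = node.next
    return False                  # reached NULL: the list is not cyclic

a, b, c = Node('a'), Node('b'), Node('c')
a.next, b.next = b, c        # a -> b -> c -> NULL
print(is_cyclic_visited(a))  # -> False
c.next = b                   # c now loops back to b: a small cycle
print(is_cyclic_visited(a))  # -> True
```

(In real Python you'd use a set, which makes each membership test O(1) - that fixes the time cost, but not the memory cost.)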

This is a working solution. The problem is that it is inefficient, both in terms of space (memory) and time (CPU cycles).

To see why it's slow, consider a list of a hundred items. As well as traversing the list (an O(n) operation), you'd be doing a number of comparisons (0 < x <= n) at each node. This works out to be something like n² (= 10,000) operations.

More pertinently, we're storing a pointer to each node in memory. So, we're using n * [the size of a pointer] units of space. The question didn't specify CPU time as a constraint. So, let's try optimising the memory we're using.

Solution 3: For each node, iterate from the head node to the current node, checking whether the current node's 'next' pointer points back at any of the nodes in between.

This solution uses two pointers that keep track of where in the list we are up to. The first, the 'outside' pointer, just goes through the list. The second, the 'inside' pointer, checks that none of the nodes between the start of the list and the 'outside' node are pointed to by the 'outside' node.

So, in effect, this algorithm targets the right-most node in the diagrams above - the node with the long pointer. Here's another way to think about it: instead of storing the nodes visited, this method uses the fact that the list has already stored them. We only need remember our position.
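A Python sketch of the idea (again, the Node class and names are my own). The 'outside' pointer walks the list; at each position, the 'inside' pointer re-walks from head, asking whether the outside node's next link points back at any node seen so far:

```python
class Node:
    def __init__(self, data, next=None):
        self.data, self.next = data, next

def is_cyclic_nested(head):
    """Solution 3: O(1) memory, O(n^2) time."""
    outside = head
    while outside is not None:
        if outside.next is None:
            return False              # reached NULL: not cyclic
        # Does outside.next point back at any node from head up to
        # (and including) outside itself?
        inside = head
        while True:
            if outside.next is inside:
                return True           # a backward link: the list is cyclic
            if inside is outside:
                break                 # checked every node up to here
            inside = inside.next
        outside = outside.next
    return False                      # only reached when head is None

a, b, c = Node(1), Node(2), Node(3)
a.next, b.next = b, c       # a -> b -> c -> NULL
print(is_cyclic_nested(a))  # -> False
c.next = b                  # c loops back to b
print(is_cyclic_nested(a))  # -> True
```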

While this algorithm is very efficient in memory, it's about as slow as solution 2. The number of comparisons done at worst case is something like n²/2 (triangular), which means it's still O(n²).

This is a correct answer to the question. It's as efficient in memory as you can get. But it's a little unsatisfying. There is another solution - I found it fairly remarkable. It uses the same amount of memory, and is O(n):

Solution 4: Start with two pointers to the first element of the list. Now, move the first pointer one node ahead of its current position. Move the second pointer two nodes ahead of its current position. If the first pointer equals the second, the list is cyclic. Repeat.

This algorithm is, to my mind, not very intuitive. I call it the racing pointers method. It works because:

  • The second pointer is going faster than the first, so
  • It will reach the cyclic portion of the list before the first pointer.
  • The algorithm will keep running, however, until the first pointer eventually enters the cyclic portion.
  • At which time, the second pointer will loop back, behind the first
  • And approach it. Then:
    • If the second pointer is one node behind the first pointer, they will meet at the next movement.
    • If the second pointer is two nodes behind the first pointer, they will meet in two movements.
    • And so on.
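In Python, the whole thing fits in a few lines. (This is the algorithm usually credited to Robert Floyd, often called the 'tortoise and hare'; the Node class and names here are mine.)

```python
class Node:
    def __init__(self, data, next=None):
        self.data, self.next = data, next

def is_cyclic_racing(head):
    """Solution 4: two racing pointers, O(1) memory, O(n) time."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next          # moves one node per step
        fast = fast.next.next     # moves two nodes per step
        if slow is fast:
            return True           # the fast pointer caught the slow one
    return False                  # fast reached NULL: not cyclic

a, b, c, d = Node(1), Node(2), Node(3), Node(4)
a.next, b.next, c.next = b, c, d   # a -> b -> c -> d -> NULL
print(is_cyclic_racing(a))         # -> False
d.next = a                         # perfectly cyclic
print(is_cyclic_racing(a))         # -> True
```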

If the list is perfectly cyclic, the two pointers meet back at 'head' just as the slower one completes its first full lap of the list. More generally, once the slow pointer enters the cyclic portion, the fast pointer can be at most [the size of the cycle] nodes behind it, and the gap closes by one node per step - so the slow pointer takes at most n steps before they meet, and the algorithm is O(n).

In any case, I think it's a beautiful method, because it's so different from the others.