*This post describes the implementation of a List in Java which allows log time removals and insertions.*

Lists are probably the most useful data structures in programming. Java provides a number of ways of creating a list out of the box with regular arrays, and the `ArrayList`

, `LinkedList`

and `Vector`

classes in the java.util package. However, none of these structures allow elements to be removed or added to the middle of the list in sub-linear time*, which for some applications can be useful. *(* - the LinkedList allows elements to be removed (inserted) via a ListIterator in constant time, but requires that the element (position) is located first, which takes linear time).*

Balanced search trees and skip lists allow this behaviour, but these work on the principle that the data is stored in some fixed order - there is no way to preserve the order in which the elements were added. They also do not usually provide the ability to access elements by their index in the data structure, which is a key feature of a list.

In this post I'll go through the implementation of an "unsorted" balanced search tree (`UnsortedList`

), which allows the elements to be located, added and removed by index in logarithmic time, but function exactly like an `ArrayList`

. The trick is that the list compares the elements in it on their index in the list itself. The `UnsortedList`

extends the `SortedList`

class I wrote for a previous post, which provides a relatively efficient implementation of an AVL tree that allows elements to quickly found by their index. I had to make a few tweaks to that code so that the required inner class and methods are exposed to overriding classes; the latest version, together with all the source code described in this blog is available here: data structures source.

The signature of the `UnsortedList`

class and it's constructor are a bit odd and look like this:

Firstly, it's a bit weird that an `UnsortedList`

is a `SortedList`

, but nevermind (perhaps I should have come up with a better name)! The `SortedList`

requires that a `Comparator`

be provided, which it uses to determine where values in the list should be located. However, since the actual values of the elements stored at each index is not important in an `UnsortedList`

, we simply provide a dummy implementation which returns 0, no matter what's put in; it doesn't matter as it won't be used anyway!

So how are comparisons made? The `SortedList`

class doesn't call the given `Comparator`

directly, but calls the `compare`

method on it's inner `Node`

class (which implements `Comparable`

); the implementation of the Node's `compare`

method then uses this `Comparator`

. This mechanism is used to enable multiple equal elements to be stored, for more details see the previous blog post). So, to get the `UnsortedList`

to work, it's this inner node's `compare`

method that needs to be replaced; which is done by overriding the class itself. The `UnsortedList`

's inner node class (`UnsortedNode`

) which extends the `SortedList`

's `Node`

class is where most of the logic for the `UnsortedList`

lies. The start of the class looks like this:

Note that each `UnsortedNode`

stores its index in the list itself; which is what is used when comparing them. However, the problem is that each time an element is added or removed from the list, this value might become out of date. These index values can't be kept up to date all the time because that would require linear time when inserting or removing (which is what happens in a regular list). To tell whether the value is out of date or not, the list also stores the `modcount`

at the time the index variable is set, this variable is incremented each time the list is altered (see `AbstractList`

, which is the subclass of `SortedList`

).

The code to get the correct index for a given `UnsortedNode`

is the only tricky bit. This works by first figuring out the correct index for the parent node (if there is one) and uses the fact that each node knows the size of the sub-tree rooted at that point (i.e. the number of elements in the whole list is the size of the sub-tree rooted at the root node). The code for the method looks like this:

The code is complicated by the fact that it uses iteration over recursion for efficiency. It works by simply returning the cached value if it's fresh and otherwise calculating it from it's closest "fresh" ancestor. The important lines in this method, where the actual calculation of the index takes place, are 15, 19 and 22. Each deals with one of the three possible cases: when the node is the root, when it's the left child of its parent and when it's the right child of its parent respectively.

The runtime of the above method is actually constant when it's being used in the program. This is because the `Node`

class's `sizeOfSubTree`

method takes constant time and when it's called within the program, the parent node is always either null or fresh. This is due to the fact that it's only called in the `UnsortedNode`

's `compare`

method which in turn is only called when the tree is searching for an element. These searches (like all binary tree searches) work from the root down; so when the `compare`

method is called on a non-root node, it must be that it was just called on its parent node.

With the ability to get the index of each `UnsortedNode`

in place, it's fairly straight forward to implement the required `compare`

method. The code for this method look like this:

This method essentially just returns the difference in the indices of the nodes; the only slight complication is that nodes that are already in the tree need to be considered as smaller than those already in the list with the same index. This is so that if you insert an element at some given index in the middle of the list, the newly added element gets the required index and the element that was at that index previously is moved up.

That's pretty much it, the only other methods that are required to be overridden in the `UnsortedList`

class are straight forward. These are the `add`

method (so that the `UnsortedNode`

class is used instead of the base `Node`

class), and the `contains`

and `remove(Object)`

methods (since these use binary search, which doesn't work in an `UnsortedList`

!).

**Performance**I've tested the `UnsortedList`

empirically and the results are as you'd expect; the performance is vastly better than the `ArrayList`

when elements need to be added and/or removed from arbitrary positions in the list; however, it's not as quick or as memory efficient in carrying out other operations. The asymptotic running times for various operations across the different list implementations are given below:

Adding To By Index | Removing From By Index | ||||||
---|---|---|---|---|---|---|---|

Front | Middle | End | Front | Middle | End | Get by Index | |

LinkedList | O(1) | O(n) | O(1) | O(1) | O(n) | O(1) | O(n) |

ArrayList/Vector | O(n) | O(n) | O(1)* | O(n) | O(n) | O(1)* | O(1) |

UnsortedList | O(log(n)) | O(log(n)) | O(log(n)) | O(log(n)) | O(log(n)) | O(log(n)) | O(log(n)) |

(* amortized constant time - running the method n times takes O(n) time). |

I've written a number of unit tests for both the `UnsortedList`

and `SortedList`

classes, the source for which is included in: mr-scottlogic-utils.zip. If you spot a bug not covered in the unit tests let me know!

In the next post I'll show how to adapt this data structure to improve the runtime of all the method in the above table.