This post describes the implementation of a List in Java which allows log time removals and insertions.
Lists are probably the most useful data structures in programming. Java provides a number of ways of creating a list out of the box with regular arrays, and the
Vector classes in the java.util package. However, none of these structures allow elements to be removed or added to the middle of the list in sub-linear time*, which for some applications can be useful. (* - the LinkedList allows elements to be removed (inserted) via a
ListIterator in constant time, but requires that the element (position) is located first, which takes linear time).
Balanced search trees and skip lists allow this behaviour, but these work on the principle that the data is stored in some fixed order - there is no way to preserve the order in which the elements were added. They also do not usually provide the ability to access elements by their index in the data structure, which is a key feature of a list.
In this post I'll go through the implementation of an "unsorted" balanced search tree (
UnsortedList), which allows the elements to be located, added and removed by index in logarithmic time, but function exactly like an
ArrayList. The trick is that the list compares the elements in it on their index in the list itself. The
UnsortedList extends the
SortedList class I wrote for a previous post, which provides a relatively efficient implementation of an AVL tree that allows elements to quickly found by their index. I had to make a few tweaks to that code so that the required inner class and methods are exposed to overriding classes; the latest version, together with all the source code described in this blog is available here: data structures source.
The signature of the
UnsortedList class and it's constructor are a bit odd and look like this:
Firstly, it's a bit weird that an
UnsortedList is a
SortedList, but nevermind (perhaps I should have come up with a better name)! The
SortedList requires that a
Comparator be provided, which it uses to determine where values in the list should be located. However, since the actual values of the elements stored at each index is not important in an
UnsortedList, we simply provide a dummy implementation which returns 0, no matter what's put in; it doesn't matter as it won't be used anyway!
So how are comparisons made? The
SortedList class doesn't call the given
Comparator directly, but calls the
compare method on it's inner
Node class (which implements
Comparable); the implementation of the Node's
compare method then uses this
Comparator. This mechanism is used to enable multiple equal elements to be stored, for more details see the previous blog post). So, to get the
UnsortedList to work, it's this inner node's
compare method that needs to be replaced; which is done by overriding the class itself. The
UnsortedList's inner node class (
UnsortedNode) which extends the
Node class is where most of the logic for the
UnsortedList lies. The start of the class looks like this:
Note that each
UnsortedNode stores its index in the list itself; which is what is used when comparing them. However, the problem is that each time an element is added or removed from the list, this value might become out of date. These index values can't be kept up to date all the time because that would require linear time when inserting or removing (which is what happens in a regular list). To tell whether the value is out of date or not, the list also stores the
modcount at the time the index variable is set, this variable is incremented each time the list is altered (see
AbstractList, which is the subclass of
The code to get the correct index for a given
UnsortedNode is the only tricky bit. This works by first figuring out the correct index for the parent node (if there is one) and uses the fact that each node knows the size of the sub-tree rooted at that point (i.e. the number of elements in the whole list is the size of the sub-tree rooted at the root node). The code for the method looks like this:
The code is complicated by the fact that it uses iteration over recursion for efficiency. It works by simply returning the cached value if it's fresh and otherwise calculating it from it's closest "fresh" ancestor. The important lines in this method, where the actual calculation of the index takes place, are 15, 19 and 22. Each deals with one of the three possible cases: when the node is the root, when it's the left child of its parent and when it's the right child of its parent respectively.
The runtime of the above method is actually constant when it's being used in the program. This is because the
sizeOfSubTree method takes constant time and when it's called within the program, the parent node is always either null or fresh. This is due to the fact that it's only called in the
compare method which in turn is only called when the tree is searching for an element. These searches (like all binary tree searches) work from the root down; so when the
compare method is called on a non-root node, it must be that it was just called on its parent node.
With the ability to get the index of each
UnsortedNode in place, it's fairly straight forward to implement the required
compare method. The code for this method look like this:
This method essentially just returns the difference in the indices of the nodes; the only slight complication is that nodes that are already in the tree need to be considered as smaller than those already in the list with the same index. This is so that if you insert an element at some given index in the middle of the list, the newly added element gets the required index and the element that was at that index previously is moved up.
That's pretty much it, the only other methods that are required to be overridden in the
UnsortedList class are straight forward. These are the
add method (so that the
UnsortedNode class is used instead of the base
Node class), and the
remove(Object) methods (since these use binary search, which doesn't work in an
PerformanceI've tested the
UnsortedList empirically and the results are as you'd expect; the performance is vastly better than the
ArrayList when elements need to be added and/or removed from arbitrary positions in the list; however, it's not as quick or as memory efficient in carrying out other operations. The asymptotic running times for various operations across the different list implementations are given below:
|Adding To By Index||Removing From By Index|
|Front||Middle||End||Front||Middle||End||Get by Index|
|(* amortized constant time - running the method n times takes O(n) time).|
I've written a number of unit tests for both the
SortedList classes, the source for which is included in: mr-scottlogic-utils.zip. If you spot a bug not covered in the unit tests let me know!
In the next post I'll show how to adapt this data structure to improve the runtime of all the method in the above table.