This post describes the implementation of a List in Java which allows log time removals and insertions.
Lists are probably the most useful data structures in programming. Java provides a number of ways of creating a list out of the box with regular arrays, and the ArrayList
, LinkedList
and Vector
classes in the java.util package. However, none of these structures allow elements to be removed or added to the middle of the list in sub-linear time*, which for some applications can be useful. (* - the LinkedList allows elements to be removed (inserted) via a ListIterator
in constant time, but requires that the element (position) is located first, which takes linear time).
Balanced search trees and skip lists allow this behaviour, but these work on the principle that the data is stored in some fixed order - there is no way to preserve the order in which the elements were added. They also do not usually provide the ability to access elements by their index in the data structure, which is a key feature of a list.
In this post I'll go through the implementation of an "unsorted" balanced search tree (UnsortedList
), which allows the elements to be located, added and removed by index in logarithmic time, but function exactly like an ArrayList
. The trick is that the list compares the elements in it on their index in the list itself. The UnsortedList
extends the SortedList
class I wrote for a previous post, which provides a relatively efficient implementation of an AVL tree that allows elements to quickly found by their index. I had to make a few tweaks to that code so that the required inner class and methods are exposed to overriding classes; the latest version, together with all the source code described in this blog is available here: data structures source.
The signature of the UnsortedList
class and it's constructor are a bit odd and look like this:
Firstly, it's a bit weird that an UnsortedList
is a SortedList
, but nevermind (perhaps I should have come up with a better name)! The SortedList
requires that a Comparator
be provided, which it uses to determine where values in the list should be located. However, since the actual values of the elements stored at each index is not important in an UnsortedList
, we simply provide a dummy implementation which returns 0, no matter what's put in; it doesn't matter as it won't be used anyway!
So how are comparisons made? The SortedList
class doesn't call the given Comparator
directly, but calls the compare
method on it's inner Node
class (which implements Comparable
); the implementation of the Node's compare
method then uses this Comparator
. This mechanism is used to enable multiple equal elements to be stored, for more details see the previous blog post). So, to get the UnsortedList
to work, it's this inner node's compare
method that needs to be replaced; which is done by overriding the class itself. The UnsortedList
's inner node class (UnsortedNode
) which extends the SortedList
's Node
class is where most of the logic for the UnsortedList
lies. The start of the class looks like this:
Note that each UnsortedNode
stores its index in the list itself; which is what is used when comparing them. However, the problem is that each time an element is added or removed from the list, this value might become out of date. These index values can't be kept up to date all the time because that would require linear time when inserting or removing (which is what happens in a regular list). To tell whether the value is out of date or not, the list also stores the modcount
at the time the index variable is set, this variable is incremented each time the list is altered (see AbstractList
, which is the subclass of SortedList
).
The code to get the correct index for a given UnsortedNode
is the only tricky bit. This works by first figuring out the correct index for the parent node (if there is one) and uses the fact that each node knows the size of the sub-tree rooted at that point (i.e. the number of elements in the whole list is the size of the sub-tree rooted at the root node). The code for the method looks like this:
The code is complicated by the fact that it uses iteration over recursion for efficiency. It works by simply returning the cached value if it's fresh and otherwise calculating it from it's closest "fresh" ancestor. The important lines in this method, where the actual calculation of the index takes place, are 15, 19 and 22. Each deals with one of the three possible cases: when the node is the root, when it's the left child of its parent and when it's the right child of its parent respectively.
The runtime of the above method is actually constant when it's being used in the program. This is because the Node
class's sizeOfSubTree
method takes constant time and when it's called within the program, the parent node is always either null or fresh. This is due to the fact that it's only called in the UnsortedNode
's compare
method which in turn is only called when the tree is searching for an element. These searches (like all binary tree searches) work from the root down; so when the compare
method is called on a non-root node, it must be that it was just called on its parent node.
With the ability to get the index of each UnsortedNode
in place, it's fairly straight forward to implement the required compare
method. The code for this method look like this:
This method essentially just returns the difference in the indices of the nodes; the only slight complication is that nodes that are already in the tree need to be considered as smaller than those already in the list with the same index. This is so that if you insert an element at some given index in the middle of the list, the newly added element gets the required index and the element that was at that index previously is moved up.
That's pretty much it, the only other methods that are required to be overridden in the UnsortedList
class are straight forward. These are the add
method (so that the UnsortedNode
class is used instead of the base Node
class), and the contains
and remove(Object)
methods (since these use binary search, which doesn't work in an UnsortedList
!).
PerformanceI've tested the UnsortedList
empirically and the results are as you'd expect; the performance is vastly better than the ArrayList
when elements need to be added and/or removed from arbitrary positions in the list; however, it's not as quick or as memory efficient in carrying out other operations. The asymptotic running times for various operations across the different list implementations are given below:
Adding To By Index | Removing From By Index | ||||||
---|---|---|---|---|---|---|---|
Front | Middle | End | Front | Middle | End | Get by Index | |
LinkedList | O(1) | O(n) | O(1) | O(1) | O(n) | O(1) | O(n) |
ArrayList/Vector | O(n) | O(n) | O(1)* | O(n) | O(n) | O(1)* | O(1) |
UnsortedList | O(log(n)) | O(log(n)) | O(log(n)) | O(log(n)) | O(log(n)) | O(log(n)) | O(log(n)) |
(* amortized constant time - running the method n times takes O(n) time). |
I've written a number of unit tests for both the UnsortedList
and SortedList
classes, the source for which is included in: mr-scottlogic-utils.zip. If you spot a bug not covered in the unit tests let me know!
In the next post I'll show how to adapt this data structure to improve the runtime of all the method in the above table.