This post goes through an implementation a SortedList in Java which ensures its elements are kept in order and guarantees O(log(n)) run time for all the basic operations: get, contains, remove and add. The source code, javadoc, unit tests and class files for this post are available here: scottlogic-utils-mr-1.4.zip.
Sorting is one of the most common operations applied to lists and as such Java has built in mechanisms for doing it, like the Comparable
and Comparator
interfaces and the Collections.sort
methods. These are great when you have a static list that needs to be ordered, but sometimes you want the list to remain sorted after some altering operations have been applied to it (e.g. if you add an element to an ArrayList which you've sorted, chances are that it's no longer in the right order). For some reason, the java.util
package is lacking a proper SortedList
, and since they're quite handy, I thought I write my own.
AlternativesAs with all data structures, whether a SortedList
the right tool for the job depends on how the data is going to be used. The java.util
package contains a host of different data structures, all of which have their place, but unfortunately it (at least at the moment) is missing the SortedList
. A comparison between some of the the Java's built in data structures and a SortedList
is given below:
add(Object elem) | remove(Object elem) | get(int index) | contains(Object elem) | multiple equal elements | |
---|---|---|---|---|---|
ArrayList | O(1)* | O(n) | O(1) | O(n) | YES |
LinkedList | O(1) | O(n) | O(n) | O(n) | YES |
TreeSet | O(log(n)) | O(log(n)) | N/A | O(log(n)) | NO |
PriorityQueue | O(log(n)) | O(n) | N/A | O(n) | YES |
SortedList | O(log(n)) | O(log(n)) | O(log(n)) | O(log(n)) | YES |
* - amortized constant time (inserting n objects takes O(n) time). |
If you're not likely to change the data structure much, you might want to just use an ArrayList or a regular array and sort it when you need, which can be done relatively quickly (O(n*log(n))), using Java's Collections.sort
or Arrays.sort
methods respectively. So long as you don't need to sort it too often, there's no problem.
If you're not sure how many times you'll need to sort it, it's quicker to ensure that the data is always kept in order. If you don't need to store multiple equal elements and don't need random access to the data (i.e. you don't need to run get(int index)
) then it's probably best to use Java's TreeSet
. If you need to store multiple equal elements but don't need random access to the data, or quick contains/remove methods then Java's PriorityQueue
might be the way to go. If these cases don't apply, then a SortedList
might be what you want.
Implementing a custom List
in JavaAll Lists in Java should implement the java.util.List
interface, however, rather than implement this directly, the easiest way to start off writing your own List
class is have it extend the AbstractList
class instead. This class implements the List
interface and provides default implementations of the methods, reducing the amount of code you need to write. Most of these default implementations just throws an UnsupportedOperationException
, but some are useful. For example, the default listIterator
and iterator
methods provide you with working iterators once you've provided an implementation for the get(int index)
method.
It's also easy to configure the iterators provided by the AbstractList
to exhibit fail-fast behaviour. This means that if the list is modified after the iterator has been created, other than through the iterator itself, then calling any method other than hasNext()
on the iterator will (hopefully!) cause the cause a ConcurrentModificationException
to be thrown, as is standard for all of Java's non-synchronized Collection classes. To do this, you just need to increment the modCount
variable whenever you add or remove an element from the list. For example, the add
method in the given implementation has the following structure:
Here I took the decision not to allow null
values to be added in the list, just to simplify the other methods and since on the vast majority of applications this is what you want. If you really need to add a null
value you can always just add a special singleton instance of type T
which you know represents null
instead of null
itself. The effect of the calling modCount++;
in the add method is that the following code will now throw a ConcurrentModificationException
.
Another thing to consider when writing a custom List
class (as with any class!) is how you are going to define its type and the constructors. Since a SortedList
needs a way of comparing the elements it stores, I've decided to leave the type definition simple but only supply a constructor which takes a Comparator
, this constructor has the following signature:
If you're not used to Java generics then this line might look a bit odd! The type definition, "<? super T>>
" just says that the given comparator must be capable of comparing elements of type T
, which is the type of element that the list is able to store.
This might seem like a weird decision, since Java's built in TreeSet
doesn't enforce the same restriction; it also allows a no-argument constructor. The reason is that this no-argument constructor is used to create a TreeSet
which orders elements by their natural order; i.e. the order you get if you use the compareTo
method on the set's elements. The draw back of this design decision is that there is no way to say that either the elements must be Comparable
, or you need to supply a Comparator
, so you need to add a runtime check for this (i.e. note the ClassCastException
thrown by the TreeSet
's add
method).
If you need this behaviour with this SortedList
, you can you implement a very simple subclass of it which passes in a Comparator
providing this natural ordering. An example implementation of this is shown below:
Note that the type definition restricts the class so that only comparable objects can be stored in it; removing the need for any runtime check.
AVL Trees as Sorted ListsIn order to obtain the logarithmic run times for the standard operations, the SortedList needs to be based on some kind of balanced tree structure. I've used an AVL tree, which is pretty much the simplest form of balanced tree you can have (so less chance of mistakes!) and since it ensures the tree remains very balanced (more so than say a Red-Black Tree), it means that that the get
and contains
methods always run efficiently.
An AVL tree is a binary search tree, which rebalances itself whenever the height of any node's subtree becomes at least two larger than it's other subtree. The rebalancing requires implementing just two methods - the left and the right tree rotations. I won't go into all the details here, but if you're interested, a pretty easy to follow introduction to AVL trees is available at: lecture1, lecture2 and lecture3.
When it comes to actually implementing an AVL tree, the most obvious way to do it in Java is to have an inner Node
class to represent the individual positions in the tree, and then have the main class hold a reference to the root Node
of the tree. The Node
class needs to be defined somewhat recursively, as each Node needs to maintain references to both the children nodes and their parent node. In order to use an AVL tree as a SortedList
, you need this Node class to be slightly different than in a standard implementation, the changes required are that:
- Allow more than one node to store values that are equal in terms of the given comparator.
- Nodes need to remember the total number of children they have
The first alteration is required so that the tree can be used as a List
rather than as a Set
and the second in order to implement the get(int index)
method efficiently.
The start of the Node class in the implementation looks like this:
As the Node
class is an inner class there is no need to "parameterise" the type definition, it automatically inherits the definition of the T
from the SortedList
parent class. The list allows multiple values to be stored by giving each node a unique id, which is incremented as each element is added. The Node
's compareTo
method then uses this when comparing values, so that nodes with the same value according to the comparator, are distinguished by their unique id. The height
and numChildren
fields are really just cached values, since their values could be obtained by examining the child nodes. It's up to the implementation to ensure that these values are maintained as changes are made to the tree. In the given implementation, this is all done in the updateCachedValues
method of the Node
class:
So long as this method is called on the appropriate node each time the tree is structurally altered, the values will remain correct. It's not always obvious which node constitutes the "appropriate one", but it should always be the node which was altered with the lowest height in the resulting tree (it's always the case that there is one such node).
The only key method that is missing from a standard AVL tree implementation that is required to make it work as a List
is the get(int index)
method. As I mentioned before, this method is going to make use of the numChildren
field of the Node
class to so that it can be implemented efficiently. Once this field is in place, it's not difficult to implement - the method just needs to traverse the tree, making sure that it remembers how many smaller values there are than those at the current node; this effectively tells you the index of the first value stored at the current node. In the provided implementation, the code look like this:
Here the sizeOfSubTree
method just returns one plus the number of children values of the node. The totalSmallerElements
variable effectively stores the index of the current node is maintained in lines as the tree is traversed.
Doing without recursionYou might have noticed that the code so far has been iterative rather than recursive. Generally, most operations involving trees are written using recursion, but since iterative solutions tend to be quicker, I've stuck to using iteration throughout the implementation (the only exception is with the structurallyEqualTo
method which is just there for testing). For methods where you just need to traverse the tree, like the get
or contains
methods, turning it from a recursive method to a iterative one is just a case of adding a while loop and keeping a reference to the current Node that you're looking at. For example, you go from something like:
to something like:
The only difficulty comes when the method needs to go back to nodes that have previously been visited, (i.e. those that can't be written with just simple tail recursion). For instance, if you want to print all the elements in the tree in order; with recursion this is just a few lines:
This could then be invoked on the root node to print the whole tree. It's really not obvious how to do this without using recursion! To overcome this, the Node
class in the implementation defines a couple of handy iterative methods - the smallestNodeInSubTree
, which finds the smallest node in the sub-tree rooted at the node and successor
, which find the next largest node in the tree (so returns null
for the node storing the largest values in the tree). They are defined like this:
With these in place, you could write an iterative version of the printAll method like this:
Unit TestsI always find that when writing code like this, it's really easy to make a mistake, so to build up some confidence in it I wrote some junit tests for the SortedList class which are included in the download. If you find a problem with it that's not covered by them please let me know.
Mark Rhodes