This post goes through an implementation a SortedList in Java which ensures its elements are kept in order and guarantees O(log(n)) run time for all the basic operations: get, contains, remove and add. The source code, javadoc, unit tests and class files for this post are available here: scottlogic-utils-mr-1.4.zip.
Sorting is one of the most common operations applied to lists and as such Java has built in mechanisms for doing it, like the
Comparator interfaces and the
Collections.sort methods. These are great when you have a static list that needs to be ordered, but sometimes you want the list to remain sorted after some altering operations have been applied to it (e.g. if you add an element to an ArrayList which you've sorted, chances are that it's no longer in the right order). For some reason, the
java.util package is lacking a proper
SortedList, and since they're quite handy, I thought I write my own.
AlternativesAs with all data structures, whether a
SortedList the right tool for the job depends on how the data is going to be used. The
java.util package contains a host of different data structures, all of which have their place, but unfortunately it (at least at the moment) is missing the
SortedList. A comparison between some of the the Java's built in data structures and a
SortedList is given below:
|add(Object elem)||remove(Object elem)||get(int index)||contains(Object elem)||multiple equal elements|
|* - amortized constant time (inserting n objects takes O(n) time).|
If you're not likely to change the data structure much, you might want to just use an ArrayList or a regular array and sort it when you need, which can be done relatively quickly (O(n*log(n))), using Java's
Arrays.sort methods respectively. So long as you don't need to sort it too often, there's no problem.
If you're not sure how many times you'll need to sort it, it's quicker to ensure that the data is always kept in order. If you don't need to store multiple equal elements and don't need random access to the data (i.e. you don't need to run
get(int index)) then it's probably best to use Java's
TreeSet. If you need to store multiple equal elements but don't need random access to the data, or quick contains/remove methods then Java's
PriorityQueue might be the way to go. If these cases don't apply, then a
SortedList might be what you want.
Implementing a custom
List in JavaAll Lists in Java should implement the
java.util.List interface, however, rather than implement this directly, the easiest way to start off writing your own
List class is have it extend the
AbstractList class instead. This class implements the
List interface and provides default implementations of the methods, reducing the amount of code you need to write. Most of these default implementations just throws an
UnsupportedOperationException, but some are useful. For example, the default
iterator methods provide you with working iterators once you've provided an implementation for the
get(int index) method.
It's also easy to configure the iterators provided by the
AbstractList to exhibit fail-fast behaviour. This means that if the list is modified after the iterator has been created, other than through the iterator itself, then calling any method other than
hasNext() on the iterator will (hopefully!) cause the cause a
ConcurrentModificationException to be thrown, as is standard for all of Java's non-synchronized Collection classes. To do this, you just need to increment the
modCount variable whenever you add or remove an element from the list. For example, the
add method in the given implementation has the following structure:
Here I took the decision not to allow
null values to be added in the list, just to simplify the other methods and since on the vast majority of applications this is what you want. If you really need to add a
null value you can always just add a special singleton instance of type
T which you know represents
null instead of
null itself. The effect of the calling
modCount++; in the add method is that the following code will now throw a
Another thing to consider when writing a custom
List class (as with any class!) is how you are going to define its type and the constructors. Since a
SortedList needs a way of comparing the elements it stores, I've decided to leave the type definition simple but only supply a constructor which takes a
Comparator, this constructor has the following signature:
If you're not used to Java generics then this line might look a bit odd! The type definition, "
<? super T>>" just says that the given comparator must be capable of comparing elements of type
T, which is the type of element that the list is able to store.
This might seem like a weird decision, since Java's built in
TreeSet doesn't enforce the same restriction; it also allows a no-argument constructor. The reason is that this no-argument constructor is used to create a
TreeSet which orders elements by their natural order; i.e. the order you get if you use the
compareTo method on the set's elements. The draw back of this design decision is that there is no way to say that either the elements must be
Comparable, or you need to supply a
Comparator, so you need to add a runtime check for this (i.e. note the
ClassCastException thrown by the
If you need this behaviour with this
SortedList, you can you implement a very simple subclass of it which passes in a
Comparator providing this natural ordering. An example implementation of this is shown below:
Note that the type definition restricts the class so that only comparable objects can be stored in it; removing the need for any runtime check.
AVL Trees as Sorted ListsIn order to obtain the logarithmic run times for the standard operations, the SortedList needs to be based on some kind of balanced tree structure. I've used an AVL tree, which is pretty much the simplest form of balanced tree you can have (so less chance of mistakes!) and since it ensures the tree remains very balanced (more so than say a Red-Black Tree), it means that that the
contains methods always run efficiently.
An AVL tree is a binary search tree, which rebalances itself whenever the height of any node's subtree becomes at least two larger than it's other subtree. The rebalancing requires implementing just two methods - the left and the right tree rotations. I won't go into all the details here, but if you're interested, a pretty easy to follow introduction to AVL trees is available at: lecture1, lecture2 and lecture3.
When it comes to actually implementing an AVL tree, the most obvious way to do it in Java is to have an inner
Node class to represent the individual positions in the tree, and then have the main class hold a reference to the root
Node of the tree. The
Node class needs to be defined somewhat recursively, as each Node needs to maintain references to both the children nodes and their parent node. In order to use an AVL tree as a
SortedList, you need this Node class to be slightly different than in a standard implementation, the changes required are that:
- Allow more than one node to store values that are equal in terms of the given comparator.
- Nodes need to remember the total number of children they have
The first alteration is required so that the tree can be used as a
List rather than as a
Set and the second in order to implement the
get(int index) method efficiently.
The start of the Node class in the implementation looks like this:
Node class is an inner class there is no need to "parameterise" the type definition, it automatically inherits the definition of the
T from the
SortedList parent class. The list allows multiple values to be stored by giving each node a unique id, which is incremented as each element is added. The
compareTo method then uses this when comparing values, so that nodes with the same value according to the comparator, are distinguished by their unique id. The
numChildren fields are really just cached values, since their values could be obtained by examining the child nodes. It's up to the implementation to ensure that these values are maintained as changes are made to the tree. In the given implementation, this is all done in the
updateCachedValues method of the
So long as this method is called on the appropriate node each time the tree is structurally altered, the values will remain correct. It's not always obvious which node constitutes the "appropriate one", but it should always be the node which was altered with the lowest height in the resulting tree (it's always the case that there is one such node).
The only key method that is missing from a standard AVL tree implementation that is required to make it work as a
List is the
get(int index) method. As I mentioned before, this method is going to make use of the
numChildren field of the
Node class to so that it can be implemented efficiently. Once this field is in place, it's not difficult to implement - the method just needs to traverse the tree, making sure that it remembers how many smaller values there are than those at the current node; this effectively tells you the index of the first value stored at the current node. In the provided implementation, the code look like this:
sizeOfSubTree method just returns one plus the number of children values of the node. The
totalSmallerElements variable effectively stores the index of the current node is maintained in lines as the tree is traversed.
Doing without recursionYou might have noticed that the code so far has been iterative rather than recursive. Generally, most operations involving trees are written using recursion, but since iterative solutions tend to be quicker, I've stuck to using iteration throughout the implementation (the only exception is with the
structurallyEqualTo method which is just there for testing). For methods where you just need to traverse the tree, like the
contains methods, turning it from a recursive method to a iterative one is just a case of adding a while loop and keeping a reference to the current Node that you're looking at. For example, you go from something like:
to something like:
The only difficulty comes when the method needs to go back to nodes that have previously been visited, (i.e. those that can't be written with just simple tail recursion). For instance, if you want to print all the elements in the tree in order; with recursion this is just a few lines:
This could then be invoked on the root node to print the whole tree. It's really not obvious how to do this without using recursion! To overcome this, the
Node class in the implementation defines a couple of handy iterative methods - the
smallestNodeInSubTree, which finds the smallest node in the sub-tree rooted at the node and
successor, which find the next largest node in the tree (so returns
null for the node storing the largest values in the tree). They are defined like this:
With these in place, you could write an iterative version of the printAll method like this:
Unit TestsI always find that when writing code like this, it's really easy to make a mistake, so to build up some confidence in it I wrote some junit tests for the SortedList class which are included in the download. If you find a problem with it that's not covered by them please let me know.