*This post goes through an implementation a SortedList in Java which ensures its elements are kept in order and guarantees O(log(n)) run time for all the basic operations: get, contains, remove and add.* The source code, javadoc, unit tests and class files for this post are available here: **scottlogic-utils-mr-1.4.zip**.

Sorting is one of the most common operations applied to lists and as such Java has built in mechanisms for doing it, like the `Comparable`

and `Comparator`

interfaces and the `Collections.sort`

methods. These are great when you have a static list that needs to be ordered, but sometimes you want the list to remain sorted after some altering operations have been applied to it (e.g. if you add an element to an ArrayList which you've sorted, chances are that it's no longer in the right order). For some reason, the `java.util`

package is lacking a proper `SortedList`

, and since they're quite handy, I thought I write my own.

**Alternatives**As with all data structures, whether a `SortedList`

the right tool for the job depends on how the data is going to be used. The `java.util`

package contains a host of different data structures, all of which have their place, but unfortunately it (at least at the moment) is missing the `SortedList`

. A comparison between some of the the Java's built in data structures and a `SortedList`

is given below:

add(Object elem) | remove(Object elem) | get(int index) | contains(Object elem) | multiple equal elements | |
---|---|---|---|---|---|

ArrayList | O(1)* |
O(n) |
O(1) |
O(n) |
YES |

LinkedList | O(1) |
O(n) |
O(n) |
O(n) |
YES |

TreeSet | O(log(n)) |
O(log(n)) |
N/A | O(log(n)) |
NO |

PriorityQueue | O(log(n)) |
O(n) |
N/A | O(n) |
YES |

SortedList | O(log(n)) |
O(log(n)) |
O(log(n)) |
O(log(n)) |
YES |

* - amortized constant time (inserting n objects takes O(n) time). |

If you're not likely to change the data structure much, you might want to just use an ArrayList or a regular array and sort it when you need, which can be done relatively quickly (*O(n*log(n))*), using Java's `Collections.sort`

or `Arrays.sort`

methods respectively. So long as you don't need to sort it too often, there's no problem.

If you're not sure how many times you'll need to sort it, it's quicker to ensure that the data is always kept in order. If you don't need to store multiple equal elements and don't need random access to the data (i.e. you don't need to run `get(int index)`

) then it's probably best to use Java's `TreeSet`

. If you need to store multiple equal elements but don't need random access to the data, or quick contains/remove methods then Java's `PriorityQueue`

might be the way to go. If these cases don't apply, then a `SortedList`

might be what you want.

**Implementing a custom List in Java**All Lists in Java should implement the

`java.util.List`

interface, however, rather than implement this directly, the easiest way to start off writing your own `List`

class is have it extend the `AbstractList`

class instead. This class implements the `List`

interface and provides default implementations of the methods, reducing the amount of code you need to write. Most of these default implementations just throws an `UnsupportedOperationException`

, but some are useful. For example, the default `listIterator`

and `iterator `

methods provide you with working iterators once you've provided an implementation for the `get(int index)`

method.It's also easy to configure the iterators provided by the `AbstractList`

to exhibit *fail-fast* behaviour. This means that if the list is modified after the iterator has been created, other than through the iterator itself, then calling any method other than `hasNext()`

on the iterator will (hopefully!) cause the cause a `ConcurrentModificationException`

to be thrown, as is standard for all of Java's non-synchronized Collection classes. To do this, you just need to increment the `modCount`

variable whenever you add or remove an element from the list. For example, the `add`

method in the given implementation has the following structure:

Here I took the decision not to allow `null`

values to be added in the list, just to simplify the other methods and since on the vast majority of applications this is what you want. If you really need to add a `null`

value you can always just add a special singleton instance of type `T`

which you know represents `null`

instead of `null`

itself. The effect of the calling `modCount++;`

in the add method is that the following code will now throw a `ConcurrentModificationException`

.

Another thing to consider when writing a custom `List`

class (as with any class!) is how you are going to define its type and the constructors. Since a `SortedList`

needs a way of comparing the elements it stores, I've decided to leave the type definition simple but only supply a constructor which takes a `Comparator`

, this constructor has the following signature:

If you're not used to Java generics then this line might look a bit odd! The type definition, "`<? super T>>`

" just says that the given comparator must be capable of comparing elements of type `T`

, which is the type of element that the list is able to store.

This might seem like a weird decision, since Java's built in `TreeSet`

doesn't enforce the same restriction; it also allows a no-argument constructor. The reason is that this no-argument constructor is used to create a `TreeSet`

which orders elements by their *natural order*; i.e. the order you get if you use the `compareTo`

method on the set's elements. The draw back of this design decision is that there is no way to say that either the elements must be `Comparable`

, or you need to supply a `Comparator`

, so you need to add a runtime check for this (i.e. note the `ClassCastException`

thrown by the `TreeSet`

's `add`

method).

If you need this behaviour with this `SortedList`

, you can you implement a very simple subclass of it which passes in a `Comparator`

providing this natural ordering. An example implementation of this is shown below:

Note that the type definition restricts the class so that only comparable objects can be stored in it; removing the need for any runtime check.

**AVL Trees as Sorted Lists**In order to obtain the logarithmic run times for the standard operations, the SortedList needs to be based on some kind of balanced tree structure. I've used an AVL tree, which is pretty much the simplest form of balanced tree you can have (so less chance of mistakes!) and since it ensures the tree remains very balanced (more so than say a Red-Black Tree), it means that that the `get`

and `contains`

methods always run efficiently.

An AVL tree is a binary search tree, which rebalances itself whenever the height of any node's subtree becomes at least two larger than it's other subtree. The rebalancing requires implementing just two methods - the left and the right tree rotations. I won't go into all the details here, but if you're interested, a pretty easy to follow introduction to AVL trees is available at: lecture1, lecture2 and lecture3.

When it comes to actually implementing an AVL tree, the most obvious way to do it in Java is to have an inner `Node`

class to represent the individual positions in the tree, and then have the main class hold a reference to the root `Node`

of the tree. The `Node`

class needs to be defined somewhat recursively, as each Node needs to maintain references to both the children nodes and their parent node. In order to use an AVL tree as a `SortedList`

, you need this Node class to be slightly different than in a standard implementation, the changes required are that:

- Allow more than one node to store values that are equal in terms of the given comparator.
- Nodes need to remember the total number of children they have

The first alteration is required so that the tree can be used as a `List`

rather than as a `Set`

and the second in order to implement the `get(int index)`

method efficiently.

The start of the Node class in the implementation looks like this:

As the `Node`

class is an inner class there is no need to "parameterise" the type definition, it automatically inherits the definition of the `T`

from the `SortedList`

parent class. The list allows multiple values to be stored by giving each node a unique id, which is incremented as each element is added. The `Node`

's `compareTo`

method then uses this when comparing values, so that nodes with the same value according to the comparator, are distinguished by their unique id. The `height`

and `numChildren`

fields are really just cached values, since their values could be obtained by examining the child nodes. It's up to the implementation to ensure that these values are maintained as changes are made to the tree. In the given implementation, this is all done in the `updateCachedValues`

method of the `Node`

class:

So long as this method is called on the appropriate node each time the tree is structurally altered, the values will remain correct. It's not always obvious which node constitutes the "appropriate one", but it should always be the node which was altered with the lowest height in the resulting tree (it's always the case that there is one such node).

The only key method that is missing from a standard AVL tree implementation that is required to make it work as a `List`

is the `get(int index)`

method. As I mentioned before, this method is going to make use of the `numChildren`

field of the `Node`

class to so that it can be implemented efficiently. Once this field is in place, it's not difficult to implement - the method just needs to traverse the tree, making sure that it remembers how many smaller values there are than those at the current node; this effectively tells you the index of the first value stored at the current node. In the provided implementation, the code look like this:

Here the `sizeOfSubTree`

method just returns one plus the number of children values of the node. The `totalSmallerElements`

variable effectively stores the index of the current node is maintained in lines as the tree is traversed.

**Doing without recursion**You might have noticed that the code so far has been iterative rather than recursive. Generally, most operations involving trees are written using recursion, but since iterative solutions tend to be quicker, I've stuck to using iteration throughout the implementation (the only exception is with the `structurallyEqualTo`

method which is just there for testing). For methods where you just need to traverse the tree, like the `get`

or `contains`

methods, turning it from a recursive method to a iterative one is just a case of adding a while loop and keeping a reference to the current Node that you're looking at. For example, you go from something like:

to something like:

The only difficulty comes when the method needs to go back to nodes that have previously been visited, (i.e. those that can't be written with just simple tail recursion). For instance, if you want to print all the elements in the tree in order; with recursion this is just a few lines:

This could then be invoked on the root node to print the whole tree. It's really not obvious how to do this without using recursion! To overcome this, the `Node`

class in the implementation defines a couple of handy iterative methods - the `smallestNodeInSubTree`

, which finds the smallest node in the sub-tree rooted at the node and `successor`

, which find the next largest node in the tree (so returns `null`

for the node storing the largest values in the tree). They are defined like this:

With these in place, you could write an iterative version of the printAll method like this:

**Unit Tests**I always find that when writing code like this, it's really easy to make a mistake, so to build up some confidence in it I wrote some junit tests for the SortedList class which are included in the download. If you find a problem with it that's not covered by them please let me know.

Mark Rhodes