Lists with Fast Insertions and Removals in Java

This post describes the implementation of a List in Java which allows log time removals and insertions.

Lists are probably the most useful data structures in programming. Java provides a number of ways of creating a list out of the box with regular arrays, and the ArrayList, LinkedList and Vector classes in the java.util package. However, none of these structures allow elements to be removed or added to the middle of the list in sub-linear time*, which for some applications can be useful. (* - the LinkedList allows elements to be removed (inserted) via a ListIterator in constant time, but requires that the element (position) is located first, which takes linear time).

Balanced search trees and skip lists allow this behaviour, but these work on the principle that the data is stored in some fixed order - there is no way to preserve the order in which the elements were added. They also do not usually provide the ability to access elements by their index in the data structure, which is a key feature of a list.

In this post I'll go through the implementation of an "unsorted" balanced search tree (UnsortedList), which allows the elements to be located, added and removed by index in logarithmic time, but function exactly like an ArrayList. The trick is that the list compares the elements in it on their index in the list itself. The UnsortedList extends the SortedList class I wrote for a previous post, which provides a relatively efficient implementation of an AVL tree that allows elements to quickly found by their index. I had to make a few tweaks to that code so that the required inner class and methods are exposed to overriding classes; the latest version, together with all the source code described in this blog is available here: data structures source.

The signature of the UnsortedList class and it's constructor are a bit odd and look like this:

public class UnsortedList<T> extends SortedList<T> {

    //a dummy comparator which works on any object and simply returns 0..
   private static Comparator<Object> DUMMY_COMPARATOR = new Comparator<Object>(){
        @Override
        public int compare(Object a, Object b){
            return 0;
       }
    };

    public UnsortedList(){
        super(DUMMY_COMPARATOR);
    }
    ...

Firstly, it's a bit weird that an UnsortedList is a SortedList, but nevermind (perhaps I should have come up with a better name)! The SortedList requires that a Comparator be provided, which it uses to determine where values in the list should be located. However, since the actual values of the elements stored at each index is not important in an UnsortedList, we simply provide a dummy implementation which returns 0, no matter what's put in; it doesn't matter as it won't be used anyway!

So how are comparisons made? The SortedList class doesn't call the given Comparator directly, but calls the compare method on it's inner Node class (which implements Comparable); the implementation of the Node's compare method then uses this Comparator. This mechanism is used to enable multiple equal elements to be stored, for more details see the previous blog post). So, to get the UnsortedList to work, it's this inner node's compare method that needs to be replaced; which is done by overriding the class itself. The UnsortedList's inner node class (UnsortedNode) which extends the SortedList's Node class is where most of the logic for the UnsortedList lies. The start of the class looks like this:

private class UnsortedNode extends Node {
    //"cached" index of this node in the list and the modCount when it was set..
    private int indexInList;
    private int modCountWhenIndexSet;

    //Constructs a new Node object which should be positioned at the
    //given index when inserted into the tree..
    UnsortedNode(T obj, int initialIndex){
        super(obj);
        indexInList = initialIndex;
        modCountWhenIndexSet = modCount;
    }
    ...

Note that each UnsortedNode stores its index in the list itself; which is what is used when comparing them. However, the problem is that each time an element is added or removed from the list, this value might become out of date. These index values can't be kept up to date all the time because that would require linear time when inserting or removing (which is what happens in a regular list). To tell whether the value is out of date or not, the list also stores the modcount at the time the index variable is set, this variable is incremented each time the list is altered (see AbstractList, which is the subclass of SortedList).

The code to get the correct index for a given UnsortedNode is the only tricky bit. This works by first figuring out the correct index for the parent node (if there is one) and uses the fact that each node knows the size of the sub-tree rooted at that point (i.e. the number of elements in the whole list is the size of the sub-tree rooted at the root node). The code for the method looks like this:

private int getIndexInList(){
    //go up the tree till we find a fresh parent or get to the root and store the path on a stack..
    LinkedList<UnsortedNode> path = new LinkedList<UnsortedNode>();
    UnsortedNode current = this;
    while(current != null && current.modCountWhenIndexSet != modCount){
        path.push(current);
        current = (UnsortedNode) current.getParent();
    }
    //pop elements off the stack updating them as you go..
    while(!path.isEmpty()){
        current = path.pop();
        UnsortedNode parent = (UnsortedNode) current.getParent(); //parent is fresh or null..
        if(parent == null){ //root case..
            Node leftChild = current.getLeftChild();
            current.indexInList = (leftChild == null) ? 0 : leftChild.sizeOfSubTree();
        } else { //non-root case..
            if(current.isLeftChildOfParent()){
                Node rightChild = current.getRightChild();
                current.indexInList = parent.indexInList - (rightChild == null ? 1 : 1 + rightChild.sizeOfSubTree());
            } else { //current is right child of parent case..
                Node leftChild = current.getLeftChild();
                current.indexInList = parent.indexInList + (leftChild == null ? 1 : 1 + leftChild.sizeOfSubTree());
            }
       }
       //register it as being fresh..
       current.modCountWhenIndexSet = modCount;
    }
    //on the last iteration of the above loop this is set correctly..
    return indexInList;
}

The code is complicated by the fact that it uses iteration over recursion for efficiency. It works by simply returning the cached value if it's fresh and otherwise calculating it from it's closest "fresh" ancestor. The important lines in this method, where the actual calculation of the index takes place, are 15, 19 and 22. Each deals with one of the three possible cases: when the node is the root, when it's the left child of its parent and when it's the right child of its parent respectively.

The runtime of the above method is actually constant when it's being used in the program. This is because the Node class's sizeOfSubTree method takes constant time and when it's called within the program, the parent node is always either null or fresh. This is due to the fact that it's only called in the UnsortedNode's compare method which in turn is only called when the tree is searching for an element. These searches (like all binary tree searches) work from the root down; so when the compare method is called on a non-root node, it must be that it was just called on its parent node.

With the ability to get the index of each UnsortedNode in place, it's fairly straight forward to implement the required compare method. The code for this method look like this:

//Determines whether this node is actually in the tree already..
private boolean isInTree(){
    return getParent() != null || getRoot() == this;
}

public int compareTo(Node other){
    UnsortedNode otherUS = (UnsortedNode) other;
    int cmp = getIndexInList() - otherUS.getIndexInList();
    if(cmp == 0){ //indices are equal..
        if(isInTree()){
            if(!otherUS.isInTree()){
                cmp = 1; //treat this as larger..
            }
        } else {
            if(otherUS.isInTree()){
                cmp = -1; //treat this as smaller..
            }
        }
    }
    return cmp;
}

This method essentially just returns the difference in the indices of the nodes; the only slight complication is that nodes that are already in the tree need to be considered as smaller than those already in the list with the same index. This is so that if you insert an element at some given index in the middle of the list, the newly added element gets the required index and the element that was at that index previously is moved up.

That's pretty much it, the only other methods that are required to be overridden in the UnsortedList class are straight forward. These are the add method (so that the UnsortedNode class is used instead of the base Node class), and the contains and remove(Object) methods (since these use binary search, which doesn't work in an UnsortedList!).

PerformanceI've tested the UnsortedList empirically and the results are as you'd expect; the performance is vastly better than the ArrayList when elements need to be added and/or removed from arbitrary positions in the list; however, it's not as quick or as memory efficient in carrying out other operations. The asymptotic running times for various operations across the different list implementations are given below:

Adding To By Index Removing From By Index
Front Middle End Front Middle End Get by Index
LinkedList O(1) O(n) O(1) O(1) O(n) O(1) O(n)
ArrayList/Vector O(n) O(n) O(1)* O(n) O(n) O(1)* O(1)
UnsortedList O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(log(n)) O(log(n))
(* amortized constant time - running the method n times takes O(n) time).

I've written a number of unit tests for both the UnsortedList and SortedList classes, the source for which is included in: mr-scottlogic-utils.zip. If you spot a bug not covered in the unit tests let me know!

In the next post I'll show how to adapt this data structure to improve the runtime of all the method in the above table.

MORE BY MARK

blog comments powered by Disqus