In my last post back in June, I introduced a new data structure, the
PatchWorkArray, which performs insertions and deletions lazily in an attempt to improve performance over the lifetime of the list. In this post, I'll detail a simple extension to this class which makes it more practically usable and show how it compares with other list implementations over a number of performance tests. A zip file containing all the source code, unit test code, javadoc and a distributable jar file for the classes mentioned in this post are available at: scottlogic-utils-mr.zip.
Making the PatchWorkArray More PracticalAs I mentioned in my previous post, the
PatchWorkArray works by maintaining a store of the alterations that have been made to an underlying backing list. In general, the larger the number of alterations, the worse the
PatchWorkArray's performance. Although the implementation performs some local optimization, which attempts to minimise the number of alterations, there may be times when it is faster to ensure the list has no alterations prior to performing some operation. For this the
fix method is supplied. For example, if you are given a list of indices and you need to return the elements in the list at those indices, as opposed to writing:
It is likely to be more efficient to write:
Line 4 performs a quick check to see if making a call to
fix is likely to have any positive effect on the performance - although it must be said that I picked the values of 10 arbitrarily! The clear problem with this approach however is that making an explicit call to
fix is not always possible or indeed desirable. Since it's generally better to write methods so that they are not tied to a concrete implementation of an interface, the above method should really be re-written to have the signature:
so without adding some form of type checking hack, you won't be able to call the
fix method on the list anyway. To get around this problem, I've written a new simple sub-class of the
AutoFixingPatchWorkArray, which as the name suggests, automatically fixes itself if the number of alterations to its backing list breaches some set limits.
AutoFixingPatchWorkArrayThis class is probably a lot simpler and shorter than you might think since there are only two points within the
PatchWorkArray code where the number of alterations to the backing list can increase; in the
remove methods which take an index as a parameter. So by overriding just these two methods so that they make a call to
fix before returning, we are pretty much done. The
AutoFixingPatchWorkArray determines whether
fix needs to be run by calling its aptly named
isFixRequired method - which is non-final and protected and so can be overridden if required. The provided implementation of this method checks whether the list's performance hit (which is essentially the number of alterations to the backing list, and the important metric when it comes to the performance, see the previous post for more details), is more than both a set fixed limit and a set fraction of the total size of the list. The core elements of the source code are given below.
It make it easier to use, the
AutoFixingPatchWorkArray also has a default constructor that sets
minPerformanceHitBeforeFix to 10 and
maxPeformanceHitFactor to 0.1 (10%), which, after some testing, seem to be relatively sensible default values.
PerformanceTo test the performance of the
PatchWorkArray, I set up a simple test rig that allows you to compare the time it takes for different list implementations to perform a series of operations, where the chance that an individual operation alters the state of the list (adds to it, or removes from it) can be set. In general, operations on lists are either performed by accessing the elements via their indices (by-index) or via an iterator. The difference is that an iterator maintains a position in the list, whereas when you perform an operation by index you first need to locate the element in the list - which can take a significant amount of time, depending on the implementation. Due to this, there are two separate test methods - one for each use case.
The "by index" test method looks like this:
This code randomly performs
add operations on the given list; the probability that
remove is called is controlled by the given
chanceOfAlteration. The iterator based test method is similar, except that operations are performed on the elements of the list as they are presented by the iterator rather than by randomly jumping to a position in the list.
The results of running these methods on the
AutoFixingPatchWorkArray (using the default constructor), the
UnsortedList and Java's standard
LinkedList are given in the charts below. In each case 100,000 operations were performed on lists with 100,000 elements initially, and the times given are the result of running the tests 1,000 times on my work desktop. To make the chart easier to interpret, I've obmitted the results which took too long to process (including all tests involving running the "by index" method on a
LinkedList!). I'm always a bit dubious of software performance tests when the person writing the test is the same person writing the thing to be tested, so I've included all the test code in the
StandardListPerformanceTests class in the download; so feel free to run them yourself!
The results show, as you'd expect, that data structure choice can have a significant effect on application performance. Whilst the
UnsortedList works well in the case that there are many alterations, the
LinkedList when used via an iterator and the
ArrayList when no alterations are made, the
PatchWorkArray offers all-round good performance across all cases tested. Due to the relatively large size of the lists compared to the number of operations, the difference between the
AutoFixingPatchWorkArray is negligible. When I tested it over more operations the latter performed better; however, it really depends on how the list is to be used as to which is the most efficient, as there is a trade off between the time it takes to fix the list and the time saved per individual operation when their are fewer alterations to the backing list.
As I mentioned in the previous post, it's likely that with further development the performance of the
PatchWorkArray could be improved, even so however, it demonstrates that the effect of lazily performing alterations to a backing list is potentially a useful tool in providing fast data access and manipulation.