In my last post back in June, I introduced a new data structure, the PatchWorkArray
, which performs insertions and deletions lazily in an attempt to improve performance over the lifetime of the list. In this post, I'll detail a simple extension to this class which makes it more practically usable and show how it compares with other list implementations over a number of performance tests. A zip file containing all the source code, unit test code, javadoc and a distributable jar file for the classes mentioned in this post are available at: scottlogic-utils-mr.zip.
Making the PatchWorkArray More PracticalAs I mentioned in my previous post, the PatchWorkArray
works by maintaining a store of the alterations that have been made to an underlying backing list. In general, the larger the number of alterations, the worse the PatchWorkArray
's performance. Although the implementation performs some local optimization, which attempts to minimise the number of alterations, there may be times when it is faster to ensure the list has no alterations prior to performing some operation. For this the fix
method is supplied. For example, if you are given a list of indices and you need to return the elements in the list at those indices, as opposed to writing:
It is likely to be more efficient to write:
Line 4 performs a quick check to see if making a call to fix
is likely to have any positive effect on the performance - although it must be said that I picked the values of 10 arbitrarily! The clear problem with this approach however is that making an explicit call to fix
is not always possible or indeed desirable. Since it's generally better to write methods so that they are not tied to a concrete implementation of an interface, the above method should really be re-written to have the signature:
so without adding some form of type checking hack, you won't be able to call the fix
method on the list anyway. To get around this problem, I've written a new simple sub-class of the PatchWorkArray
, the AutoFixingPatchWorkArray
, which as the name suggests, automatically fixes itself if the number of alterations to its backing list breaches some set limits.
AutoFixingPatchWorkArrayThis class is probably a lot simpler and shorter than you might think since there are only two points within the PatchWorkArray
code where the number of alterations to the backing list can increase; in the add
and remove
methods which take an index as a parameter. So by overriding just these two methods so that they make a call to fix
before returning, we are pretty much done. The AutoFixingPatchWorkArray
determines whether fix
needs to be run by calling its aptly named isFixRequired
method - which is non-final and protected and so can be overridden if required. The provided implementation of this method checks whether the list's performance hit (which is essentially the number of alterations to the backing list, and the important metric when it comes to the performance, see the previous post for more details), is more than both a set fixed limit and a set fraction of the total size of the list. The core elements of the source code are given below.
It make it easier to use, the AutoFixingPatchWorkArray
also has a default constructor that sets minPerformanceHitBeforeFix
to 10 and maxPeformanceHitFactor
to 0.1 (10%), which, after some testing, seem to be relatively sensible default values.
PerformanceTo test the performance of the PatchWorkArray
, I set up a simple test rig that allows you to compare the time it takes for different list implementations to perform a series of operations, where the chance that an individual operation alters the state of the list (adds to it, or removes from it) can be set. In general, operations on lists are either performed by accessing the elements via their indices (by-index) or via an iterator. The difference is that an iterator maintains a position in the list, whereas when you perform an operation by index you first need to locate the element in the list - which can take a significant amount of time, depending on the implementation. Due to this, there are two separate test methods - one for each use case.
The "by index" test method looks like this:
This code randomly performs get
, remove
or add
operations on the given list; the probability that add
or remove
is called is controlled by the given chanceOfAlteration
. The iterator based test method is similar, except that operations are performed on the elements of the list as they are presented by the iterator rather than by randomly jumping to a position in the list.
The results of running these methods on the PatchWorkArray
, the AutoFixingPatchWorkArray
(using the default constructor), the UnsortedList
and Java's standard ArrayList
and LinkedList
are given in the charts below. In each case 100,000 operations were performed on lists with 100,000 elements initially, and the times given are the result of running the tests 1,000 times on my work desktop. To make the chart easier to interpret, I've obmitted the results which took too long to process (including all tests involving running the "by index" method on a LinkedList
!). I'm always a bit dubious of software performance tests when the person writing the test is the same person writing the thing to be tested, so I've included all the test code in the StandardListPerformanceTests
class in the download; so feel free to run them yourself!
The results show, as you'd expect, that data structure choice can have a significant effect on application performance. Whilst the UnsortedList
works well in the case that there are many alterations, the LinkedList
when used via an iterator and the ArrayList
when no alterations are made, the PatchWorkArray
offers all-round good performance across all cases tested. Due to the relatively large size of the lists compared to the number of operations, the difference between the PatchWorkArray
and AutoFixingPatchWorkArray
is negligible. When I tested it over more operations the latter performed better; however, it really depends on how the list is to be used as to which is the most efficient, as there is a trade off between the time it takes to fix the list and the time saved per individual operation when their are fewer alterations to the backing list.
As I mentioned in the previous post, it's likely that with further development the performance of the PatchWorkArray
could be improved, even so however, it demonstrates that the effect of lazily performing alterations to a backing list is potentially a useful tool in providing fast data access and manipulation.