Elf White Paper

Introduction: Composing in a Browser

This white paper accompanies the Elf source code and program documentation, and is intended to provide an overview of the design, algorithms and optimisation stratagems used by the Elf composing engine.

The ability to run "online" (that is, in a web browser) was a fundamental requirement for the system, and has a large impact on its design. However, it isn't necessarily a good thing - compromises in features and performance are unavoidable - so why do it at all?

Possible client-side technologies include JavaScript, VBScript, Flash and ActiveX, as well as Java. The first three (although in theory suitable) are interpreted and are extremely slow. ActiveX controls can use compiled C++, so could potentially provide the fastest solution; however, my preference was for the less proprietary Java technology. The advantage of a Java composing engine is that the same code can in theory run on a range of platforms - "write once, run anywhere".

But what about performance? Modern Java VMs use Just-In-Time (JIT) compilers, which compile the Java bytecode immediately before it is run. The resulting performance still generally falls short of compiled C code, because of the additional bounds checking and memory management performed by the JVM. However, with careful code optimisation it is possible to mitigate these effects.

Although Java is in theory a "cross-platform" solution, in practice there are two different, slightly incompatible versions of the Java system, the old Java 1.1 and the newer Java 2 which has now reached JDK version 1.4. Unfortunately many modern browsers, including the all-pervasive Internet Explorer 5 and 6, contain 1.1 JVMs, as does the Netscape 4.7 range. Newer browsers based on the Mozilla engine, such as Mozilla itself and Netscape 7, run with a Java 2 "plug-in", but have a very much smaller market share; there are also technical problems with communication between web-page scripts and the Java 2 VM in these browsers. I therefore decided to target the older 1.1 JDK, and, initially at least, write specifically for Microsoft's IE5/6. The code is, however, upwardly compatible with Java 2, and it should be possible in the future to run it under Mozilla.

Optimising for Java 1.1

Although Java 1.1 VMs such as that bundled with IE have good JITs (Just-In-Time compilers), they are much slower than newer Java 2 VMs at tasks such as garbage collection, object creation and string handling. The built-in collections classes, such as Vector and Hashtable, also perform poorly by comparison with the more extensive range of equivalent classes in Java 2.

However, you can get a 1.1 VM to run as fast as a Java 2 VM, if you try hard enough! The four main VM optimisation strategies employed by Elf are:

  1. Avoid object creation. All the objects required by the main composing loop are pre-allocated. This avoids the twin memory-management overheads of object creation and garbage collection. Static memory allocation is of course slightly more difficult to code, but given the relatively closed nature of a composing engine, not unduly onerous.
  2. Strings are avoided in the inner composing loops. Instead, character arrays are used throughout, which are far quicker. A small amount of extra utility code is required to manipulate character arrays, but the overhead is slight compared with the performance gain.
  3. Collections classes such as Vector and Hashtable are not used by the inner loops. These are slow because they allocate memory dynamically, and all calls to their methods are synchronized. Instead, static arrays are used.
  4. Finally, thread synchronization is slow under Java 1.1. Although Elf is a multi-threaded application, thread synchronization is avoided wherever possible. For instance, the main composing loop uses atomic flag access for interrupt notification.
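
As an illustration, a minimal sketch of these strategies might look like this (the class and field names are invented for the example, not taken from Elf's source):

```java
// Illustrative sketch of the 1.1-era optimisations: pre-allocated buffers,
// char arrays instead of Strings, plain arrays instead of Vector, and a
// volatile flag instead of synchronized interrupt handling.
public class SearchLoop {
    private final char[] calls = new char[128];   // pre-allocated call buffer
    private final int[] stack = new int[128];     // static array, no Vector
    private volatile boolean stopRequested;       // atomic flag, no locks

    // Another thread can interrupt the search without synchronization.
    public void requestStop() { stopRequested = true; }

    // Compare two call sequences without creating any String objects.
    public static boolean sameCalls(char[] a, char[] b, int len) {
        for (int i = 0; i < len; i++)
            if (a[i] != b[i]) return false;
        return true;
    }
}
```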

Efficient Composing Algorithms

No matter how well-optimised your code, it will never run well unless you are using the correct algorithm. Design of algorithm has far more impact on run time than does code optimisation, important though the latter is. Exhaustive machine composition falls into the unfortunate class of algorithms that have exponential running time: increase the problem length by one unit and the run time is multiplied by a constant factor. Ultimately this means that computer composition is an insoluble problem; there are applications, such as one-part peals, that will always be out of the reach of an exhaustive search, no matter how fast computers become in the future. However, given a very restricted search length - in our case, a single part of a quarter peal, or a short touch - an exhaustive search can be feasible, if slow.

The standard algorithm used by composing engines is that of a depth-first tree search. The space of all possible compositions is arranged as a tree, with the first calling position at the root, and subsequent calls forming branch nodes. The tree is then searched by progressing down the "leftmost" branch first, in which all calls are set at zero (representing a "Plain" or, in spliced, the first method choice), until a true composition is found, the touch becomes false, or the maximum length limit (tree depth) is reached. At this point we backtrack up the tree, incrementing the last call (from a Plain to a Bob, say, or from Cambridge to Yorkshire), before continuing on down towards the leaves of the tree again. Repeating this process, the entire composition space is eventually searched, in a running time of approximately n^d, where n is the number of calling types (the number of method/call combinations in spliced), and d is the maximum desired composition length, expressed in terms of the number of nodes from root to leaf of the search tree.
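
The backtracking search described above can be sketched as follows; the truth and rounds tests are omitted, so this skeleton simply enumerates every call sequence (all names are illustrative):

```java
// Minimal depth-first enumeration of call sequences. Each level of the
// recursion is one node of the tree; callTypes is n, maxDepth is d.
public class TreeSearch {
    // Count every leaf visited; a real engine would test truth/rounds at
    // each node and backtrack early instead of always recursing to a leaf.
    public static long countLeaves(int callTypes, int maxDepth) {
        return visit(0, callTypes, maxDepth);
    }

    private static long visit(int depth, int callTypes, int maxDepth) {
        if (depth == maxDepth) return 1;              // leaf reached
        long total = 0;
        for (int call = 0; call < callTypes; call++)  // try each call in order
            total += visit(depth + 1, callTypes, maxDepth);
        return total;                                 // backtrack
    }
}
```

Without pruning, countLeaves(n, d) visits exactly n^d leaves, which is the source of the exponential bound.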

To take an example, let's say we are searching for leadhead, bobs-only 4-spliced touches of no more than 12 leads. A node in the search tree will represent one lead of the touch, since the method can be changed at every lead. We will also assume that a bob can be called at any lead. There are therefore 4 x 2 = 8 calling combinations. With n = 8 and d = 12, the running time will be of the order 8^12, approximately 69 billion operations. Supposing our code can achieve a searching rate of 1 million nodes per second (a reasonable rate for a modern PC and a well-optimised Java implementation) the search time could be up to 69,000 seconds or about 19 hours.
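
The arithmetic can be checked with a few lines of Java (the figures, not the code, come from the text):

```java
// Back-of-envelope cost of the 4-spliced, 12-lead search described above.
public class SearchEstimate {
    // n^d computed by repeated multiplication.
    public static long nodes(long n, int d) {
        long total = 1;
        for (int i = 0; i < d; i++) total *= n;
        return total;
    }

    public static void main(String[] args) {
        long ops = nodes(8, 12);             // 8 calling combinations, 12 leads
        long seconds = ops / 1000000L;       // at 1 million nodes per second
        System.out.println(ops + " nodes, ~" + seconds / 3600 + " hours");
    }
}
```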

Nineteen hours is quite a long time for one of the shortest spliced searches of any practical use. It would also make it completely impractical to search for more methods or more leads; for instance, 8-spliced over the same distance would require 16^12 operations, or at 1 million nodes per second, approximately 9 years!

Fortunately some of the peculiar features of change ringing come to our rescue. One is falseness. Usually in longer searches, some branches will be curtailed because the touch becomes false. For human composers, falseness can slow down the process, but for computers the opposite is true. To see how, let's consider a very false method which has only one true touch of peal length, all the other true compositions being of half-peal length or less. In this case the effective search depth will only be half that of the full peal, since the one branch that leads to the full peal doesn't add appreciably to the search time. Does halving the search depth halve the search time? No, much more than that. Suppose the full peal length can be expressed in 80 nodes (a suitable number for Surprise Max). With bobs alone, the clean-proof search time would be of the order 2^80, an unimaginably large number: about 1.2 x 10^24. However, halving the search space to 40 nodes results in a comparatively tiny 2^40, about a thousand billion operations.

Even where methods have little falseness, many branches of the search tree are curtailed by touches that come round. So in practice the upper bound given by the n^d formula is far too conservative. Nevertheless it's important to remember that the underlying algorithm is still exponential in nature; unfortunately, for the very short lengths of spliced that Elf is designed for, falseness and the premature occurrence of rounds have little opportunity to take effect.

Round Block Searches

Another oddity of change-ringing search trees is that the roots and the leaves join up. Most touches, unless they come round at the snap, are round blocks. This means that they consist of one continuous cycle of calls, and can be rotated to any starting position. For instance, the standard Plain Bob Minor touch WH WH can be written by the leads as:

-PPP--PPP-

This same touch can be started at other positions:

PPP--PPP--
PP--PPP--P

...and so on. In a standard tree search all of these touches will be found independently, in separate parts of the search tree. However, it would clearly be much quicker, on finding the first one, simply to generate its rotations, and not bother to trawl through the other parts of the tree. Is this possible?
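
Generating the rotations of a calling string is straightforward; a small sketch (names are illustrative):

```java
// A round block can start at any position: generate all rotations of a
// calling string such as "-PPP--PPP-".
public class Rotations {
    public static String[] all(String calls) {
        int len = calls.length();
        String[] rots = new String[len];
        String doubled = calls + calls;   // rotation i is a window into calls+calls
        for (int i = 0; i < len; i++)
            rots[i] = doubled.substring(i, i + len);
        return rots;
    }
}
```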

It turns out the answer is yes. The rotationally-sorted tree search algorithm is a refinement of the standard depth-first tree search which finds each instance of a round-block touch once and once only. If you are just searching for round blocks, it is potentially many thousands of times faster to use this algorithm. The restriction to round blocks is not usually that limiting, especially since all regular multi-parts must by definition be round blocks.

The rotationally-sorted search works as follows. A sorting order is defined on the calling types. For instance, we might consider a Plain "P" to be ordered before a Bob "-". Then "PPP" would be less than "P--" and that in turn less than "-PP". Now for round blocks, one rotation and one rotation only will have the "lowest" order. Taking the WH WH example above, the lowest rotation is PPP--PPP--. This is the rotation that the algorithm will find. It will avoid the parts of the search tree that contain all the other rotations by skipping branches that begin "-P", "P--P", or any other prefix that would generate a "non-lowest" rotation. (To see why "-P" and "P--P" prefixes cannot generate "lowest" compositions, consider a touch beginning with one of them, say -P[A] where "[A]" is any sequence of calls. Because it is a round block, it can be rotated to give P[A]-. But this is lower than -P[A], so will already have been found.)
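
Finding the "lowest" rotation can be sketched as follows, using the ordering P before "-" described above (a naive illustration of the idea, not Elf's incremental algorithm):

```java
// Find the "lowest" rotation of a round block under the ordering P < "-".
public class LowestRotation {
    private static int rank(char c) { return c == 'P' ? 0 : 1; }  // P sorts first

    public static String lowest(String calls) {
        int len = calls.length();
        String doubled = calls + calls;
        String best = calls;
        for (int i = 1; i < len; i++) {
            String rot = doubled.substring(i, i + len);
            if (less(rot, best)) best = rot;
        }
        return best;
    }

    // Lexicographic comparison using the P-before-Bob ordering.
    private static boolean less(String a, String b) {
        for (int i = 0; i < a.length(); i++) {
            int d = rank(a.charAt(i)) - rank(b.charAt(i));
            if (d != 0) return d < 0;
        }
        return false;
    }
}
```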

In practical terms, skipping "non-lowest" prefixes is very easy. All we do is start with the lowest call, e.g. "P". The first touch found will be a plain course, and at this point we perform the first backtrack, changing the last P to a bob "-". However, unlike in the standard search, we do not continue on with Plain nodes. Instead, the calls are copied from the beginning of the tree, in a process called regeneration. Consider Plain Bob Minor again. The plain course is five leads long:

PPPPP

At the first backtrack the last Plain is incremented into a Bob:

PPPP-

The regenerative algorithm then adds new calls by copying them from the start of the touch, using a "reg pointer" that in this case is always five leads behind the search point:

PPPP-PPPP-PPPP-

You can see that the next touch, 3H, is found immediately, without any further backtracks. Also the "non-lowest" prefix PPPP-PPPPP is avoided completely. Obviously this prefix is false, so would be rejected in any case, but as more touches are found the regenerative system becomes increasingly effective. By the time the search reaches P- there are no compositions left which contain two or more consecutive Plains. When the prefix "-" is reached, instead of an entire tree-half being left to search, there is only one touch left: a bob course.
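
Regeneration itself is just a copy from earlier in the touch; a minimal sketch (the method name and signature are invented for the example):

```java
// Regeneration: after a backtrack, new calls are copied from the start of
// the touch using a "reg pointer" a fixed distance behind the search point.
public class Regeneration {
    public static String extend(String prefix, int regOffset, int targetLen) {
        char[] out = new char[targetLen];
        prefix.getChars(0, prefix.length(), out, 0);
        for (int i = prefix.length(); i < targetLen; i++)
            out[i] = out[i - regOffset];   // copy from regOffset leads back
        return new String(out);
    }
}
```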

Elf employs one or two refinements to the simple rotationally-sorted search as described above. One is an extension to allow tenors-together course-structured round blocks to be handled. In this case, when backtracking from an arbitrary point in the course, the touch cannot simply be copied from the root, because the call at the root is a Home. Instead the regeneration proceeds from a point "before" the beginning of the touch. In effect, Plain leads of method zero are copied until the Home occurs, and the start of the touch can then be matched.

A second refinement is used to handle a special case for multi-parts, in which duplicate rotations can be produced by the rotationally-sorted search. This case occurs when the part ends with a "low postfix". For instance, suppose the following part is produced:

PP-PP

This is a natural product of the rotationally-sorted search: after a backtrack in the third lead, producing the bob, the final two plain leads are copied from the root of the touch. However in this case the part end has occurred prematurely, before the bob can be copied again. There is now a "low postfix", PP, from which the part could be started to generate the lower rotation PPPP-. Elf rejects compositions with low postfixes, because an equivalent rotation will already have been found. This is done simply by checking the last backtrack position: to avoid low postfixes, it must be at an integral division of the part. In other words, we must have backtracked at the very end of the part, halfway through, one-third through, one-quarter through, and so on. This test can be seen at the beginning of Elf's doCompose() methods.
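
One reading of this test can be sketched as follows; the actual check in Elf's doCompose() may differ in detail:

```java
// Low-postfix test: a part is only accepted if the last backtrack fell at
// an integral division of the part length (end, half, one-third, ...).
public class PostfixCheck {
    public static boolean acceptable(int partLength, int lastBacktrackPos) {
        return lastBacktrackPos > 0 && partLength % lastBacktrackPos == 0;
    }
}
```

For the PP-PP example, the backtrack was at lead 3 of a 5-lead part; 5 is not divisible by 3, so the part is rejected.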

Tables

Composition would be inordinately slow if, for every node, all the changes of the desired lead had to be produced. Instead composing engines prebuild node tables. These are used to allow the algorithm to jump from node to node without intermediate calculation. For instance, in a single-method composition with a call allowed every lead, a node would correspond to an individual leadhead. The node has pointers to the subsequent leadheads reached by a Plain or a Bob - these are the branches of the search tree beneath the node. For tenors-together course-structured compositions, a node can encompass many leads, for example it might represent the jump from a Middle to a Home, or a Home to a Wrong. However in the case of Elf, a node represents a single lead in a particular method, and has branches to the leads produced by any combination of call and method change.

The same scheme is used even when Elf is running in half-lead spliced mode. Instead of halving the size of the nodes, i.e. searching through half-leads, Elf builds composite methods. Given n methods to splice, Elf creates n^2 composite methods, one for every pairing of a first half-lead with a second half-lead of the root methods. This means that full-lead nodes can then be used, which is not only faster than searching half-lead nodes, but allows the same algorithm to be used as for the leadhead spliced search.
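
Building the composite methods amounts to pairing every first half-lead with every second half-lead; a sketch (methods are represented here as plain strings, purely for illustration):

```java
// Half-lead spliced via composite methods: every pairing of a first
// half-lead with a second half-lead becomes one full-lead "method".
public class CompositeMethods {
    public static String[] build(String[] methods) {
        int n = methods.length;
        String[] composites = new String[n * n];
        for (int first = 0; first < n; first++)
            for (int second = 0; second < n; second++)
                composites[first * n + second] =
                        methods[first] + "/" + methods[second];
        return composites;                // n^2 composites in all
    }
}
```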

Since Elf is designed for Major methods only, there are a relatively small number of potential nodes: 5040 per method. These can be found and their branches calculated quite quickly, as a node-building pass before composition begins. Since each node represents one particular lead of one method, musical changes in the lead can also be identified at this stage, and a music score precalculated for the node. The effort needed to score compositions produced by the search could have a major impact on overall search time if optimisations such as this were not used. The moral is, if something can be calculated before the search starts, then it should be done, and stored in a table.

Elf's tables are managed by a Tables object. This holds the arrays of composite methods, of music definitions, and of the nodes themselves. Each node is represented by a RowNode instance. RowNode instances contain data about the node, including arrays, indexed per method, of music scores and branch node references. The Tables class contains methods to build the node tables, calculate music scores, etc.

Note that, although RowNode instances are used by Elf to hold the data for leadhead nodes as described above, the same class is also used for other rows not at the leadhead. It's necessary to have a class to represent a general eight-bell row to help with tasks such as proving (see below). The RowNode class is re-used to avoid the space overhead of two separate class files; another important requirement for Elf is that the distribution applet archive is small and quick to download - more on this later. A RowNode used for a general eight-bell row is not populated with branch node tables or music scores for the entire lead. References to the RowNode instances that are used for leadhead nodes are kept in a separate array by the Tables class: see Tables.fAllNodes and Tables.fLeadheadNodes.

Checking Falseness

As mentioned above, touches that run false serve to prune the search tree and reduce running time. However, it does take additional time to check for falseness, and this overhead must be minimised.

Normally the truth of a touch can be checked quickly using tables of false nodes. Each node has a flag indicating whether it has been included in the composition yet or not. Even for clean-proof methods, this enables a quick check to be made to ensure this node hasn't already been visited. For methods and searches with internal falseness, each node is also populated with a table referencing all other nodes which are false against it. Then, when a node is being considered for inclusion into the composition, its false nodes are checked first to ensure none of them have yet been used. If all are clear, the current node can be added to the composition, and its "used" flag is set. If on the other hand at least one false node already appears in the composition, we must backtrack.
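
The standard check can be sketched as follows (names are illustrative; as described below, Elf itself takes a different approach):

```java
// Standard false-node check: before adding a node, verify that neither the
// node itself nor any node false against it is already in the composition.
public class FalseNodeCheck {
    public static boolean canInclude(boolean[] used, int node, int[] falseNodes) {
        if (used[node]) return false;              // node itself already visited
        for (int i = 0; i < falseNodes.length; i++)
            if (used[falseNodes[i]]) return false; // clash with an included node
        return true;                               // caller now sets used[node]
    }
}
```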

In the case of Elf, this standard proof-checking algorithm is not so suitable. In spliced, there are very many false nodes per lead of method; not only those caused by internal falseness within the method, but all those generated by cross-falseness with the other methods to be spliced. Whilst a table of false nodes could be built, it could potentially be very large, and hence excessively time-consuming to check at each step in the search.

Conversely, for the very short touches that Elf is designed for, falseness is not likely to have a significant impact. A touch of twelve or even fifteen leads won't run false very often. Because of this, a design decision was made not to prove on the fly within the search loop. Instead, Elf proof-checks each composition as it is produced. This is done by the simple technique of generating all the changes in the touch and checking them off against a clean table of all 40,320 rows. False nodes are not used because, for many searches, more than 32 false nodes per lead are present, meaning it is quicker to perform the naive check on the 32 rows in the lead. The 32 rows in the lead are however generated efficiently: instead of permuting by method place notation, the rows are generated directly from pre-built row references indexed by an identifying number corresponding to the place notation type. The Tables class is, as usual, responsible for building these place notation references - see Tables.calcPerms()

Four refinements to the truth-checking algorithm help performance. Firstly, leadheads are checked for repetition during the search. It's very quick to check a leadhead hasn't been used before, and while this does not prevent the lead being false internally, it is a useful mechanism to prune branches of the tree with too many consecutive plain leads, or which repeat leads for some other reason. Secondly, if a composition product is found to be false, this knowledge is fed back into the search loop to prune the tree. For instance, if the composer has been searching a branch beginning ABC, and there is internal falseness between A and C, this is discovered when a composition such as ABCDEF is proved. The tree is then pruned from ABC.

A third refinement uses a property of multi-part compositions. If one part is false against another, then every part must be false against another part which is in the same relationship. This means that it is only necessary to prove one more than half of the number of parts; i.e. we prove x parts of n, where x = (n+1)/2.

Finally, because truth checking is still slow compared to checking music and other statistics of a composition, it is delayed as long as possible. When a composition of the required number of parts is produced, the following steps are carried out to analyse it (see also Composer.checkComp()):

  1. First, factors independent of touch rotation (other than truth) are checked. This includes method balance and number of COM.
  2. If the composition is still acceptable, every rotation is checked to see which (if any) have acceptable part ends. Depending on composing options, this may include only those rotations with the tenors together at the part ends, or only those with "nice" part ends. See Composition.checkRots().
  3. The "good" rotations are then checked one by one for music. See Composition.calcMusicRots().
  4. When a composition is found which has sufficient music to be output, proof-checking is finally performed (see Composition.isTrue()). If the composition is now found to be false, no more rotations are checked. On the other hand, if no rotations produce suitably good touches, proof-checking will never be necessary.

Notice how this analysis algorithm is carefully optimised to produce, on average, the fastest response time for any given composition product. If the parameters of the search change substantially - for instance, if a single method and a long search is selected - the system can become sub-optimal, because then it is better to prove earlier in order to gain the benefits of falseness pruning more often. Because of this, Elf is not suitable for single-method searches, although it is possible to specify them.

Pruning

Pruning branches of the search tree, using techniques such as identifying false touches, is by far the most effective way to cut into the otherwise intractable exponential running times of an exhaustive search. We've seen that Elf uses falseness pruning where possible, but that this technique is less effective for the very short touches it focuses on. Are there other factors which could be used to prune the search tree?

Although machine composing engines perform an exhaustive search, and so discover every possible composition in a given search space, we are generally interested in only that tiny fraction of compositions which have desirable features such as good music, simple calling, and so on. Unfortunately, to identify the most desirable compositions, we still have to search the entire tree.

Or do we? In spliced compositions, there are factors such as changes of method (COM) and method balance which are important. Suppose we are searching for 10-lead touches with at least 6 COM. The standard tree search will start off down the branch containing just the first method, say Cambridge. By the time we are six nodes into the search, the composition will be CCCCCC; but it is then impossible to achieve the minimum 6 COM in the remaining four leads. Effectively the branches of the tree beginning with any six leads of the same method can be pruned off.

Elf makes extensive use of pruning techniques like this to attack otherwise intractable searches. As the search progresses, a count is kept of the number of COM so far, together with counts of how many times each method has been used (in both the first and second half-leads for half-lead spliced), which are used for method-balance pruning. The COM count is kept in the Composition class - see setLead() and getCOM() methods. The method counts are kept by the main composing loop (Composer.doCompose()), in arrays f1stHalfMethodCounts and f2ndHalfMethodCounts. Whenever a new lead is considered by the composing loop, the following pruning checks are made:

  1. Do the first half-lead and second half-lead methods already occur too many times in earlier parts of the composition? By "too many times" we mean "if we included these methods, would it no longer be possible to achieve a good method balance, because other methods would perforce be under-represented?" For instance, suppose we are looking for six-lead touches of three-spliced with perfect method balance. We cannot have more than two occurrences of each method per half-lead.
  2. A further pruning check on method balance is made, based on the counts of methods already included. Even if we haven't had too many occurrences of the methods under consideration, the desired method balance may still be unachievable if too many methods are already at their maximum allowed count. For instance, suppose we are looking for four-lead touches of three-spliced with perfect balance. Each method could in theory occur twice, but once one method has appeared twice, no other method can. By checking how many methods are at maximum count, we can prune as soon as this kind of imbalance appears.
  3. Finally, the COM count is checked to see whether it is acceptable for this stage in the part. This is done by assuming any remaining leads in the part can have a method change every half-lead; if the resulting COM for the part would still be too low, we must prune, because there is now no way to achieve the minimum COM.
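
The COM check in step 3 reduces to a single comparison; a sketch (assuming at most one change of method per remaining half-lead, as described above):

```java
// COM pruning: even if every remaining half-lead changed method, could the
// part still reach the minimum COM? If not, backtrack now.
public class ComPrune {
    public static boolean canStillReach(int comSoFar, int halfLeadsLeft,
                                        int minCOM) {
        return comSoFar + halfLeadsLeft >= minCOM;
    }
}
```

In the 10-lead, 6-COM example above, six leads of Cambridge leave four leads, zero COM so far, and a minimum of six: the branch is pruned.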

Maintenance of the COM and method-balance counts is a serious overhead on the performance of the inner composing loop. Not only must the counts be checked and incremented whenever a new lead is considered, but they must also be decremented during backtracking. However, the benefits of pruning far outweigh these costs; it is almost always worthwhile spending a little time checking pruning possibilities rather than blindly racing through every corner of the search space.

Heuristic Pruning

What if the user has not specified a minimum number of COM or minimum method balance? In this case we do not have any limits to prune against. The inner composing loop will still be doing all the work of updating method counts etc., but there will be no pruning benefits to offset this. Fortunately Elf's heuristic pruning algorithm comes to the rescue. Elf assumes that the user is only interested in the "best" compositions available. As compositions are produced, their music content, COM and method balance are checked. Once ten compositions have been found, Elf starts to ignore any search paths that only contain worse compositions. As better and better compositions are found, the pruning criteria are therefore tightened up more and more, and the search gets faster.

The code responsible for the heuristic pruning algorithm is as follows:

  1. The inner composing loop uses the COM and method counts to prune - see Composer.doCompose().
  2. When a composition is found, its total COM and method balance number is calculated - see Composer.checkComp().
  3. The composition is then added to Elf's top-ten list - see Elf.outputComp.
  4. Elf rechecks the top-ten list to find the lowest values for score, COM and balance that are now needed to get into the top ten.
  5. These values are fed back into the Composer to tighten down the pruning in the inner loop - in particular see Composer.setMinCOM() and Composer.setRepeatLimits().
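
The feedback cycle can be sketched with a simple top-ten list (an illustrative sketch using a single score; Elf tracks score, COM and balance separately):

```java
import java.util.Arrays;

// Heuristic pruning feedback: keep the ten best scores seen so far; once
// the list is full, its minimum becomes the new pruning threshold.
public class TopTen {
    private final int[] best = new int[10];
    private int count = 0;

    public void add(int score) {
        if (count < best.length) {
            best[count++] = score;                 // list not yet full
        } else {
            Arrays.sort(best, 0, count);
            if (score > best[0]) best[0] = score;  // displace current worst
        }
    }

    // Minimum score now needed to enter the top ten; fed back to the search.
    public int threshold() {
        if (count < best.length) return Integer.MIN_VALUE; // no pruning yet
        int min = best[0];
        for (int i = 1; i < count; i++) min = Math.min(min, best[i]);
        return min;
    }
}
```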

Note that the balance pruning is slightly complicated by the fact that the method counts work "backwards": instead of checking for minimum balance, the composing loop is really testing to see whether the composition is too unbalanced. Therefore the heuristic algorithm must calculate method repeat limits based on the maximum "unbalance" of compositions in the top ten. However, this unbalance measure is based on the same data as the balance score presented to the user. Direct balance score is also used by the composition checker as a more fine-grained check on compositions produced by the inner loop.

Mark Davies
March 2003