Darkness 1999-06-03 00:00:00 UTC
 
Twilight 1999-06-04 00:00:00 UTC
 
Daylight 1999-06-05 00:00:00 UTC
 
Finish 1999-06-10 00:00:00 UTC

Mars Surveyor - Statistics

In Praise of Tweaking:
The MATLAB Programming Contest, Take Two

This report is in direct contrast to the painstaking line-by-line analysis provided by the companion report entitled "Strategies, Randomizations, and Optimizations" All the images in this report come directly from the raw data in our contest database rather than from the careful inspection of individual entries.

Problem complexity and tweak-bombing

Despite the fact that there were more entries in the second contest than the first (1647 vs. 1455), there were actually fewer participants. This probably resulted from the fact that the second contest was significantly more complex. Once contestants were inside the door, they were likelier to submit more entries per person. Another hallmark of this second contest was a conviction that submitting multiple entries differing by small and random amounts might lead to success. For example, on June 17th between midnight and 2:30 AM Natick time, Roger Stuckey submitted 161 entries. We affectionately named this practice "tweak-bombing," and it shows up dramatically in some of the plots below.

Here is an area plot of all 1647 entries as they came in. Interestingly, the number of failures per 100 entries is significantly down from the first contest.

pass-fail.gif (106562 bytes)

Score vs. Time

Now we move on to a more interesting plot: the score of each entry plotted against when it was submitted. The leader at any given time is shown in red, and since lower scores are better, the red line naturally marks the lowest extremity of the set of all entries. During the contest, the lead changed hands a total of 85 times. Murray Simpson has the distinction of leading for the longest period of time with his entry "RandomRandv1.16" which ruled the roost from 2:30 on the afternoon of June 15th until Martin Leach's breakthrough entry "SoupDragon31" bettered it at 10:52 on the morning of June 17th. Martin's "SoupDragon34" then shattered the 200 point barrier only 15 minutes later.

time-score.gif (36408 bytes)

Notice the tweak bombing runs tend to show up as thick vertical stripes. In general, each of these stripes is attributable to a single person.

Broadly speaking, based on the contest data we can divide the contest into two parts: an initial tweaking battle based on code initially appearing in "Lolomove" by Anders Holtsberg, and a second climactic tweaking battle set off by Martin Leach's soupy innovations of the 17th and finally ending with Paulo Uribe's winning entry "NoSoup4U !1." Keep in mind this is a transparent analysis based only on how code is attributed in the database. We don't want to overemphasize some people's contributions at the expense of others. But if you observe the inheritance patterns as reported by the various authors, some definite trends appear. For example, here is a plot similar to the one shown above, except for the fact that it connects related entries with light blue lines. Furthermore, the lineage of Anders Holtsberg's "Lolomov4" is called out in red: it consumes almost the entire center of the contest!

Lolomov4.gif (46618 bytes)

There are other ways to spot the influence of an algorithm. Below are shown all the entries that have the word "Soup" in their title. Notice that the original SoupDragon is a middling performer lost amid the last great tweaking binge of the Lolomov4 era. But after a day of tuning, Lolomov4 has waned and the SoupDragon is in the ascendant, leading to such names as "SoupedUpDragon," Colin Sinclair's "SneakyGreenSoup," Roger Stuckey's SoupMix series, and of course Peter J. Acklam's inimitable "Can't think of a name." (The original Soup Dragon, by the way, is from a British children's television program of the 1960s.)

SoupDragon.gif (52816 bytes)

To close out the section on legacy charts, here is a zoomed-in view of what a tweak cluster-bomb looks like. This is from Murray Simpson's raid on the afternoon of June 15th. Visually, it resembles exploding fireworks; any entry may be a launching point for still more entries. The root of this small inheritance tree is Murray's "RandomRandv1.10," and in fairness to the bombing technique, one of the offspring of that entry, "RandomRandv1.16" (the low entry submitted at 14:38 below) went on to a spectacular and unequalled run as contest leader.

simpson.gif (67520 bytes)

Percent unexplored vs. CPU time

Another way to look at the progress of the contest is to view the entries on a plane where CPU time is plotted against % unexplored as seen below. The segmented red line again shows the progress of the leading entry from the beginning of the contest in the top center to the winner in the bottom left. Light gray curves show lines of constant score (the log scale on the y axis gently curves these lines).

result-v-cpu.gif (36242 bytes)

In this light, the significant changes made to the algorithm by Anders Holtsberg and Martin Leach become more apparent. Tweaking skirmishes tend to work left along horizontal lines by speeding up code that still ultimately explores the same area. A significant algorithmic departure is signified by a shift in both CPU time and % unexplored. Notice that SoupDragon31 moves almost along a line of constant score. It's not very much better than the previous leader, but because the algorithm is new, it has a lot of breathing room for subsequent tweak exploitation, leading to a much better SoupDragon34 in only a few minutes. This image is nothing less than a quantitative snapshot of innovation at work.

If we take a time-lapse camera approach to the above plot, we can see how the entries moved over time. The leader in each of the subplots below is circled in black. The light blue backdrop displays all entries for context.

multi.gif (69702 bytes)

Like bees in a hive, the red dots show the entries crawling toward a reward in the lower left.

Tweaking, innovation, and open sources

The programming contest achieves something remarkable: it turns MATLAB coding into a highly entertaining full-contact sport. The contest also serves as a model case for how (and why) open source coding works. It would seem the the contest is successful because it is

  • competitive
  • real-time
  • open-sourced
  • personal, and
  • tweak-friendly

This last feature is a crucial one. Tweaking, or the unrestrained (and sometimes capricious) modification of a predecessor's code, is in large part what sustains the contest. To the hardworking contestant who has just seen his code speed-tweaked by a half percent to take someone else into first place, tweaking can seem rude and unfair. On the other hand, tweak-battles make for a fabulous spectator sport, and assuming that they don't cause all the contestants to pack up and go home, they certainly generate streamlined code. All evidence indicates that tweaking doesn't drive people away, but rather it draws them in closer and energizes the contest. It has an appealingly egalitarian effect: no optimization is too small to be worthy of consideration. Tweaking, cluster bombs and all, fuels the engines of innovation.

What do you think about tweaking? About the contest in general? Let us know.