Molecule Contest Analysis
There were 1631 entries in the Molecule contest, with more than a hundred participants (the exact number is hard to determine since people don't always report their names consistently). The contest got so busy on the last day that when the queue shut down at 5 PM, 159 entries were still waiting to be processed.
This area plot shows how the entries added up as the contest wore on. The green area represents the number of files that passed the test suite, and the red area shows those that failed. Overall 977 of 1631 entries passed the test suite, for a pass rate of 59.9% (the highest pass rate so far for a MATLAB contest).
This histogram shows when contest activity was occurring. Obviously there was a huge amount of action just as the contest was drawing to a close. Also, you can see that the weekend was predictably quiet. Given that light activity, it remains an open question whether running across the weekend was better than simply running from Monday to Friday. Tell us what you think.
Some people entered the contest once or twice, while others entered it literally dozens or even hundreds of times. Let's look at when two of our most active participants, Yi Cao and Stijn Helsen, were submitting their entries. They both had very busy weeks (in addition to their normal jobs!). You can learn more about Yi Cao and Stijn Helsen (as well as Bert Jagers and Paulo Uribe) on our contest winners page.
This is the most useful diagram for visualizing the contest: here we see the score of the entries plotted against the time at which they were submitted. The result is a scatter plot of all passing entries in the contest. Since a lower score is better, the lowest part of this data set decreases steadily over time as the lead entry gets better. All leaders are shown connected by a red line. Therefore the winning entry by Paulo Uribe is found in the lower right corner of the plot: the lowest score at the end of the contest.
Since the score never went below 7800, we use a log plot in which 7800 has been subtracted from each score in order to exaggerate the differences between the final entries.
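As an illustrative sketch of this transformation (in Python rather than the MATLAB used for the actual analysis, and with made-up scores), subtracting the floor before taking the log spreads out the tightly packed final entries:

```python
import math

# Hypothetical scores; the real values came from the contest database.
scores = [8400.0, 8100.0, 7900.0, 7850.0, 7812.0]

# Subtract the 7800 floor before taking the log so that small
# differences among the best entries become visible on the plot axis.
log_scores = [math.log10(s - 7800.0) for s in scores]
```

The same monotone ordering is preserved, but the gap between 7850 and 7812 now occupies far more of the axis than it would on a plain log scale.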
Lower scores are better scores in this contest, except for the period of 3-6 PM on May 20th when we ran a "high score" mini-contest. You can see the resulting cluster of high scores in the top center of this plot. Curiously, someone managed to top Bert Jagers' winning high score a day later. Who was it? It was Yi Cao, with the aptly named "Three Refiners Inverse".
Two numbers determine the final score for each entry: the strain in the molecule and the length of time required by the calculation. Here is a scatter plot that shows these two values for each entry with the red leader line superimposed. CPU time varied to a surprising degree as the strain decreased monotonically among the leaders. Notice that the winning entry actually took substantially less time than the bulk of the field.
Unfortunately, this plot doesn't show the order in which the entries arrived. In order to get a feel for the trends in the contest, let's show a day-by-day version of the strain vs. CPU time plot. Here we are looking at each of the seven days of the contest. All entries for all days are shown in light blue for context, while the entries for the day in question appear as red dots.
What points deserve special credit for making significant improvements? Here is the score vs. submission time plot with certain entries highlighted. We call these "corner points" because they represent a significant step change in the score plot. Often these corner points (particularly early in the contest) indicate big changes to the code. As the contest progresses, it becomes harder and harder to make big improvements. The glory associated with making those big improvements rises correspondingly.
Here is a plot of the percent improvement generated by each new leader relative to the previous leader. The largest gains are made, as is typically the case, near the beginning of the contest. This plot lets us see who is responsible for the biggest changes over the contest. The upper frontier of this plot is a sort of hall of fame, and someone whose name appears there more than once managed to make several significant improvements to the score. Yi Cao and MR Keenan stand out in this view as making an impressive series of score reductions over the last three days of the contest.
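The underlying calculation can be sketched as follows (a Python illustration with hypothetical scores; the real analysis was done in MATLAB). Each new leader's gain is measured relative to the previous leader:

```python
# Hypothetical sequence of leading scores over time (lower is better).
leader_scores = [10000.0, 9200.0, 8800.0, 8795.0]

# Percent improvement of each new leader relative to its predecessor.
improvements = [
    100.0 * (prev - cur) / prev
    for prev, cur in zip(leader_scores, leader_scores[1:])
]
```

With these made-up numbers the first jump is an 8% gain while the last is a small fraction of a percent, matching the pattern of big early improvements tailing off late in the contest.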
For those of us running the contest, trying to get the right balance into the scoring algorithm is tricky. Scoring in this contest was dominated by getting the lowest strain before the three-minute timeout. Score was directly correlated with strain in a way that CPU time was not, as these two plots show.
Because so many people tried to use as much time as possible, there were a large number of timeouts in this contest, tending to bog down the queue. In retrospect our scoring algorithm may have been too lenient with respect to time.
Here's another way to look at the same information. Since some of the entries are related to one another, we can show this relationship with a line. By coloring all connected family members, we can see "clans" emerge. For instance, shown in red here are all the descendants of Paulo Uribe's entry "Truncate1", ID number 6814.
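Finding a clan like this amounts to walking the entry genealogy from a progenitor down through its descendants. A minimal sketch (Python, with hypothetical entry and parent IDs; the real data records which earlier entry each submission was derived from):

```python
from collections import defaultdict

# Hypothetical (entry_id, parent_id) pairs; None marks an original entry.
entries = [(6814, None), (6900, 6814), (6950, 6900), (7000, 6500), (7010, 6950)]

# Invert the parent links into a child lookup table.
children = defaultdict(list)
for entry, parent in entries:
    children[parent].append(entry)

def clan(root):
    """Collect a progenitor and every entry descended from it."""
    members, frontier = {root}, [root]
    while frontier:
        node = frontier.pop()
        for kid in children[node]:
            if kid not in members:
                members.add(kid)
                frontier.append(kid)
    return members
```

Coloring the plot then reduces to marking every member of `clan(6814)` in red.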
This plot shows nine different clans of related entries, along with their progenitor, or oldest direct ancestor. Remarkably, these families account for almost all the leaders throughout the entire contest. Of course this family tree analysis depends on people properly attributing their code, but by and large it appears that they do.
Notice that the orange clan creeps down from a relatively poor initial score until it leads the pack.
How similar are the entries to one another? Using a standard "diff" algorithm, we can determine how many lines in one entry are identical to those in other entries. Here we are looking at a plot over time that shows how many lines of code are in each of the leading entries (the length of the line segment) and how much of that code is identical to (the green sub-segment) or different from (the red sub-segment) the preceding leader. In regions where you see entries of more or less the same length with small red tips, there are very few differences from one entry to the next. In other places you can see the code getting shorter or longer.
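This line-matching step can be sketched with Python's standard difflib module (the actual analysis used a diff algorithm in MATLAB; the two code fragments below are hypothetical stand-ins for consecutive leaders):

```python
import difflib

# Hypothetical code fragments standing in for two consecutive leaders.
prev_leader = ["x = solve(a);", "y = refine(x);", "score(y)"]
next_leader = ["x = solve(a);", "y = refine(x, 2);", "score(y)"]

sm = difflib.SequenceMatcher(a=prev_leader, b=next_leader)
# Count the lines the new entry shares verbatim with its predecessor
# (the green sub-segment); the rest is the red sub-segment.
identical = sum(block.size for block in sm.get_matching_blocks())
changed = len(next_leader) - identical
```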
We are also showing here in the lower plot the score vs. submission time for reference. Leaders are shown in red.
Those are the comparisons from one leader to the next. The next plot shows a comparison of all the ancestors of the winning entry. When you follow a single ancestry line, you tend to see fewer big changes and more tweaking. The author (or authors, in this case) typically decides on a set of parameters to tweak systematically and then peppers the parameter space with entries. For instance, the only difference between entries 7487 and 7488 shown here (FF17 and FF30 by Alex) was a single line:
alpha = alpha/2.5;   % entry 7487 (FF17)
alpha = alpha/3;     % entry 7488 (FF30)
These entries are particularly interesting because they start off with "Optimized Code" by Yi Cao. Fourteen tightly clustered submissions by Yi Cao follow before Alex picks up the thread, submitting three variations of his own between 10:30 and 11:00 AM. Finally Paulo Uribe, making two variations on Alex's "FF30" code (and only just before the contest ends), comes away with significant improvements and the prize. At the last minute, Paulo plucked the laurels from MR Keenan's "Really hopeless" (another spawn of Yi Cao's "Optimized Code"), which so nearly won the contest.
Who got the biggest improvement in score for the smallest change in code? Here we can see that entry 6173 (Enhanced Method R5 by mystery author "It's me") uncovered a full 0.3% improvement in score by changing a single parameter t from 0.55 to 0.59. This entry showed the highest ratio of percent improved to percent code changed.
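The "leverage" measure behind this ranking can be sketched as follows (a Python illustration with hypothetical figures in the spirit of entry 6173, where a one-parameter tweak bought a 0.3% gain):

```python
# Hypothetical before/after scores and code sizes for a one-line tweak.
old_score, new_score = 8000.0, 7976.0   # a 0.3% score improvement
lines_changed, total_lines = 1, 200     # a 0.5% code change

pct_improved = 100.0 * (old_score - new_score) / old_score
pct_changed = 100.0 * lines_changed / total_lines

# High leverage = large score gain per line of code touched.
leverage = pct_improved / pct_changed
```

Ranking every entry by this ratio singles out the most efficient tweaks of the contest.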
In the preceding plots, we were comparing each leader only to its immediate predecessor and immediate follower. What happens if you compare the entire population of leading entries to each other? This image is a similarity matrix of all leaders. We have taken all 91 leaders and scored them relative to each other for (91^2 - 91)/2 = 4095 different file-to-file comparisons. Each pixel in this image represents a single comparison between two files. A high score (bright yellow) indicates a large amount of shared code between file 1 and file 2. The highest possible score (colored white) means the entries are identical. So the main diagonal here is white, since every entry is necessarily identical to itself.
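Building such a similarity matrix can be sketched like this (Python with difflib and three hypothetical leader entries; the actual analysis was done in MATLAB). Only the upper triangle needs computing, since the matrix is symmetric:

```python
import difflib

# Hypothetical leader entries, each a list of code lines.
leaders = [
    ["a = 1;", "b = 2;", "run(a, b)"],
    ["a = 1;", "b = 3;", "run(a, b)"],
    ["c = 9;"],
]
n = len(leaders)

# (n^2 - n)/2 distinct file-to-file comparisons;
# for the 91 contest leaders that comes to 4095.
pairs = (n * n - n) // 2

sim = [[0.0] * n for _ in range(n)]
for i in range(n):
    sim[i][i] = 1.0  # every entry is identical to itself (the white diagonal)
    for j in range(i + 1, n):
        r = difflib.SequenceMatcher(a=leaders[i], b=leaders[j]).ratio()
        sim[i][j] = sim[j][i] = r  # fill both triangles by symmetry
```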
Blocks of bright yellow indicate stretches of the contest where the leaders resembled each other to a large degree. The big blocks marching down the main diagonal are another indication of the breaks occurring between one prevailing algorithm and the next, a perfect picture of punctuated equilibrium evolution.
So there you have it: nail-biting suspense, intense competition combined with open-source camaraderie, and high-stakes high-speed coding. Another successful MATLAB programming contest. We look forward to seeing you next time.
This contest analysis was computed and published entirely from within MATLAB. We used the Database Toolbox to pull the information directly from the contest database, and this HTML document was generated automatically by a MATLAB script.