Gene Splicing is the 16th MATLAB Online Programming Contest.
We can think of genes as linear sequences stored in a database. Physically, this database is contained in long strands of DNA. Put another way, DNA is the "filing cabinet" in which genes are stored. When needed, these sequences are retrieved from the DNA database and used to create functional proteins. But the DNA database is far from static. Sometimes the physical locations of the genes move around inside the DNA filing cabinet. Similar in concept to disk defragmentation in a computer, this phenomenon is known as transposition. Thus two organisms that are very similar physically might have substantially different DNA because the order of their genes has been scrambled over time. Working out how one genome transforms into another one is of great interest biologically, because it reveals how two organisms are related. Since it is both computationally non-trivial and biologically significant, it is an active area of bioinformatics research.
For the purposes of this contest, you will be given two strands of "DNA": the test sequence and the target sequence. In practice, each strand will be a vector of positive (nonzero) numbers. Your job is to transform the test sequence into the target sequence as efficiently as possible using gene transposition. We can think of the target sequence as the genome of a related individual. The relevant question is: how many transposition events are required to make the two sequences identical?
a = [ 1 5 6 8 5 4 3 2 7 ] t = [ 1 2 3 4 5 6 6 7 8 ]There are few things to notice right away. First of all, the starting vector has two fives and one six, whereas the target vector has two sixes and one five. So it's not possible to get a perfect alignment in this case. Second, nothing prevents us from simply returning a as the answer vector b. In that case, the score would be 21, as calculated by this code.
b = a; score = sum(abs(b-t));But clearly we can do better. We can see that the fragment [5 4 3 2] can be flipped and inserted earlier in the sequence. Suppose we have the following gene splicing function. Using our splicing function, we want to take a fragment of length 4 from position 5 of vector a and re-insert it at position 2. Thus
b = splice(a, 4, 5, 2, true)Now
b = [ 1 2 3 4 5 5 6 8 7 ]which corresponds to a score of 3. Much better!
Here's what that process looks like algorithmically.
This represents one transposition event. With one more, we can get very close to the target vector. The following code swaps the last two elements. Note that there are a several different commands that will have the same result. For the purposes of scoring, it is unimportant which one we choose.
b = splice(a, 1, 8, 9, false)The score has now been reduced to 1, and it is impossible to do any better. Thus it has taken us two transpositions to get from the original vector to the best possible match. If we have a transposition budget of two or more, then it doesn't matter if we make our biggest improvement in the first or the second move. If, on the other hand, we are constrained to a budget of exactly one transposition, then clearly we should make the move with the best improvement in score.
The test suite will include many cases in which you are not given enough moves to make a perfect match with the target, so choose your moves accordingly.
moves = solver(sequence, target, budget)The input vector is sequence, the goal is target, and budget is the maximum number of transpositions you are allowed. You want to get the best possible score given a limited number of transpositions. Your answer must be returned in the form of the four column matrix moves in which each row is a set of transposition instructions ordered like so:
[len, ai, bi, flip]where len, ai, bi, and flip are defined as in the splice function described above. The transpositions will be processed from top to bottom. If there are more than budget rows in the moves matrix, they will be ignored.
Since a file can contain multiple functions, the complexity for any given file is defined as the MAXIMUM complexity of any functions contained in it. A good practice is to keep the complexity for each function below 10, so for this contest your overall score will increase according to the complexity in excess of 10. So there is no complexity penalty for submissions in which all functions have a complexity of 10 or less.
You can measure the cyclomatic (or McCabe) complexity of any function in MATLAB using the "cyc" switch for mlint. Try this, for example:
>> mlint -cyc magic.mWe have also included a getComplexity.m function with the contest distribution to make it easier to find the complexity for your entry.
The code is limited in size by the database architecture. The column in our MySQL database that stores the M-code is of type text, which is limited of 65535 characters. Submissions longer than this limit will fail.
We also encourage you to discuss your solutions and strategies with others. You can do this by posting to the thread that we've started from our newsreader.
The following are prohibited:
Extraction of puzzles in the test suite by manipulating the score, runtime, or error conditions is also forbidden. In the small scale, this has been an element of many past contests, but in the Blockbuster Contest, Alan Chalker turned this into a science. Tuning the entry to the contest test suite via tweak bombing or other techniques is still allowed, but we ask that you not overwhelm the queue.
Contests are divided into segments where some or all of the scores and code may be hidden for some users. Here are the segments for this contest: