**
Winner MR Keenan
**
(Santa Claus)

The problem is so notoriously difficult that even dramatic simplifications are difficult to solve. One stripped-down version of the protein-folding problem is the so-called HP lattice model introduced by Lau and Dill in 1989 (See the references at the bottom of this page). Our contest is based on a version of this problem.

This contest concerns a simplified two-dimensional version of the protein folding problem. Given the amino acid sequence of a simplified protein, your job is to determine a final "optimally folded" configuration. Each potential arrangement of the protein sequence is associated with a free energy level, which we call the "result" of your folding algorithm. An optimal folding (and there may be more than one) corresponds to the configuration with the lowest possible result, which can be thought of as a minimum energy configuration. In our idealized world, there are only two kinds of amino acids: hydrophobic amino acids (denoted by 1) and hydrophilic amino acids (denoted by 0). Since proteins exist in a watery environment, the hydrophobic amino acids prefer to cling to each other, compact and apart from the water as much as possible.

The sequence for the protein shown in the diagram below is

[0 1 1 1 1 0]where hydrophobic amino acids are shown in red and hydrophilic amino acids are shown in blue.

Think of the amino acids that make up the protein as links in a chain. Each link may bend by a multiple of 90 degrees, but the distance between links may not change (and is assumed to be 1). No self-intersections are permitted, and the final result is constrained to lie in a single plane. Thus you can think of the chain as a line snaking through a Manhattan-style grid, one amino acid at a time. This is sometimes referred to as a self-avoiding walk.

We will refer to the complex plane to simplify the configuration of any given chain. The folding sequence your function returns will define the turn taken at each link by means of the complex numbers 1 (east), i (north), -1 (west), and -i (south). If you are given the amino acid sequence a

>> a = [0 1 1 1 1 0];you might return the configuration sequence

>> c = [1 1 1 1 1];which defines the straight line configuration shown in the diagram above. The configuration sequence is necessarily one element shorter than the sequence itself, since each element defines the relationship between two amino acids. If we assume the first amino acid is always anchored at 0, then the position of each amino acid can be succintly calculated as

>> p = [0 cumsum(c)] p = 0 1 2 3 4 5So c can be interpreted as: "from your starting location, go east five consecutive times."

To introduce a small turn in the configuration, we can put a northward turn (given by i) in the fourth element of the configuration vector. We expect to see a picture like the one below.

And indeed, the following MATLAB code confirms our expectation.

>> c = [1 1 i 1 1]; >> p = [0 cumsum(c)] p = 0 1 2 2+1i 3+1i 4+1i >> plot(p,'.-') >> axis([-2 6 -2 3])

Complex arithmetic makes the folding calculations straightforward. If we want to fold the entire right half of the original sequence up by 90 degrees, we can multiply the original c vector by a fold vector that bends it like so

>> c = [1 1 1 1 1]; >> f = [1 1 i i i]; >> c2 = c.*f c2 = 1 1 i i i >> p2 = [0 cumsum(c2)] p2 = 0 1 2 2+1i 2+2i 2+3ithereby making the protein shown below.

By applying successive folding vectors, your code should try to minimize the result. Here is one final configuration vector that tries to minimize the fold energy by bringing the four red amino acids close together. We apply yet another 90 degree turn to the last two elements.

>> f2 = [1 1 1 i i]; >> c3 = c2.*f2 c3 = 1 1 i -1 -1 >> p3 = [0 cumsum(c3)] p3 = 0 1 2 2+1i 1+1i 1i

- Free energy level (which we call the "result")
- Runtime of your code

The free energy (result) is calculated by totaling the distance between all pairs of hydrophobic amino acids. Your best configuration will have the hydrophobic amino acids in the most compact mass. The code to do this calculation is shown below. Suppose you have the last sequence shown above. We ignore the location of the hydrophilic amino acids, since they do not contribute to the result, and then total up the distance matrix for the remaining elements.

The first line removes the hydrophilic elements

>> p3 = p3(logical(a)); >> [x,y] = meshgrid(1:length(p3)); >> dist = abs(p3(x) - p3(y)) dist = 0 1.0000 1.4142 1.0000 1.0000 0 1.0000 1.4142 1.4142 1.0000 0 1.0000 1.0000 1.4142 1.0000 0Here is the distance matrix. The result is the sum of all remaining distances: 1 + 1 + 1 + 1 + sqrt(2) + sqrt(2)

>> result = sum(sum(dist))/2 result = 6.8284This result, along with the associated runtime, gets passed to our scoring algorithm before returning a final score, according to the equation

score = k_{1} * result + k_{2}* e ^{(k3*runtime)}

The files you need to get started on the contest are included in a ZIP-file available here at the MATLAB Central File Exchange. If you download and uncompress this zip-file, you will have the files described below.

The routine that does the algorithmic work is `solver.m`, and the answer it returns is very simple: a list of ones. This means, take the amino acid vector a and string it out eastward.

function c = solver(a) % Direction: 1 = east, i = north, -1 = west, -i = south c = [ones(length(a)-1,1)];

A few points to keep in mind about this function and your contest entries:

- The function must have the right signature: one input argument, one output argument. Variable names are unimportant.

function c = solver(a)

- The function must return a configuration vector c of the appropriate length (length(a)-1) containing only 1, -1, i, and -i values.

To test this function with the test suite in the zip-file, run `runcontest.m`.

>> [output, message] = runcontest

The first column of `results` contains your result, and the second column gives you an estimate of the runtime. `message` returns a string describing the overall statistics of your submission.

It's sometimes useful to visualize what your entry is doing; you can do this by passing `runcontest` an input of 1.

>> [output, message] = runcontest(1)This provides a plot of the resulting "protein" as each test is completed.

We also encourage you to discuss your solutions and strategies with others. You can do this by posting to the comp.soft-sys.matlab thread that we've started from our newsreader.

The following are prohibited:

MEX-files

Java commands or object creation

eval, feval, etc.

Shell escape such as !, dos, unix

Handle Graphics commands

ActiveX commands

File I/O commands

Debugging commands

Printing commands

Simulink commands

Benchmark commands such as tic, toc, and
flops

Lau K.F. and Dill K.A. (1990) Theory for protein mutability and biogenesis. *Proceedings of the National Academy of Sciences USA* 87: 638-642.

Contests are divided into segments where some or all of the scores and code may be hidden for some users. Here are the segments for this contest:

**Darkness**- You can't see the code or scores for any of the entries.**Twilight**- You can see scores but no code.**Daylight**- You can see scores and code for all entries.**Finish**- Contest end time.