Path: news.mathworks.com!not-for-mail
From: "Tim Davis" <davis@cise.ufl.edu>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Speeding up sum of squares
Date: Thu, 8 May 2008 11:58:03 +0000 (UTC)
Organization: University of Florida
Lines: 49
Message-ID: <fvupsb$4fo$1@fred.mathworks.com>
References: <fm012450tjat7jgv5hqhtiokid3ketdsgj@4ax.com> <fvq0qr$7tv$1@fred.mathworks.com> <1v2124dbkuukhhprui7a8ss4vhmtgn3jcg@4ax.com> <fvq421$p13$1@fred.mathworks.com> <fvtkg4$foh$1@fred.mathworks.com> <fvtllj$n9b$1@fred.mathworks.com> <fvtvo2$n92$1@fred.mathworks.com> <fvu0m3$hjm$1@fred.mathworks.com>
Reply-To: "Tim Davis" <davis@cise.ufl.edu>
NNTP-Posting-Host: webapp-02-blr.mathworks.com
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
X-Trace: fred.mathworks.com 1210247883 4600 172.30.248.37 (8 May 2008 11:58:03 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Thu, 8 May 2008 11:58:03 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 45902
Xref: news.mathworks.com comp.soft-sys.matlab:467367


It's likely to use loop unrolling as well, such as:

#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const
mxArray *prhs[])
{
    double w, x, y, z;
    register double sumsquares = 0.0;
    double *dp;
    register int i, n;

    dp = mxGetPr( prhs[0] );
    n = mxGetNumberOfElements( prhs[0] );
    for( i=0; i<n; i += 4 )
    {
        w = dp [i] ;
        x = dp [i+1] ;
        y = dp [i+2] ;
        z = dp [i+3] ;
        sumsquares += w * w + x * x + y * y + z * z ;
    }
    plhs[0] = mxCreateDoubleMatrix( 1, 1, mxREAL );
    *mxGetPr( plhs[0] ) = sumsquares;
}

which only works if n is divisible by 4 (I didn't include
the clean up).

On my linux laptop using gcc (MATLAB 7.6, gcc 4.2.3):

the code above:
>> tic;f=s3(x);toc
Elapsed time is 0.524779 seconds.

>> tic;g=x'*x;toc
Elapsed time is 0.543561 seconds.

your code:
>> tic;e=s1(x);toc
Elapsed time is 0.888125 seconds.

Loop unrolling is a common trick.

Don't forget that when you're timing things you need to run
the code at least twice; the first time, the code itself
gets loaded into memory.

gcc is smart enough not to need "register" statements.  It
ignores them.  Visual C++ is probably not as smart.