Path: news.mathworks.com!not-for-mail
From: "Tim Davis" <davis@cise.ufl.edu>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Speeding up sum of squares
Date: Thu, 8 May 2008 01:20:04 +0000 (UTC)
Organization: University of Florida
Lines: 57
Message-ID: <fvtkg4$foh$1@fred.mathworks.com>
References: <fm012450tjat7jgv5hqhtiokid3ketdsgj@4ax.com> <fvq0qr$7tv$1@fred.mathworks.com> <1v2124dbkuukhhprui7a8ss4vhmtgn3jcg@4ax.com> <fvq421$p13$1@fred.mathworks.com>
Reply-To: "Tim Davis" <davis@cise.ufl.edu>
NNTP-Posting-Host: webapp-03-blr.mathworks.com
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
X-Trace: fred.mathworks.com 1210209604 16145 172.30.248.38 (8 May 2008 01:20:04 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Thu, 8 May 2008 01:20:04 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 45902
Xref: news.mathworks.com comp.soft-sys.matlab:467279


"John D'Errico" <woodchips@rochester.rr.com> wrote in
message <fvq421$p13$1@fred.mathworks.com>...

> The product x'*x uses blas routines to speed
> up the inner product. They are clearly much
> more highly optimized than is the code
> matlab produces for sum(x.^2). Bruno
> pointed out that norm(x)^2 also is fairly fast.
> I assume that it too is well optimized.
> 
> John

I'd hazard a guess that it's not a BLAS performance
difference.  The level-1 BLAS routines really don't gain
much over plain compiled loops, given modern optimizing
compilers.

It's probably because MATLAB is computing z=x.^2, and
storing z as a new vector, internally.  Then it sums it up.
The creation of z takes more memory traffic (8*n bytes
written, then read back in).  I would be quite surprised if
MATLAB were smart enough not to form the vector z for
sum(x.^2), but when it does x'*x it knows not to do that.
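
To make the two access patterns concrete, here is a minimal sketch of
what I am guessing happens in each case.  The explicit loop is only a
stand-in for a fused dot product; it is not MATLAB's actual internal
code, and the names sA, sB and relerr are mine.

```matlab
x = rand(1e6, 1);

% Pattern A: materialize a temporary, then reduce it.
z  = x.^2;        % writes 8*n bytes for the new vector z
sA = sum(z);      % reads those 8*n bytes back in

% Pattern B: fused accumulation, no temporary vector.
% (Stand-in for what a dot product does; assumption, not MATLAB's code.)
sB = 0;
for k = 1:numel(x)
    sB = sB + x(k)*x(k);
end

% The two results agree up to floating-point rounding.
relerr = abs(sA - sB) / abs(sA);
```

The interpreted loop will not be fast in MATLAB itself; the point is
only that pattern B reads x once and never writes a second vector.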

Just a guess, but it's backed up with this experiment:

>> x = rand (1e6,1);
>> tic;y=x'*x;toc
Elapsed time is 0.004829 seconds.

>> tic;y=sum(x.^2);toc
Elapsed time is 0.017520 seconds.

>> tic;z=x.^2;y=sum(z);toc
Elapsed time is 0.018008 seconds.
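
A single tic/toc is noisy, so the comparison is worth repeating.  A
sketch, assuming a MATLAB release recent enough to have timeit (it
postdates this post); the variable names are just for illustration,
and norm(x)^2 is included because it came up earlier in the thread.

```matlab
x = rand(1e6, 1);

% timeit calls each handle several times and returns a typical time.
t_dot  = timeit(@() x'*x);        % inner product
t_sum  = timeit(@() sum(x.^2));   % square, then sum
t_norm = timeit(@() norm(x)^2);   % 2-norm, squared

fprintf('x''*x      : %.6f s\n', t_dot);
fprintf('sum(x.^2) : %.6f s\n', t_sum);
fprintf('norm(x)^2 : %.6f s\n', t_norm);
```

The absolute numbers will depend on the machine and MATLAB version, so
no expected output is given here.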

And, try this:


>> clear
>> x=rand(200e6,1);
>> tic;y=x'*x;toc
Elapsed time is 0.538544 seconds.

While the above command was running, MATLAB's memory usage
didn't go up (I was watching it).

>> 
>> tic;y=sum(x.^2);toc
??? Out of memory. Type HELP MEMORY for your options.
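
The arithmetic behind that error, as a rough sketch.  It assumes the
temporary z=x.^2 is a full double vector the same size as x, which is
the guess being tested here; how much memory was actually available on
the machine is not something the transcript shows.

```matlab
n = 200e6;                 % length of x in the experiment above
bytes_per_double = 8;

x_bytes = n * bytes_per_double;   % 1.6e9 bytes for x alone
x_gib   = x_bytes / 2^30;         % about 1.49 GiB

% sum(x.^2) needs x and the temporary z at the same time.
peak_gib = 2 * x_gib;             % about 2.98 GiB

fprintf('x alone        : %.2f GiB\n', x_gib);
fprintf('x plus temp z  : %.2f GiB\n', peak_gib);
```

So x'*x can get by with the 1.49 GiB already held by x, while
sum(x.^2) has to find another 1.49 GiB of contiguous space first.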

It's cool what you can learn about how MATLAB works
internally just by looking at error messages.