Rank: 4 based on 3724 downloads (last 30 days) and 52 files submitted

John D'Errico

Company/University: Retired

Personal Profile:

Mainly retired from Eastman Kodak. (Of course, Kodak itself is now semi-retired. I don't think I had any influence in that.) I still write MATLAB code as I find something interesting, but I DON'T answer your questions, and I do NOT do homework. Your homework is YOUR problem, not mine. Do NOT e-mail me with your homework problems or student projects. When I'm not doing something with MATLAB, you might find me playing bridge.

Professional Interests:
Bridge, MATLAB, numerical analysis, mathematical modeling

 

Files Posted by John

Updated | File | Tags | Downloads (last 30 days) | Comments | Rating
23 May 2014 | A suite of minimal bounding objects: Suite of tools to compute minimal bounding circles, rectangles, triangles, spheres, incircles, etc. | minimum, bound, bounding, circle, rectangle, triangle | 91 | 16 | 4.7 (6 ratings)
29 Apr 2014 | polyfitn: Polynomial modeling in 1 or n dimensions | polyfit, modeling, regression, linear regression, approximation, function | 274 | 42 | 5.0 (23 ratings)
29 Apr 2014 | SLM - Shape Language Modeling: Least squares spline modeling using shape primitives | spline, splines, cubic, hermite, breaks, knots | 187 | 100 | 5.0 (52 ratings)
29 Jan 2014 | Moving window standard deviation: A (fast) windowed std on a time series | standard deviation, std, window, filter, movingavg | 78 | 16 | 4.7 (11 ratings)
30 Oct 2013 | HPF - a big decimal class: High precision floating point arithmetic, a new class written in MATLAB | hpf, multiple precision, big decimal, floating point, arithmetic, mathematics | 40 | 17 | 5.0 (7 ratings)
Comments and Ratings by John
02 Jul 2014 | SLM - Shape Language Modeling (Least squares spline modeling using shape primitives) | Author: John D'Errico

Mustafa - I'm glad you like it.

The initial knot placement is as you supply it, or if you supplied nothing, then the default is used. By default, the knots chosen are equally spaced from min(x) to max(x), with 6 knots in total, so there would be 4 interior knots to vary. The optimizer then varies only those interior knots.
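
As a minimal sketch (mine; the data is illustrative, and slm.knots is the field name as I recall it from the SLM result structure), you can fit with free interior knots and then inspect where they ended up:

x = rand(100,1);
y = exp(x);
slm = slmengine(x,y,'knots',6,'interiorknots','free');
slm.knots   % the 6 knot positions; only the 4 interior knots were moved by the optimizer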

There is no overt randomness in the optimization or the initial values though. For example:

x = rand(100);
y = exp(x);

clear functions
tic,slm = slmengine(x,y,'knots',6,'interiorknots','free');toc
Elapsed time is 1.369497 seconds.

tic,slm = slmengine(x,y,'knots',6,'interiorknots','free');toc
Elapsed time is 0.176209 seconds.

tic,slm = slmengine(x,y,'knots',6,'interiorknots','free');toc
Elapsed time is 0.127645 seconds.

tic,slm = slmengine(x,y,'knots',6,'interiorknots','free');toc
Elapsed time is 0.123504 seconds.

tic,slm = slmengine(x,y,'knots',6,'interiorknots','free');toc
Elapsed time is 0.121428 seconds.

tic,slm = slmengine(x,y,'knots',6,'interiorknots','free');toc
Elapsed time is 0.123522 seconds.

tic,slm = slmengine(x,y,'knots',6,'interiorknots','free');toc
Elapsed time is 0.126947 seconds.

The difference in time for the subsequent calls is small, but non-zero. That variability is purely due to your CPU, which is constantly running various other things in the background.

ALWAYS do such time tests multiple times, and ignore the first couple of calls in your sample: the first time a function is called, MATLAB takes extra time to parse the code and cache it. Above, see that I did a clear functions first to clear the function cache. The first call was quite slow; from the second call on, the time needed was far less, and quite consistent.
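
On recent MATLAB releases, timeit automates exactly this procedure: it runs the function several times, discards warm-up effects, and returns a robust estimate. A minimal sketch (my addition), reusing x and y from above:

f = @() slmengine(x,y,'knots',6,'interiorknots','free');
timeit(f)   % robust timing estimate, first-call overhead excluded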

02 Jul 2014 | SLM - Shape Language Modeling (Least squares spline modeling using shape primitives) | Author: John D'Errico

I'm confused. If you found this page, you should be able to click on the "Download Submission" button. Once you have done that, unzip the file, and add the resulting directory to your search path using either addpath plus savepath, or pathtool.
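
A minimal sketch of those steps, assuming the download was saved as SLM.zip in the current folder (the file and folder names here are illustrative):

unzip('SLM.zip','SLM');        % extract into a folder named SLM
addpath(fullfile(pwd,'SLM'));  % add that folder to the search path
savepath;                      % make the change persist across sessions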

26 Jun 2014 | Plane Fitting and Normal Calculation (Given a set of x,y,z coordinates, find the best planar fit.) | Author: Dan Couture

Yeah, with low noise, any method works reasonably well. With higher noise, points on the high end whose errors bring them inward have less leverage than those whose errors move them outward. That asymmetry tends to introduce a bias.

Of course, it is your choice which method you choose. Definitely use backslash for any linear regressions instead of the normal equations and avoid det, and I will be happier.

You might offer both methods as an option, because whenever I make a choice that limits my code, I always find someone who would have preferred I made the alternative choice.

Anyway, I'm always willing to review submissions when they have been improved.

26 Jun 2014 | Plane Fitting and Normal Calculation (Given a set of x,y,z coordinates, find the best planar fit.) | Author: Dan Couture

Hi Dan,

The logic behind better error checks is that when it does throw an error, you can tell the user explicitly what you expected, and why their call failed. It seems a more friendly way to handle it, rather than some obscure error that they will then need to figure out.

The H1 line is not too bad as it is. An H1 line is a big help for someone trying to find your code months after downloading it, when they cannot remember its name or where they put it. Some other words I might have considered: regression, least squares, total least squares. (The last one only if you use the SVD approach I suggested.)
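
For reference, the H1 line is the first comment line after the function declaration; it is what lookfor searches. A hypothetical sketch (the signature and wording here are mine, not Dan's actual code):

function n = fitNormal(data)
% fitNormal: least squares plane fit to a set of x,y,z points, returning the unit normal vector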

Really, the issues I have are with the code itself. DET is just a bad thing to use. It is never a better choice than rank or cond for determining the singularity status of a matrix. The fact is, it is never even a remotely good choice.

But the fact is, you never need to compute the determinant, or even to use rank or cond! This is because the SVD does not care. But first, let me expand on the issues here.

One of the problems with errors-in-variables data is that simple linear regression yields biased estimates. For example, consider using the simple linear regression estimator to fit a straight line to data that has errors in both variables.

x = randn(100,1);
y = 2*x + 3;

Now, I'll add noise to both variables.

x = x + randn(size(x))/2;
y = y + randn(size(x))/2;

What happens when I do a simple linear regression fit to this data? I'll use the model

y = a*x + b + error

So now a simple linear regression fit:

ab = [x,ones(size(x))]\y
ab =
1.59690079069019
3.01312808997413

See that the larger the noise, the more bias we will see: the slope is biased low. One way to understand why is that the points in x near the ends of the data have more leverage than the points near the center. So imagine a data point near the top end of the set. If that point has a positive error in x, then it has MORE leverage than it deserves, and it tends to drag the slope down. Likewise, points near the bottom end with negative errors in x also tend to bias the slope lower than it should be.

So when you fit regression models to data where the errors are in the independent variables, you tend to get biased results.
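
A minimal sketch (my own addition, not part of the original comment) that repeats the fit above many times makes the bias plain; the average estimated slope settles well below the true value of 2:

nrep = 1000;
slopes = zeros(nrep,1);
for k = 1:nrep
  x = randn(100,1);
  y = 2*x + 3;
  xn = x + randn(100,1)/2;   % noise added to the independent variable
  yn = y + randn(100,1)/2;   % noise added to the dependent variable
  ab = [xn, ones(100,1)]\yn; % ordinary least squares line fit
  slopes(k) = ab(1);
end
mean(slopes)   % consistently near 1.6, not 2: the attenuation bias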

An orthogonal (total least squares) fit avoids that bias, and the SVD computes it directly: center the data, then take the singular vector associated with the smallest singular value as the normal to the fitted line.

mu = [mean(x),mean(y)];
[U,S,V] = svd([x-mu(1),y-mu(2)]);

mu
mu =
-0.062208325462791 2.91378756585509

normal_vector = V(:,2)
normal_vector =
-0.905522722912969
0.424297770779298

mu*normal_vector
ans =
1.29264462097767

So the model is...

-0.905523*x + 0.424298*y = 1.29264

If we rescale it so that the coefficient of y is 1, to compare against the biased estimator, we did quite nicely:

y = 2.13417*x + 3.04655

The error in the slope term is considerably lower than it was before.
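
That rescaling, as a minimal sketch using the variables computed above:

slope = -normal_vector(1)/normal_vector(2)       % about 2.134
intercept = (mu*normal_vector)/normal_vector(2)  % about 3.047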

You also asked about a case where singularity becomes important. One thing people do not appreciate about the normal equations (the scheme you used) is that they are only an issue when the problem is ill-conditioned. Here, ill-conditioning occurs when the points all fall VERY close to a straight line and you try to fit a plane to that line. Clearly the plane will be poorly defined, and using the normal equations makes matters worse.

The example I showed in my last response was not designed to show any difference, and the fact is, it takes some effort to make double precision fail. Here is an example of how forming X'*X causes issues, though:

x = linspace(0,1,100)';
xyz = [x,x,x] + randn(100,3)/10000;

cond(xyz)
ans =
10944.616334719

cond(xyz'*xyz)
ans =
119784627.10645

What does the condition number tell us? One thing it tells us is that any noise in the data may be amplified by essentially that factor when you solve a linear system. So a condition number of 1e8 is far more of a problem than 1e4 is; note that 10944.6^2 is roughly 1.2e8, so forming xyz'*xyz squared the condition number. The time to seriously worry is when you start seeing condition numbers that verge on 1/eps.

You may think that nobody will ever throw data that bad at your code, but believe me, I've seen some incredibly bad sets of data passed into my various codes on the file exchange. People don't realize how degenerate their data is. Hey, computers can solve anything, right?

I'll stop here for now, because my guess is I will overload the site limits on the size of my response if I go any further.

25 Jun 2014 | Plane Fitting and Normal Calculation (Given a set of x,y,z coordinates, find the best planar fit.) | Author: Dan Couture

To finish my comments...

Better is to use tools like rank or cond to tell whether a matrix is numerically singular. LEARN TO USE THEM! Learn about the condition number of a matrix: small numbers are good here. In double precision, condition numbers on the order of 1e15 or larger are bad news.

In this case, see that scaling the data array does not change the condition number at all.

cond(data)
ans =
3.9491

cond(data*10000)
ans =
3.9491
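
For contrast, here is a minimal sketch (my addition) of a matrix that really is numerically singular:

A = hilb(12);   % the classically ill-conditioned Hilbert matrix
cond(A)         % on the order of 1e16, i.e., verging on 1/eps
rank(A)         % typically reported as less than 12: numerical rank deficiency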

Next, Dan uses what are known as the normal equations, which try to solve the problem

A*x = b

using the form

x = inv(A'*A)*A'*b

This form squares the condition number of A. So if A was nearly singular before we started, then A'*A is definitely singular. It is simply bad linear algebra. Again, yeah, I know your teacher or mentor taught you that. I had people teach me that trick too, 35 years ago. I'm sorry, but it is the wrong thing to do. Use backslash here:

x = A\b;

The backslash operator uses good numerical methods to solve the system, while never explicitly squaring the condition number.
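
A minimal sketch (my own, not from the original comment) of the difference on a nearly singular system:

x = linspace(0,1,100)';
A = [x, x + 1e-10*randn(100,1)];  % two nearly identical columns
b = A*[1;1];                      % so the exact solution is [1;1]
x_ne = (A'*A)\(A'*b)              % normal equations: warns, and is typically garbage
x_bs = A\b                        % backslash: far closer to [1;1]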

Regardless, it is STILL wrong to do that in the first place, because the approach of solving three different problems is a bad one. Instead, use SVD. The general idea is that IF our data falls on a plane in 3 dimensions, then the SVD will tell us that, and will do it intelligently. In fact, SVD will give us the plane that fits the data that minimizes the total sum of squares, using a nice trick.

Let's first create some fake data that falls on a plane, adding some noise to it. Here the expected normal vector will be [1;2;3]. Of course, a normal vector can be scaled by any arbitrary non-zero factor.

N = [1;2;3];
N = N/norm(N);
data = randn(10,3);
data = data - (data*N)*N';
data = data + randn(size(data))/100;

You can even plot the data, then rotate it around to see that it falls roughly on a plane.

plot3(data(:,1),data(:,2),data(:,3),'o')

The added noise is pretty small. Now let's do the fit.

Subtract off the mean of our data. This centers the problem, so that the fitted plane passes through that mean point. Centering is always the first step in an orthogonal regression.

mu = mean(data,1);

Next, compute the SVD. We want to use the third singular vector to give us the normal to the plane. This is the vector in the direction of least variability. So we can think of it in terms of an ellipsoid that would contain the data. The ellipsoid will have a very small axis length in the direction perpendicular to the plane of interest.

[U,S,V] = svd(bsxfun(@minus,data,mu));
normal_vector = V(:,3)
normal_vector =
-0.26191
-0.53173
-0.8054

A simple test will show that we did indeed recover the original normal vector to within some noise.

normal_vector/normal_vector(1)
ans =
1
2.0302
3.0751

We can also look at the singular values of the mean subtracted data matrix, to see that the third singular value is significantly smaller than those before. This is a good thing if the data really does fall nearly on a plane.

diag(S)
ans =
4.1098
1.6883
0.019637

So the fitted plane passes through the point mu in 3-d. It has the given normal vector to the plane. And it took me about 3 lines of code to compute, though a bit more in terms of comments.

We can write the fitted plane in a simple form. A plane equation is generally (in any number of dimensions)

dot(X - P,N) = 0

Here P is a vector of length 3 that defines a point on the plane, and N is the normal vector. We can expand that to be

dot(X,N) = dot(P,N)

In our case, the point on the plane is mu, and the computed normal vector is normal_vector.

format long g
normal_vector
normal_vector =
-0.261906153571535
-0.531730910491868
-0.805398910819262

dot(mu,normal_vector)
ans =
-0.00553142802240343

So the fitted plane equation, created by a numerically valid method is:

-0.26191*x - 0.53173*y - 0.8054*z = -0.0055314

(Be careful to not use such truncated values in real life. But for display purposes, short numbers are just easier to read.)
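
As a final check, a minimal sketch (using the variables computed above) of the residuals, each point's signed distance from the fitted plane; they should be on the order of the noise that was added, about 0.01:

residuals = data*normal_vector - dot(mu,normal_vector);
max(abs(residuals))   % roughly the size of the added noise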

In the end, I would give the code 5 stars for the help.

I would give it about 3.5 stars because it has internal comments and at least a default value for the second parameter, although no error checks. Error checks on the inputs make for friendly code.

However, and this is important, I would give it only 1 star for the code itself, and in my opinion that is a terribly important factor. So replace much of the code in fitNormal with just a few lines, add some error checks, and my rating would go up to 5 stars.

Comments and Ratings on John's Files
21 Jul 2014 | interparc (Distance based interpolation along a general curve in space) | Author: John D'Errico | Comment by: Justin

This is amazing. Basically MATLAB magic.

Running on MATLAB 7.8 (R2009a).

21 Jul 2014 | inpaint_nans (Interpolates (& extrapolates) NaN elements in a 2d array.) | Author: John D'Errico | Comment by: Ivy

Hi John, your code is very useful for my work. I want to cite it in my paper; could you please tell me how to cite it properly? Thanks!

10 Jul 2014 | polyfitn (Polynomial modeling in 1 or n dimensions) | Author: John D'Errico | Comment by: akshay

Works great, and should be included in the standard package.

08 Jul 2014 | IPDM: Inter-Point Distance Matrix (An efficient and accurate Inter-Point Distance Matrix) | Author: John D'Errico | Comment by: Greene, Chad

For an array of points given by x and y, this function quite elegantly and efficiently returns indices of each point's nearest neighbor like this:

ipdm([x y],'subset','nearestneighbor','result','struct');

Takes ~5 seconds to solve for 15,000 points on my laptop. That's quite fast.
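
A self-contained minimal sketch of that call (the option strings are copied from the comment above; the data is illustrative):

x = rand(15000,1);
y = rand(15000,1);
nn = ipdm([x y],'subset','nearestneighbor','result','struct');
% nn is a structure describing, for each point, its nearest neighbor and the distance to it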
