Rank: 6 based on 4202 downloads (last 30 days) and 52 files submitted

John D'Errico

Company/University: Retired

Personal Profile:

Mainly retired from Eastman Kodak. (Of course, Kodak itself is now semi-retired. I don't think I had any influence in that.) I still write MATLAB code as I find something interesting, but I DON'T answer your questions, and I do NOT do homework. Your homework is YOUR problem, not mine. Do NOT e-mail me with your homework problems or student projects. When I'm not doing something with MATLAB, you might find me playing bridge.

Professional Interests:
Bridge, MATLAB, numerical analysis, mathematical modeling

 


 

Files Posted by John
Updated: 29 Jan 2014 | Moving window standard deviation
A (fast) windowed std on a time series (Author: John D'Errico)
Tags: standard deviation, std, window, filter, movingavg
Downloads (last 30 days): 103 | Comments: 16 | Rating: 4.7 (11 ratings)

Updated: 30 Oct 2013 | HPF - a big decimal class
High precision floating point arithmetic, a new class written in MATLAB (Author: John D'Errico)
Tags: hpf, multiple precision, big decimal, floating point, arithmetic, mathematics
Downloads (last 30 days): 60 | Comments: 9 | Rating: 5.0 (7 ratings)

Updated: 31 Jul 2013 | Variable Precision Integer Arithmetic
Arithmetic with integers of fully arbitrary size. Arrays and vectors of vpi numbers are supported. (Author: John D'Errico)
Tags: integer, biginteger, arithmetic, operator, overloading, long
Downloads (last 30 days): 128 | Comments: 119 | Rating: 4.8 (40 ratings)

Updated: 31 Jul 2013 | nearestSPD
Finding the nearest positive definite matrix (Author: John D'Errico)
Tags: positive definite mat..., covariance matrix, covariance, spd, statistics
Downloads (last 30 days): 58 | Comments: 3 | Rating: 5.0 (1 rating)

Updated: 27 Feb 2013 | distance2curve
Find the closest point on a (n-dimensional) curve to any given point or set of points (Author: John D'Errico)
Tags: nearest, closest, distance, interpolation, curve, space curve
Downloads (last 30 days): 83 | Comments: 13 | Rating: 5.0 (5 ratings)
Comments and Ratings by John
15 Apr 2014 | BrokenStickRegression: A line consisting of connected straight sections is fitted to a cloud of data points. (Author: Peter Nave)

This code has a serious flaw.

For a model with multiple breaks in it, this tool finds the points in the left-most section, applies polyfit to that, and then sequentially works across the data, solving for each segment, one at a time.

While this may seem logical, it fails to recognize that data in one segment should influence the shape of the curve in another neighboring segment. In fact, a point near one end of a segment SHOULD influence the next line segment.

As well, by solving these as sub-problems in a sequential manner, we would get a different result if we ran the sequence in reverse order. That is silly.

Finally, it looks like the code will fail if a segment just happens to completely lack any data points, since then the slope in that segment will be undefined. Since this code allows fminsearch to choose the breaks, that can indeed happen.

The fact is, all of this can be surmounted. The entire fit can be written as a single linear algebra operation, thus a call to backslash, solving for all segments at once. This resolves the order-of-solution issue, and it allows points in one segment to properly influence the neighboring segments.

Even better, if you add in a tiny amount of regularization to the linear algebra (still a linear problem) then the process will survive a segment with no points in it.
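For illustration only, here is a minimal sketch of that single-backslash formulation with fixed breaks. The hat-function parameterization, the fake data, and the variable names are mine, not code from this submission.

% Broken-stick fit with FIXED breaks, all segments solved at once by backslash.
% The curve is parameterized by its values at the breaks (a hat-function basis),
% so continuity is automatic and each data point influences both neighboring knots.
x = sort(10*rand(200,1));
y = min(x,4) + 0.1*randn(size(x));       % a broken stick plus noise
breaks = [0; 2.5; 5; 7.5; 10];           % fixed break placement
nb = numel(breaks);
n  = numel(x);
[~,bin] = histc(x,breaks);               % which segment each data point falls in
bin = min(max(bin,1),nb-1);
t = (x - breaks(bin))./(breaks(bin+1) - breaks(bin));   % local coordinate in [0,1]
A = sparse([(1:n)';(1:n)'],[bin;bin+1],[1-t;t],n,nb);   % A*v evaluates the broken stick at x
% A tiny difference penalty keeps the system full rank even if a segment holds no data.
lambda = 1e-8;
R = lambda*spdiags([-ones(nb-1,1) ones(nb-1,1)],[0 1],nb-1,nb);
v = [A;R]\[y;zeros(nb-1,1)];             % fitted values at the breaks, all segments at once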

Wrapping the segmented solver inside an optimizer like fminsearch is still necessary if you will ask it to choose the break placement. Be very careful however, as fminsearch tends to be a terribly slow thing with more than about 6 variables to search over.

I will also point out that my own SLM toolbox allows the equivalent broken stick regression, without the flaws I explain above. It does use fmincon for the break placement.

08 Apr 2014 | SLM - Shape Language Modeling: Least squares spline modeling using shape primitives (Author: John D'Errico)

Joseph,

The point is that the knots MUST contain the data. It will throw an error if that is not true, telling you something to that effect. In fact, I don't know where you got the idea that the knots should lie wholly inside the range of the data, as that is not something I have shown in any example in the tutorials. Note that the default for 'knots' is a set of 6 equally spaced points that COMPLETELY span the data.

If the knots were to fall entirely inside the data, with data points that lie outside, then slmengine would be forced to extrapolate, something which I am strongly opposed to doing with a spline, since extrapolation of a cubic polynomial segment will give virtually random crap.

If, however, I force you to contain the data inside the knots of the spline, then you can control the shape of the spline over the entire knot range. This is how the tool works. It does so for very good reasons. Note that if you DO wish to extrapolate, then since you can control the shape of the spline out as far as the knots go, you can force the tool to extrapolate intelligently, at least within the bounds of the knots.
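A minimal usage sketch with made-up data; the only property assumed here is 'knots', as discussed above.

% Knots chosen to span the data, as slmengine requires.
x = 10*rand(100,1);
y = sin(x) + 0.05*randn(size(x));
knots = linspace(min(x),max(x),6);       % spans the data, like the default
slm = slmengine(x,y,'knots',knots);      % accepted
% knots = linspace(2,8,6) would be rejected, since some data lie outside [2,8]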

08 Apr 2014 | binomial.m: This program computes the binomial coefficient C(n,m). (Author: David Terr)

Sorry. I think Herbert completely misunderstands the issue. In fact, he seems to be wrong on all counts.

It DOES have an overflow issue, just as nchoosek has one. For example,

>> nchoosek(1500,265)
Warning: Result may not be exact. Coefficient is greater than 9.007199e+15 and is only accurate to 15 digits
> In nchoosek at 92
ans =
1.58269376513703e+302

nchoosek does return a valid answer, WITH a warning that it is not exact. But since the result has 303 decimal digits and is returned as a double, what do you expect???

binomial also returns the same thing, however it generates no warning message. That does not say it has no issue, only that it fails to be friendly and warn you of that fact!

Both codes will return inf when the arguments cause a double to be exceeded.

>> binomial(1500,275)
ans =
Inf
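A quick sketch for illustration: since binomial reduces to a gammaln computation, the magnitude of the coefficient can still be checked on a log scale even when the double value overflows. The numeric comments below are approximate.

% C(1500,275) on a log10 scale, via gammaln (no overflow):
log10C = (gammaln(1501) - gammaln(276) - gammaln(1226))/log(10)   % roughly 308.8
log10(realmax)                                                    % about 308.25, so the double result is Inf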

However, note that nchoosek is both more flexible and more powerful than is binomial. And nchoosek has been around since the dark ages. So the fact that David could not find it is merely a reflection that he did not look that hard. Jos points out the simple use of lookfor that would have given him the answer.

nchoosek also has the ability to give exact results for a slightly larger set of inputs than binomial. So if the input is uint64, so is the output.

>> b1 = nchoosek(uint64(65),30)
b1 =
3009106305270645216

>> b2 = binomial(65,30)
b2 =
3.00910630527064e+18

>> b1 - b2
ans =
480

Here nchoosek was exact, whereas binomial was wrong. In fact, nchoosek CAN do symbolic computation, whereas binomial fails to return anything of value.

>> b1 = nchoosek(sym(1500),750)
b1 =
722462892448930217028116073228485295944376343040546523665632913653613596406381727723198055046187955623124069093412562720869577867557610868874271130828359513282184417776042792372322074253393127328396528996120053749558122610911178218582669317535346728464707661495135518682519172221470420360910320792434869988224466647627642393919250205687942318888922893189087379790541907686956429837978631252775258630376332505697937034877619012586751274240109457111424

>> binomial(sym(1500),30)
ans =
exp(gammaln(1501) - gammaln(1471) - 5253606334386405/70368744177664)

Of course, nchoosek even works perfectly for vpi input, whereas binomial fails there completely.

There is only one point made by Herbert that MIGHT be correct in some circumstances, i.e., that binomial is faster. Since binomial does no error checking and has less capability than nchoosek, of course it might be faster. On top of that, the gamma log computation for large input might be pretty fast. But is it good? Even for double inputs...

>> binomial(101,10)
ans =
19212541264839.3

>> nchoosek(101,10)
ans =
19212541264840

Oops. I'm pretty sure that most binomial coefficients are integers, at least they were when I went to school. You never know with the new math though. But I would think it disappointing to see a floating point result that is not a pure integer, at least for a reasonably small set of inputs.

So how about speed: is it faster? I'll use timeit to do the comparison.

>> timeit(@() nchoosek(100,10))
ans =
8.32913416666667e-05

>> timeit(@() binomial(100,10))
ans =
0.0002867962125

So here nchoosek was actually more than 3 times FASTER than binomial. Not slower. binomial is the slow one.

08 Apr 2014 | SLM - Shape Language Modeling: Least squares spline modeling using shape primitives (Author: John D'Errico)

Hi Joseph,

I had thought about allowing a user to turn off C2 continuity at specific knots, but it seemed a bit kludgy in terms of the interface.

If you have a decent idea of where those breaks occur, then you could add a spare knot or two in those areas. If you are trying to do it semi-automatically, then you might need to do it in two passes, using the first fit to find roughly where those second derivative breaks live. So the first pass would employ lots of knots in the fit.
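A rough sketch of that two-pass idea. Here x and y are whatever you are fitting; the evaluation call and the extra knot locations are assumptions for illustration, not a prescription.

% Pass 1: lots of equally spaced knots, then look for where the curvature changes.
slm1 = slmengine(x,y,'knots',linspace(min(x),max(x),25));
xf = linspace(min(x),max(x),500);
yf = slmeval(xf,slm1);                    % evaluate the first-pass fit on a fine grid
h  = xf(2) - xf(1);
d2 = diff(yf,2)/h^2;                      % rough second derivative estimate
% Inspect d2, then add a spare knot or two near the apparent breaks for pass 2:
knots2 = sort([linspace(min(x),max(x),6), 3.1, 3.3]);   % 3.1 and 3.3 are made-up locations
slm2 = slmengine(x,y,'knots',knots2);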

You can also specify a region or set of regions where the curve will be concave down or up. Make sure those regions don't overlap, else the fit will be too strongly constrained to be useful, and make sure there are a couple of knots between each pair of consecutive regions.

Finally, remember that derivative estimation is an ill-posed process, so it tends to amplify noise.

06 Apr 2014 | Uniform Sampling of a Sphere: Create an approximately uniform triangular tessellation of a unit sphere (Author: Anton Semechko)

Manuel - You apparently have NO idea how complex a triangulation of the surface of a 1000-dimensional hypersphere will be. The result would be IMMENSE.

In fact, I'm amazed that it succeeded for dimension 525. For example, try computing the convex hull of a fixed number of points on a hypersphere; what happens?

xyz = randn(100,2);
xyz = bsxfun(@rdivide,xyz,sqrt(sum(xyz.^2,2)));
t = convhulln(xyz);size(t)
ans =
100 2

xyz = randn(100,3);
xyz = bsxfun(@rdivide,xyz,sqrt(sum(xyz.^2,2)));
t = convhulln(xyz);size(t)
ans =
196 3

xyz = randn(100,5);
xyz = bsxfun(@rdivide,xyz,sqrt(sum(xyz.^2,2)));
t = convhulln(xyz);size(t)
ans =
1936 5

xyz = randn(100,7);
xyz = bsxfun(@rdivide,xyz,sqrt(sum(xyz.^2,2)));
t = convhulln(xyz);size(t)
ans =
24266 7

xyz = randn(100,9);
xyz = bsxfun(@rdivide,xyz,sqrt(sum(xyz.^2,2)));
t = convhulln(xyz);size(t)
ans =
287356 9

This last one started to take a significant amount of time, as you might expect.

Anyway, that 525-dimensional convex hull is a terribly poor approximation to a hypersphere. You are kidding yourself.

Comments and Ratings on John's Files
17 Apr 2014 | Adaptive Robust Numerical Differentiation: Numerical derivative of an analytically supplied function, also gradient, Jacobian & Hessian (Author: John D'Errico) | Comment by Niklas

16 Apr 2014 | LSE: A linear least squares solver, subject to linear equality constraints (Author: John D'Errico) | Comment by Avital, Royi

How large are the matrices it can handle?
For instance:

http://www.mathworks.com/matlabcentral/answers/125984-constrained-weighted-least-squares-image-interpolation

08 Apr 2014 | SLM - Shape Language Modeling: Least squares spline modeling using shape primitives (Author: John D'Errico) | Comment by John D'Errico

(Same reply to Joseph as quoted above under "Comments and Ratings by John".)

08 Apr 2014 | inpaint_nans: Interpolates (& extrapolates) NaN elements in a 2d array. (Author: John D'Errico) | Comment by Tiberius

08 Apr 2014 | SLM - Shape Language Modeling: Least squares spline modeling using shape primitives (Author: John D'Errico) | Comment by Joseph

Hi John,

Greatly appreciate your fast response.

Turning C2 off at specific points is probably what I needed, but I am not complaining that it isn't there. Also, I didn't realize the concave up/down constraint can be applied to more than one region, but other aspects of my simulations don't keep the concavity the same. I do appreciate those suggestions.

I kinda knew I would have to overcome my laziness and start using custom-placed knots for each of the intervals to capture the inflections and avoid overfitting certain regions.

However, I am a little confused by lines 176-178 of slmengine.m.

if (knots(1)>min(x)) || (knots(end)<max(x))
    error(['Knots do not contain the data. Data range: ',num2str([min(x),max(x)])])
end

Suppose my data is just x = 1:10 with the knots = [2, 5, 7] (i.e. knots contained in the data range). That would cause line 176 to evaluate to true and then throw an error.

(2 > 1) || (7 < 10) => true || true => true

(If I am not mistaken or confused; but probably just confused) Shouldn't that knot placement for the data set be acceptable because all knots are interior to the data set?

Or is it that the range of the knots must include the data range as a subset?
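For concreteness, here is the check from lines 176-178 evaluated both ways; the second knot vector is just an example that spans the data.

x = 1:10;
knots = [2 5 7];
(knots(1) > min(x)) || (knots(end) < max(x))   % true, so slmengine throws the error: these knots do not span the data
knots = [1 5 10];
(knots(1) > min(x)) || (knots(end) < max(x))   % false, so this knot placement is accepted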

Sorry if this is a basic question, but I don't understand what the significance of having knots placed outside the range of data would be.
