# Thread Subject: Fitting a cdf to noisy data

 Subject: Fitting a cdf to noisy data From: Katya Frois-Moniz Date: 18 Mar, 2009 16:20:17 Message: 1 of 6 Hello - I'm totally new to MATLAB, so apologies for this probably very stupid question! I have a time-series data set showing the measured concentration of viruses over time. I believe that the data are best modelled as a normal cdf -- i.e. that the time it takes for a new virus to be produced and released is normally distributed. I'm most interested in knowing what the mu and sigma of the underlying gaussian are. What is the best way of approaching this? I first tried to fit the data with the curve fitting toolbox, but there was no normal cdf option, and my attempts to specify my own function didn't work. I tried generating the pdf by plotting the *incremental* values (i.e. y(t) - y(t-1)) vs time, but that of course increased the noise. I tried ecdf, but my data aren't always non-decreasing, so that didn't work. As a last-ditch attempt, just to get some sense of my results, I first smoothed the data, then used the smoothed data to generate the incremental values, fit with a pdf, etc... but that's obviously not a theoretically sound approach! I would very much appreciate any help or guidance anyone can offer! Thanks!!
 Subject: Fitting a cdf to noisy data From: Peter Perkins Date: 18 Mar, 2009 17:11:28 Message: 2 of 6

Katya Frois-Moniz wrote:
> I have a time-series data set showing the measured concentration of viruses over time. I believe that the data are best modelled as a normal cdf -- i.e. that the time it takes for a new virus to be produced and released is normally distributed. I'm most interested in knowing what the mu and sigma of the underlying gaussian are. What is the best way of approaching this? I first tried to fit the data with the curve fitting toolbox, but there was no normal cdf option, and my attempts to specify my own function didn't work. I tried generating the pdf by plotting the *incremental* values (i.e. y(t) - y(t-1)) vs time, but that of course increased the noise. I tried ecdf, but my data aren't always non-decreasing, so that didn't work. As a last-ditch attempt, just to get some sense of my results, I first smoothed the data, then used the smoothed data to generate the incremental values, fit with a pdf, etc... but that's obviously not a theoretically sound approach!

Katya, you may find this demo helpful in clarifying your goals. From your description, it's not clear to me whether you want to fit a curve to observations of concentration vs. time, or if you want to fit a normal distribution to observed times. It's also not clear to me if the concentrations you have are cumulative, or if they include both "births" and "deaths" if you see what I mean.

You may want to use the Curve Fitting Toolbox. You may want to use NORMFIT in the Statistics Toolbox. You may want to fit a "discrete normal" using MLE in the Statistics Toolbox.

Hope this helps.
 Subject: Fitting a cdf to noisy data From: Katya Frois-Moniz Date: 18 Mar, 2009 18:09:01 Message: 3 of 6

Thanks for your post, Peter!

> From your description, it's not clear to me whether you want to fit a curve to observations of concentration vs. time, or if you want to fit a normal distribution to observed times.

I'm really looking to fit a curve (normal cdf) to concentration vs. time, and obtain the parameters.

> It's also not clear to me if the concentrations you have are cumulative, or if they include both "births" and "deaths" if you see what I mean.

Technically, they include deaths, but these are assumed to be negligible, so the concentration is (essentially) cumulative.

> You may want to use the Curve Fitting Toolbox. You may want to use NORMFIT in the Statistics Toolbox. You may want to fit a "discrete normal" using MLE in the Statistics Toolbox.

I think I'll try cftool again, and see if I can get help setting up the custom equation, since what I tried before didn't work.

Thanks!
 Subject: Fitting a cdf to noisy data From: TideMan Date: 18 Mar, 2009 19:13:34 Message: 4 of 6

On Mar 19, 7:09 am, "Katya Frois-Moniz" wrote:
> Thanks for your post, Peter!
>
> > From your description, it's not clear to me whether you want to fit a curve to observations of concentration vs. time, or if you want to fit a normal distribution to observed times.
>
> I'm really looking to fit a curve (normal cdf) to concentration vs. time, and obtain the parameters.
>
> > It's also not clear to me if the concentrations you have are cumulative, or if they include both "births" and "deaths" if you see what I mean.
>
> Technically, they include deaths, but these are assumed to be negligible, so the concentration is (essentially) cumulative.
>
> > You may want to use the Curve Fitting Toolbox. You may want to use NORMFIT in the Statistics Toolbox. You may want to fit a "discrete normal" using MLE in the Statistics Toolbox.
>
> I think I'll try cftool again, and see if I can get help setting up the custom equation, since what I tried before didn't work.
>
> Thanks!

I don't completely understand your problem, but the way I generate a CDF from data is to first calculate the histogram (the empirical PDF), then integrate to give the CDF. This gives the probability that the data do not exceed a particular value.

You say
> I tried generating the pdf by plotting the *incremental* values (i.e. y(t) - y(t-1)) vs time

Well, that's not a PDF as I know it. It's simply a gradient vs time.
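The histogram-then-integrate recipe can be sketched as follows. This is only an illustration under stated assumptions: it applies when the data are individual observed values (for example, release times), not a concentration-vs-time curve, and the sample values, seed, and bin count of 30 are all made up for the example. It uses `histcounts` and `cumsum` from base MATLAB.

```matlab
% Hypothetical sample data: 1000 draws from a normal distribution.
rng(0);                          % fixed seed so the sketch is repeatable
samples = 5 + 1.2*randn(1000, 1);

% Histogram normalized as a probability density (the empirical PDF).
[pdfEst, edges] = histcounts(samples, 30, 'Normalization', 'pdf');

% Integrate the density over the bins to get the empirical CDF,
% evaluated at the right-hand edge of each bin.
binWidth = diff(edges);
cdfEst = cumsum(pdfEst .* binWidth);

plot(edges(2:end), cdfEst, '.-');
xlabel('value'); ylabel('P(X <= value)');
```

The last entry of `cdfEst` should come out at 1 (up to rounding), since the density integrates to one over the binned range.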
 Subject: Fitting a cdf to noisy data From: Katya Frois-Moniz Date: 18 Mar, 2009 20:13:01 Message: 5 of 6 OK -- I think I've got it sorted. I just used a Custom Equation in cftool: y = a*(0.5*erf((x-b)/(c*sqrt(2)))+.5)+d where b=mu, and c=sigma Thanks to all for your help!
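For anyone who prefers a script to the cftool GUI, the same custom equation can probably be fitted from the command line roughly as below. The term `0.5*erf((x-b)/(c*sqrt(2)))+0.5` is the normal cdf with mean `b` and standard deviation `c`, so `a` scales it to the plateau height and `d` shifts the baseline. This is a minimal sketch, assuming the Curve Fitting Toolbox is installed; the synthetic data, noise level, starting guesses, and the lower bound on `c` are assumptions made for the example and are not from the original post.

```matlab
% Synthetic stand-in for the concentration data (all values hypothetical).
rng(0);
t = linspace(0, 10, 50)';                       % sample times, column vector
trueCurve = 100*(0.5*erf((t - 5)/(1.2*sqrt(2))) + 0.5) + 2;
y = trueCurve + 3*randn(size(t));               % add measurement noise

% Same model as the cftool custom equation: a = plateau height,
% b = mu, c = sigma, d = baseline offset.
model = fittype('a*(0.5*erf((x-b)/(c*sqrt(2)))+0.5)+d', ...
    'independent', 'x', 'coefficients', {'a', 'b', 'c', 'd'});

% Rough starting guesses taken from the data; c is bounded below at zero
% so that sigma cannot come back negative.
start = [max(y) - min(y), mean(t), (max(t) - min(t))/4, min(y)];
result = fit(t, y, model, 'StartPoint', start, ...
    'Lower', [-Inf, -Inf, 0, -Inf]);

mu = result.b;
sigma = result.c;
ci = confint(result);      % 95% confidence bounds, one column per coefficient
fprintf('mu = %.3f, sigma = %.3f\n', mu, sigma);
```

The lower bound on `c` is a design choice: because `erf` is an odd function, flipping the signs of `a` and `c` together (and adjusting `d`) gives the same curve, so an unconstrained fit could return a negative sigma.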
 Subject: Fitting a cdf to noisy data From: Roger Stafford Date: 18 Mar, 2009 20:32:01 Message: 6 of 6

"Katya Frois-Moniz" wrote in message ...
> Hello -
>
> I'm totally new to MATLAB, so apologies for this probably very stupid question!
>
> I have a time-series data set showing the measured concentration of viruses over time. I believe that the data are best modelled as a normal cdf -- i.e. that the time it takes for a new virus to be produced and released is normally distributed. I'm most interested in knowing what the mu and sigma of the underlying gaussian are. What is the best way of approaching this? I first tried to fit the data with the curve fitting toolbox, but there was no normal cdf option, and my attempts to specify my own function didn't work. I tried generating the pdf by plotting the *incremental* values (i.e. y(t) - y(t-1)) vs time, but that of course increased the noise. I tried ecdf, but my data aren't always non-decreasing, so that didn't work. As a last-ditch attempt, just to get some sense of my results, I first smoothed the data, then used the smoothed data to generate the incremental values, fit with a pdf, etc... but that's obviously not a theoretically sound approach!
>
> I would very much appreciate any help or guidance anyone can offer! Thanks!!

It is a little difficult to envision an occasionally decreasing concentration as a cumulative process. Nevertheless I would think the best way to proceed would be to calculate the mean and variance of time as a random variable, considering relative concentration as your probability measure. There is presumably an eventual approximate leveling off of concentration and the measured concentration divided by this "final" level would be your probability measure. The relative concentration at any particular time gives the probability that any given virus will have appeared by that time or sooner.

To get the mean, find the integral of the measured time with respect to this relative concentration. For the variance, find the integral of the square of the difference between the time and the above mean value with respect to this relative concentration.

Given these two parameters, the cdf curve of a corresponding normal distribution is uniquely determined and you can use the Statistics Toolbox 'normcdf' to generate it or you can manufacture it with a well-known formula using the 'erf' function in base MATLAB. Either way it will generate the normal cdf curve of relative concentration in terms of time and you can compare the two curves to see how well they match.

Roger Stafford
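The two integrals described above can be approximated with discrete sums, roughly as in the sketch below. It is an illustration only, using base MATLAB and noise-free synthetic data; the variable names, the rescaling by the first and last samples, and the midpoint rule are assumptions for the example, not part of the original suggestion.

```matlab
% Noise-free synthetic concentration curve (hypothetical values).
t = linspace(0, 10, 201)';
y = 100*(0.5*erf((t - 5)/(1.2*sqrt(2))) + 0.5) + 2;

% Relative concentration: rescale so it runs from 0 to 1.
% Assumption: the first and last samples represent the baseline
% and the final plateau.
p = (y - y(1)) / (y(end) - y(1));

% Stieltjes-style sums: weight each interval midpoint by the
% increment in relative concentration over that interval.
dp = diff(p);
tMid = (t(1:end-1) + t(2:end)) / 2;
muHat = sum(tMid .* dp);
varHat = sum((tMid - muHat).^2 .* dp);
sigmaHat = sqrt(varHat);

% Normal cdf built from erf, for comparison with the data.
pModel = 0.5*erf((t - muHat)/(sigmaHat*sqrt(2))) + 0.5;
plot(t, p, '.', t, pModel, '-');
legend('relative concentration', 'normal cdf from moments', 'Location', 'northwest');
```

With noisy measurements some increments in `dp` will be negative, and the `(tMid - muHat).^2` weight magnifies errors far from the mean, so the variance estimate in particular may be sensitive to noise in the tails and in the chosen "final" level.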