Discussion:
[task #7882] Implement Monte-Carlo simulation, where errors are generated with width of standard deviation or residuals
Troels E. Linnet
2015-01-16 16:14:30 UTC
Permalink
URL:
<http://gna.org/task/?7882>

Summary: Implement Monte-Carlo simulation, where errors are
generated with width of standard deviation or residuals
Project: relax
Submitted by: tlinnet
Submitted on: Fri 16 Jan 2015 04:14:30 PM UTC
Should Start On: Fri 16 Jan 2015 12:00:00 AM UTC
Should be Finished on: Fri 16 Jan 2015 12:00:00 AM UTC
Category: relax's source code
Priority: 5 - Normal
Status: In Progress
Percent Complete: 0%
Assigned to: tlinnet
Open/Closed: Open
Discussion Lock: Any
Effort: 0.00

_______________________________________________________

Details:

This is being implemented due to strange results.

A relaxation dispersion analysis of data with 61 spins, using a Monte Carlo
simulation with 500 steps, showed unexpectedly low errors.

-------
results.read(file=fname_results, dir=dir_results)

# Number of MC
mc_nr = 500

monte_carlo.setup(number=mc_nr)
monte_carlo.create_data()
monte_carlo.initial_values()
minimise.execute(min_algor='simplex', func_tol=1e-25, max_iter=int(1e7),
constraints=True)
monte_carlo.error_analysis()
--------

The fitted kex was 2111, with an error of 16.6.

When performing a dx.map, some odd results were found:

i_sort dw_sort pA_sort kex_sort chi2_sort
471 4.50000 0.99375 2125.00000 4664.31083
470 4.50000 0.99375 1750.00000 4665.23872

So even a tiny change in chi2 reflects a large deviation in kex.

It seems that perturbing the R2eff values according to their errors is not
"enough".

According to the GraphPad regression book
http://www.graphpad.com/faq/file/Prism4RegressionBook.pdf
(pages 33 and 104), the standard deviation of the residuals is:

Sxy = sqrt(SS/(N-p))

where SS is the sum of squares and N - p is the number of degrees of
freedom. In relax, SS corresponds to spin.chi2, which is weighted.

The random scatter added to each R2eff point should be drawn from a Gaussian
distribution with a mean of zero and an SD equal to Sxy.

Additionally, find the 2.5 and 97.5 percentiles for each parameter.
The range between these values is the confidence interval.
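A minimal plain-Python sketch of the proposed scheme. All numbers below are hypothetical (the chi2 value, point count, and back-calculated R2eff values are made up for illustration); in relax itself, spin.chi2 and the back-calculated curves would supply these.

```python
import math
import random

random.seed(0)

# Hypothetical inputs (not taken from the actual task data).
chi2 = 4664.3        # weighted sum of squares, spin.chi2 in relax
N = 61 * 20          # assumed total number of R2eff points
p = 3                # fitted parameters, e.g. dw, pA, kex

# Standard deviation of the residuals: Sxy = sqrt(SS / (N - p)).
sxy = math.sqrt(chi2 / (N - p))

# One Monte Carlo data set: scatter each back-calculated R2eff point
# with Gaussian noise of mean zero and SD equal to Sxy.
r2eff_calc = [20.0, 25.0, 30.0]          # hypothetical values
one_sim = [r + random.gauss(0.0, sxy) for r in r2eff_calc]

# 2.5/97.5 percentile confidence interval across simulations.  Here the
# per-simulation kex values are faked with a Gaussian spread purely to
# demonstrate the percentile step.
def percentile(sorted_vals, q):
    i = q * (len(sorted_vals) - 1)
    lo = int(i)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (i - lo) * (sorted_vals[hi] - sorted_vals[lo])

kex_sims = sorted(random.gauss(2111.0, 100.0) for _ in range(500))
ci = (percentile(kex_sims, 0.025), percentile(kex_sims, 0.975))
```

In a real run, the kex_sims list would instead come from re-minimising each of the 500 simulated data sets.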




_______________________________________________________

File Attachments:


-------------------------------------------------------
Date: Fri 16 Jan 2015 04:14:30 PM UTC Name: Screenshot-1.png Size: 161kB
By: tlinnet

<http://gna.org/task/download.php?file_id=23527>

_______________________________________________________

Reply to this item at:

<http://gna.org/task/?7882>

_______________________________________________
Message sent via/by Gna!
http://gna.org/


_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
relax-***@gna.org

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel
Edward d'Auvergne
2015-01-16 16:30:38 UTC
Permalink
Hi Troels,

You should be very careful with your interpretation here. The
curvature of the chi-squared space does not correlate with the
parameter errors! Well, in most cases it doesn't. You will see this
if you map the space for different Monte Carlo simulations. Some
extreme edge cases might help in understanding the problem. Let's say
you have a kex value of 100 with a real error of 1000. In this case,
you could still have a small, perfectly quadratic minimum, but this
minimum will jump all over the place with the simulations. Another
extreme example might be a kex of 100 with a real error of 0.00000001.
In this case, the chi-squared space could look similar to the
screenshot you attached to the task (http://gna.org/task/?7882), yet
Monte Carlo simulations may hardly perturb the chi-squared space. I
have observed scenarios similar to these hypothetical cases with the
Lipari-Szabo model-free protein dynamics analysis.

There is one case where the chi-squared space and error space match,
and that is at the limit of the minimum when the chi-squared space
becomes quadratic. This happens when you zoom right into the minimum.
The correlation matrix approach makes this assumption. Monte Carlo
simulations do not. In fact, Monte Carlo simulations are the gold
standard. There is no technique which is better than Monte Carlo
simulations, if you use enough simulations. You can only match it by
deriving exact symbolic error equations.
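The point can be illustrated outside of relax. The sketch below (plain Python, using a hypothetical straight-line model rather than a dispersion model) runs the Monte Carlo procedure, namely scatter the back-calculated points by the measured errors, re-minimise, and take the SD of the fitted parameter, then compares the result to the known analytic error of the slope:

```python
import math
import random

random.seed(1)

# Hypothetical "experiment": y = a*x + b with known Gaussian noise.
a_true, b_true, sigma = 2.0, 1.0, 0.5
xs = [float(i) for i in range(10)]

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy_cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy_cov / sxx
    return a, my - a * mx

# One measured data set, then the back-calculated curve from its fit.
ys = [a_true * x + b_true + random.gauss(0.0, sigma) for x in xs]
a_fit, b_fit = fit_line(xs, ys)
y_back = [a_fit * x + b_fit for x in xs]

# Monte Carlo: re-scatter the back-calculated points by the measured
# error, re-fit, and take the SD of the fitted slope over all sims.
slopes = []
for _ in range(500):
    ys_sim = [y + random.gauss(0.0, sigma) for y in y_back]
    slopes.append(fit_line(xs, ys_sim)[0])
mean_a = sum(slopes) / len(slopes)
sd_mc = math.sqrt(sum((s - mean_a) ** 2 for s in slopes)
                  / (len(slopes) - 1))

# Analytic slope error for comparison: sigma / sqrt(sum((x - mean_x)^2)).
mx = sum(xs) / len(xs)
sd_exact = sigma / math.sqrt(sum((x - mx) ** 2 for x in xs))
```

With enough simulations, sd_mc converges on sd_exact regardless of what the chi-squared curvature around the minimum looks like.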

Therefore you really should investigate how your optimisation space is
perturbed by Monte Carlo simulations to understand the correlation -
or non-correlation - of the chi-squared curvature and the parameter
errors. Try mapping the minimum for the simulations and see if the
distribution of minima matches the chi-squared curvature
(http://gna.org/task/download.php?file_id=23527).

Regards,

Edward


Troels Emtekær Linnet
2015-01-16 16:48:34 UTC
Permalink
Hi Edward.

At the moment, I am fairly confident that I should investigate the
distribution from which the errors are drawn.

The method in relax draws from a Gaussian distribution of the R2eff
errors, but I should try drawing errors from the overall residuals
instead.

These are two different methods.

My PI has previously analysed the data with the aforementioned method
and got errors in the hundreds, i.e. errors of 5-10% of the fitted
global parameters.

Having 0.5-1 percent errors is way too small, and I see this for 4 of my
datasets.

So, something is fishy.

Best
Troels
Edward d'Auvergne
2015-01-16 17:09:26 UTC
Permalink
Hi,

Do the R2eff errors look reasonable? Another issue is that in a
clustered analysis, certain parameters can be over-constrained by being
shared between multiple data sets. This is the bias introduced by an
under-fitted problem, and it can artificially decrease the errors.
Anyway, you should plot the Monte Carlo simulations, a bit like I did
in figure 4 of my paper:

d'Auvergne, E. J. and Gooley, P. R. (2006). Model-free model
elimination: A new step in the model-free dynamic analysis of NMR
relaxation data. J. Biomol. NMR, 35(2), 117-135.
(http://dx.doi.org/10.1007/s10858-006-9007-z)

That might indicate if something is wrong, i.e. if the optimisation of
certain simulations has failed. However this problem only causes
errors to be bigger than they should be (unless all simulations have
failed). I don't know how Monte Carlo simulations could fail
otherwise. Monte Carlo simulations are the gold standard for error
analysis. All other error analysis techniques are judged by how
closely they approach this gold standard. Saying that the Monte Carlo
simulation technique failed is about equivalent to claiming the Earth
is flat! I challenge you to test the statement on a statistics
professor at your Uni ;) Anyway, if Monte Carlo failed, using
residuals will not save you, as the failure point will be present in
both techniques. What could have failed is the model or the input
data. Under-fitting due to too much R2eff data variability in the
spins of the cluster would be my guess. Do you see similarly small
errors in the non-clustered analysis of the same data?

Regards,

Edward






Troels Emtekær Linnet
2015-01-16 17:25:31 UTC
Permalink
Hi Edward.

I do not claim that Monte Carlo simulations are not the gold standard.

I am merely trying to investigate the method by which one draws the
errors.

In the current dispersion case, one trusts the R2eff errors to define
the distribution. These are individual per spin.

Another distribution could come from how well the clustered fit
performed, and this is what I am looking into.

Best
Troels
Edward d'Auvergne
2015-01-16 18:07:19 UTC
Permalink
Hi,

If you plot the R2eff errors from the Monte Carlo simulations of that
model, are they Gaussian? Well, that's assuming you have full
dispersion curves. Theoretically, from the white noise in the NMR
spectrum, they should be. Anyway, even if it is not claimed that
Monte Carlo simulations have failed: if you have small errors from MC
simulations and large errors from another error analysis technique,
and then pick the large errors, that implies that the Monte Carlo
simulations have failed. As for using residuals, this is a fall-back
technique for when experimental errors have not been, or cannot be,
measured. It relies on another convergence assumption: if you have
infinite data and the model has a bias of exactly 0.0, then the
residuals converge to the experimental errors. With clustering, you
might violate this bias == 0.0 condition (that is almost guaranteed).
You should also plot your residuals. If the average residual value is
not 0.0, or if you see a trend in the residual plot, then you have
model bias.
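A toy illustration of these two checks, with made-up numbers: the residuals below average to exactly zero, so the mean-residual check alone would pass, yet a plot would show a clear downward trend, which a simple correlation against position along the curve also catches.

```python
import math

# Hypothetical residuals (measured minus back-calculated R2eff) along
# one dispersion curve; the numbers are invented for illustration.
residuals = [0.9, 0.7, 0.4, 0.1, -0.2, -0.4, -0.7, -0.8]
n = len(residuals)

# Check 1: the average residual should be close to 0.0.
mean_resid = sum(residuals) / n

# Check 2: a trend in the residual plot.  A crude numerical stand-in
# for eyeballing the plot is the Pearson correlation between the
# residuals and their position along the curve.
xs = list(range(n))
mx = sum(xs) / n
cov = sum((x - mx) * (r - mean_resid) for x, r in zip(xs, residuals))
var_x = sum((x - mx) ** 2 for x in xs)
var_r = sum((r - mean_resid) ** 2 for r in residuals)
trend = cov / math.sqrt(var_x * var_r)   # correlation in [-1, 1]
```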

For using the residuals, how do these values compare to the error
values? If the residuals are bigger than the experimental errors, then
this also indicates that abs(bias) > 0.0. A scatter plot of R2eff
residuals vs. errors might be quite useful. This should be a linear
plot with a gradient of 1; anything else indicates bias. There might
even be a way of calculating the bias value from the residuals and
errors, though I've forgotten how this is done. Anyway, if you have a
large bias due to the residuals being bigger than the R2eff errors,
using the residuals for error analysis is not correct. It will
introduce larger errors, that is guaranteed. So you will have the
result that kex has larger errors, but it is useful to understand the
theoretical reason why: a large component of that kex error is the
modelling bias. So if sum_i(residual_i / error_i) > 1.0, then you
likely have under-fitting. This could be caused by the clustering or
by the 2-site model being insufficient. In any case, using the
residuals for an error analysis to work around kex errors being too
small only indicates a problem with the data modelling.
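A small numeric sketch of that comparison, with hypothetical residuals and errors. The sum over points is taken here as an average per point, so values well above 1.0 flag residuals that systematically exceed the measured errors:

```python
# Hypothetical R2eff residuals and measured errors for one spin;
# both lists are invented for illustration.
residuals = [0.8, -0.5, 1.2, -0.9, 1.5, -1.1, 0.7, -0.3]
errors    = [0.4,  0.5, 0.6,  0.5, 0.4,  0.6, 0.5,  0.4]

# Average |residual| / error ratio: values well above 1.0 mean the
# residuals are systematically bigger than the experimental errors,
# i.e. abs(bias) > 0.0 and likely under-fitting.
ratio = sum(abs(r) / e for r, e in zip(residuals, errors)) / len(residuals)

if ratio > 1.0:
    verdict = "residuals exceed errors: possible under-fitting / model bias"
else:
    verdict = "residuals consistent with the experimental errors"
```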

What about testing the covariance matrix technique? I guess that due
to small amounts of data that single-item-out cross validation is not
an option.

Regards,

Edward



On 16 January 2015 at 18:25, Troels Emtekær Linnet
Post by Troels Emtekær Linnet
Hi Edward.
I do not claim that "Monte Carlo simulations" is not the gold standard.
I am merely trying to investigate the method by which one draw the errors.
In the current case for dispersion, one trust the R2eff errors to be the
distribution.
These are individual per spin.
Another distribution could be from how well the clustered fit performed.
And this is what I am looking into.
Best
Troels
Post by Edward d'Auvergne
Hi,
Do the R2eff errors look reasonable? Another issue is in clustered
analysis, certain parameters can be over-constrained by being shared
between multiple data sets. This is the biased introduced by an
under-fitted problem. This can artificially decrease the errors.
Anyway, you should plot the Monte Carlo simulations, a bit like I did
d'Auvergne, E. J. and Gooley, P. R. (2006). Model-free model
elimination: A new step in the model-free dynamic analysis of NMR
relaxation data. J. Biomol. NMR, 35(2), 117-135.
(http://dx.doi.org/10.1007/s10858-006-9007-z)
That might indicate if something is wrong - i.e. if optimisation of
certain simulations have failed. However this problem only causes
errors to be bigger than they should be (unless all simulations have
failed). I don't know how Monte Carlo simulations could fail
otherwise. Monte Carlo simulations are the gold standard for error
analysis. All other error analysis techniques are judged based on how
close the approach this gold standard. Saying that the Monte Carlo
simulations technique failed is about equivalent to claiming the Earth
is flat! I challenge you to test the statement on a statistics
professor at your Uni ;) Anyway, if Monte Carlo failed, using
residuals will not save you as the failure point will be present in
both techniques. What could have failed is the model or the input
data. Under-fitting due to too much R2eff data variability in the
spins of the cluster would be my guess. Do you see similarly small
errors in the non-clustered analysis of the same data?
Regards,
Edward
On 16 January 2015 at 17:48, Troels Emtekær Linnet
Post by Troels Emtekær Linnet
Hi Edward.
At the moment, I am fairly confident that I should investigate the
distribution from which the errors are drawn.
The method in relax draws from a Gaussian distribution of the R2eff
errors, but I should try to draw errors from the overall residual instead.
It is two different methods.
My PI has previously analysed the data with the aforementioned
method, and got errors in the hundreds.
Errors are 5-10% of the fitted global parameters.
Having 0.5-1 percent error is way too small, and I see this for 4 of my
datasets.
So, something is fishy.
Best
Troels
Post by Edward d'Auvergne
Hi Troels,
You should be very careful with your interpretation here. The
curvature of the chi-squared space does not correlate with the
parameter errors! Well, in most cases it doesn't. You will see this
if you map the space for different Monte Carlo simulations. Some
extreme edge cases might help in understanding the problem. Let's say
you have a kex value of 100 with a real error of 1000. In this case,
you could still have a small, perfectly quadratic minimum. But this
minimum will jump all over the place with the simulations. Another
extreme example might be kex of 100 with a real error of 0.00000001.
In this case, the chi-squared space could look similar to the
screenshot you attached to the task ( http://gna.org/task/?7882).
However Monte Carlo simulations may hardly perturb the chi-squared
space. I have observed scenarios similar to these hypothetical cases
with the Lipari and Szabo model-free protein dynamics analysis.
There is one case where the chi-squared space and error space match,
and that is at the limit of the minimum when the chi-squared space
becomes quadratic. This happens when you zoom right into the minimum.
The correlation matrix approach makes this assumption. Monte Carlo
simulations do not. In fact, Monte Carlo simulations are the gold
standard. There is no technique which is better than Monte Carlo
simulations, if you use enough simulations. You can only match it by
deriving exact symbolic error equations.
Therefore you really should investigate how your optimisation space is
perturbed by Monte Carlo simulations to understand the correlation -
or non-correlation - of the chi-squared curvature and the parameter
errors. Try mapping the minimum for the simulations and see if the
distribution of minima matches the chi-squared curvature
(http://gna.org/task/download.php?file_id=23527).
Regards,
Edward
On 16 January 2015 at 17:14, Troels E. Linnet
Post by Troels E. Linnet
<http://gna.org/task/?7882>
_______________________________________________
Message sent via/by Gna!
http://gna.org/
_______________________________________________
relax (http://www.nmr-relax.com)
This is the relax-devel mailing list
To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel
Edward d'Auvergne
2015-01-16 18:13:20 UTC
Permalink
Hi,

Sorry, I meant sum_i(residual_i / error_i)/N > 1.0. As for
calculating the bias value, it looks like Wikipedia has a reasonable
description (https://en.wikipedia.org/wiki/Bias_of_an_estimator).
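The corrected check could be sketched as follows (a minimal illustration, not relax code; the function name and arrays are hypothetical, and it assumes the residuals are taken as absolute values so that random scatter of opposite signs does not cancel out):

```python
import numpy as np

def underfitting_ratio(measured, back_calc, errors):
    """Mean of |residual| / error.  Values well above 1.0 mean the
    residuals exceed the experimental errors, suggesting under-fitting."""
    residuals = np.asarray(measured) - np.asarray(back_calc)
    return np.mean(np.abs(residuals) / np.asarray(errors))
```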

Regards,

Edward
Post by Edward d'Auvergne
Hi,
If you plot the R2eff errors from the Monte Carlo simulations of that
model, are they Gaussian? Well, that's assuming you have full
dispersion curves. Theoretically from the white noise in the NMR
spectrum they should be. Anyway, even if it is not claimed that Monte
Carlo simulations have failed, having small errors from MC
simulations and large errors from another error analysis technique, and
then picking the large errors, implies that the Monte Carlo
simulations have failed. As for using residuals, this is a fall-back
technique for when experimental errors have not been, or cannot be, measured.
This uses another convergence assumption - if you have infinite data
and the model has a bias value of exactly 0.0, then the residuals
converge to the experimental errors. For clustering, you might
violate this bias == 0.0 condition (that is almost guaranteed). You
should also plot your residuals. If the average residual value is not
0.0, then you have model bias. Or if you see a trend in the residual
plot.
For using the residuals, how do these values compare to the error
values? If the residuals are bigger than the experimental error, then
this also indicates that abs(bias) > 0.0. A scatter plot of R2eff
residuals vs. errors might be quite useful. This should be a linear
plot with a gradient of 1, anything else indicates bias. There might
even be a way of calculating the bias value from the residuals and
errors, though I've forgotten how this is done. Anyway, if you have a
large bias due to the residuals being bigger than the R2eff error,
using the residuals for error analysis is not correct. It will
introduce larger errors, that is guaranteed. So you will have the
result that kex has larger errors. But it is useful to understand the
theoretical reason why. A large component of that kex error is the
modelling bias. So if sum_i(residual_i / error_i) > 1.0, then you
likely have under-fitting. This could be caused by clustering or the
2-site model being insufficient. In any case, using the residuals for
an error analysis to work around kex errors being too small only
indicates a problem with the data modelling.
Edward d'Auvergne
2015-01-19 09:31:39 UTC
Permalink
Hi Troels,

Do you have a reference for the technique? You mentioned a 'fitting
guide' in one of your commit messages, but without a reference to it.
I would like to study the technique to understand the implementation.
Does it have another name? I would guess so as it breaks the
statistical principles that the Monte Carlo simulation technique uses.
That is why the monte_carlo.create_data user function says on the
first line that the technique is called bootstrapping rather than
Monte Carlo simulations if the 'method' argument is changed. I would
also like to check if it is implemented properly. For example
accessing the chi2, converting it to the reduced chi2 and using this
as the SSE may not be correct, as the former is normalised by the
errors and the later is not. You may need to recalculate the SSE as
there is no way to convert from the full/reduced chi2 value to an SSE
value. I would also like to see if this implementation is a new
methodology that sits beside Monte Carlo simulations and Bootstrapping
simulations, or if it can be used for both of these.
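The chi2/SSE distinction can be seen with a toy example (illustrative numbers only): with non-uniform errors, the error-normalised chi-squared and the plain sum of squares differ by more than a constant factor, so neither can be recovered from the other without the raw residuals and errors.

```python
import numpy as np

# Illustrative values only - three data points with different errors.
measured  = np.array([10.0, 12.0, 15.0])
back_calc = np.array([10.5, 11.5, 15.5])
errors    = np.array([0.2, 0.5, 1.0])

residuals = measured - back_calc
sse  = np.sum(residuals**2)              # plain sum of squared residuals
chi2 = np.sum((residuals / errors)**2)   # weighted by the per-point errors

# Here sse = 0.75 while chi2 = 7.5, and the ratio depends on the
# individual errors, not on any single scaling constant.
```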

Cheers,

Edward
Troels Emtekær Linnet
2015-01-19 09:53:48 UTC
Permalink
Hi Edward.

I have used this regression book.
http://www.graphpad.com/faq/file/Prism4RegressionBook.pdf

I like the language (fun and humorous), and it goes into great detail.
For example, there is quite a list of weighting methods.
I find the comments on page 28 a little disturbing.
(See also page 86+87)

The Monte-Carlo simulation is described at page 104.

1) Create an ideal data set.
-> Check. This is done with relax
monte_carlo.create_data(method='back_calc')

2) Add random scatter.
relax currently adds random scatter per individual data point. The random
scatter is drawn from the measured error of the data point.
But the book suggests adding errors described by the variance of the
residuals.
This is described at page 33.
"If you chose to weight the values and minimize the relative distance
squared (or some other weighting function), goodness-of-fit is quantified
with the weighted sum- of-squares."

Sy.x = sqrt(SS/dof) = sqrt(chi2 / dof)
The question is, of course, whether SS should be the sum of squared errors
for the weighted points or for the non-weighted R2eff points.
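The two steps above could be sketched like this (a rough sketch only, not relax's implementation; the function and variable names are hypothetical, and it assumes SS is taken directly as the chi-squared value):

```python
import numpy as np

def create_mc_data(back_calc, chi2, n_params, n_sims, seed=None):
    """Steps 1 and 2 of the Prism book's Monte Carlo scheme (p. 104):
    start from the ideal (back-calculated) data, then add Gaussian
    scatter with SD = Sy.x = sqrt(SS / dof) (p. 33)."""
    n = len(back_calc)
    dof = n - n_params          # degrees of freedom, N - p
    sxy = np.sqrt(chi2 / dof)   # SD of the residuals
    rng = np.random.default_rng(seed)
    # One Gaussian-perturbed copy of the ideal data per simulation.
    return np.asarray(back_calc) + rng.normal(0.0, sxy, size=(n_sims, n))
```

Each row would then be refitted as one simulation; the 2.5 and 97.5 percentiles of the resulting parameters give the confidence interval.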

Best
Troels
Post by Edward d'Auvergne
Hi Troels,
Do you have a reference for the technique? You mentioned a 'fitting
guide' in one of your commit messages, but without a reference to it.
I would like to study the technique to understand the implementation.
Does it have another name? I would guess so as it breaks the
statistical principles that the Monte Carlo simulation technique uses.
That is why the monte_carlo.create_data user function says on the
first line that the technique is called bootstrapping rather than
Monte Carlo simulations if the 'method' argument is changed. I would
also like to check if it is implemented properly. For example
accessing the chi2, converting it to the reduced chi2 and using this
as the SSE may not be correct, as the former is normalised by the
errors and the later is not. You may need to recalculate the SSE as
there is no way to convert from the full/reduced chi2 value to an SSE
value. I would also like to see if this implementation is a new
methodology that sits beside Monte Carlo simulations and Bootstrapping
simulations, or if it can be used for both of these.
Cheers,
Edward
Post by Edward d'Auvergne
Hi,
Sorry, I meant sum_i(residual_i / error_i)/N > 1.0. As for
calculating the bias value, it looks like Wikipedia has a reasonable
description (https://en.wikipedia.org/wiki/Bias_of_an_estimator).
Regards,
Edward
Post by Edward d'Auvergne
Hi,
If you plot the R2eff errors from the Monte Carlo simulations of that
model, are they Gaussian? Well, that's assuming you have full
dispersion curves. Theoretically from the white noise in the NMR
spectrum they should be. Anyway, even if it not claimed that Monte
Carlo simulations have failed, and you have small errors from MC
simulations and large errors from other error analysis technique, and
then pick the large errors, that implies that the Monte Carlo
simulations have failed. As for using residuals, this is a fall-back
technique when experimental errors have not, or cannot, be measured.
This uses another convergence assumption - if you have infinite data
and the model has a bias value of exactly 0.0, then the residuals
converge to the experimental errors. For clustering, you might
violate this bias == 0.0 condition (that is almost guaranteed). You
should also plot your residuals. If the average residual value is not
0.0, then you have model bias. Of if you see a trend in the residual
plot.
For using the residuals, how do these values compare to the error
values? If the residuals are bigger than the experimental error, then
this also indicates that abs(bias) > 0.0. A scatter plot of R2eff
residuals vs. errors might be quite useful. This should be a linear
plot with a gradient of 1, anything else indicates bias. There might
even be a way of calculating the bias value from the residuals and
errors, though I've forgotten how this is done. Anyway, if you have a
large bias due to the residuals being bigger than the R2eff error,
using the residuals for error analysis is not correct. It will
introduce larger errors, that is guaranteed. So you will have the
result that kex has larger errors. But it is useful to understand the
theoretical reason why. A large component of that kex error is the
modelling bias. So if sum_i(residual_i / error_i) > 1.0, then you
likely have under-fitting. This could be caused by clustering or the
2-site model being insufficient. In any case, using the residuals for
an error analysis to work around kex errors being too small only
indicates a problem with the data modelling.
What about testing the covariance matrix technique? I guess that due
to small amounts of data that single-item-out cross validation is not
an option.
Regards,
Edward
On 16 January 2015 at 18:25, Troels Emtekær Linnet
Post by Troels Emtekær Linnet
Hi Edward.
I do not claim that "Monte Carlo simulations" is not the gold standard.
I am merely trying to investigate the method by which one draws the errors.
In the current case for dispersion, one trusts the R2eff errors to be
the distribution.
These are individual per spin.
Another distribution could be from how well the clustered fit performed.
And this is what I am looking into.
Best
Troels
Post by Edward d'Auvergne
Hi,
Do the R2eff errors look reasonable? Another issue is that in clustered
analyses, certain parameters can be over-constrained by being shared
between multiple data sets. This is the bias introduced by an
under-fitted problem. This can artificially decrease the errors.
Anyway, you should plot the Monte Carlo simulations, a bit like I did in:
d'Auvergne, E. J. and Gooley, P. R. (2006). Model-free model
elimination: A new step in the model-free dynamic analysis of NMR
relaxation data. J. Biomol. NMR, 35(2), 117-135.
(http://dx.doi.org/10.1007/s10858-006-9007-z)
That might indicate if something is wrong - i.e. if optimisation of
certain simulations have failed. However this problem only causes
errors to be bigger than they should be (unless all simulations have
failed). I don't know how Monte Carlo simulations could fail
otherwise. Monte Carlo simulations are the gold standard for error
analysis. All other error analysis techniques are judged based on how
closely they approach this gold standard. Saying that the Monte Carlo
simulations technique failed is about equivalent to claiming the Earth
is flat! I challenge you to test the statement on a statistics
professor at your Uni ;) Anyway, if Monte Carlo failed, using
residuals will not save you as the failure point will be present in
both techniques. What could have failed is the model or the input
data. Under-fitting due to too much R2eff data variability in the
spins of the cluster would be my guess. Do you see similarly small
errors in the non-clustered analysis of the same data?
Regards,
Edward
On 16 January 2015 at 17:48, Troels Emtekær Linnet
Post by Troels Emtekær Linnet
Hi Edward.
At the moment, I am fairly confident that I should investigate the
distribution from which the errors are drawn.
The method in relax draws from a Gauss distribution of the R2eff errors,
but I should try to draw errors from the overall residual instead.
It is two different methods.
My PI has earlier analysed the data with the aforementioned method,
and got errors in the hundreds.
Errors are 5-10% of the fitted global parameters.
Having 0.5-1 percent error is way too small, and I see this for 4 of my
datasets.
So, something is fishy.
Best
Troels
Post by Edward d'Auvergne
Hi Troels,
You should be very careful with your interpretation here. The
curvature of the chi-squared space does not correlate with the
parameter errors! Well, in most cases it doesn't. You will see this
if you map the space for different Monte Carlo simulations. Some
extreme edge cases might help in understanding the problem. Let's say
you have a kex value of 100 with a real error of 1000. In this case,
you could still have a small, perfectly quadratic minimum. But this
minimum will jump all over the place with the simulations. Another
extreme example might be kex of 100 with a real error of 0.00000001.
In this case, the chi-squared space could look similar to the
screenshot you attached to the task (http://gna.org/task/?7882).
However Monte Carlo simulations may hardly perturb the chi-squared
space. I have observed scenarios similar to these hypothetical cases
with the Lipari and Szabo model-free protein dynamics analysis.
There is one case where the chi-squared space and error space match,
and that is at the limit of the minimum when the chi-squared space
becomes quadratic. This happens when you zoom right into the minimum.
The correlation matrix approach makes this assumption. Monte Carlo
simulations do not. In fact, Monte Carlo simulations are the gold
standard. There is no technique which is better than Monte Carlo
simulations, if you use enough simulations. You can only match it by
deriving exact symbolic error equations.
Therefore you really should investigate how your optimisation space is
perturbed by Monte Carlo simulations to understand the correlation -
or non-correlation - of the chi-squared curvature and the parameter
errors. Try mapping the minimum for the simulations and see if the
distribution of minima matches the chi-squared curvature
(http://gna.org/task/download.php?file_id=23527).
Regards,
Edward
On 16 January 2015 at 17:14, Troels E. Linnet
Post by Troels E. Linnet
<http://gna.org/task/?7882>
Summary: Implement Monte-Carlo simulation, where errors are
generated with width of standard deviation or residuals
Project: relax
Submitted by: tlinnet
Submitted on: Fri 16 Jan 2015 04:14:30 PM UTC
Should Start On: Fri 16 Jan 2015 12:00:00 AM UTC
Should be Finished on: Fri 16 Jan 2015 12:00:00 AM UTC
Category: relax's source code
Priority: 5 - Normal
Status: In Progress
Percent Complete: 0%
Assigned to: tlinnet
Open/Closed: Open
Discussion Lock: Any
Effort: 0.00
_______________________________________________________
This is implemented due to strange results.
A relaxation dispersion analysis on data with 61 spins, with a Monte
Carlo simulation of 500 steps, showed unexpectedly low errors.
-------
results.read(file=fname_results, dir=dir_results)

# Number of Monte Carlo simulations.
mc_nr = 500

monte_carlo.setup(number=mc_nr)
monte_carlo.create_data()
monte_carlo.initial_values()
minimise.execute(min_algor='simplex', func_tol=1e-25, max_iter=int(1e7), constraints=True)
monte_carlo.error_analysis()
--------
The kex was 2111 with an error of 16.6.
When performing a dx.map, some weird results were found:
i_sort dw_sort pA_sort kex_sort chi2_sort
471 4.50000 0.99375 2125.00000 4664.31083
470 4.50000 0.99375 1750.00000 4665.23872
So even a small change in chi2 should reflect a larger deviation in kex.
It seems that changing the R2eff values according to their errors is
not "enough".
According to the regression book of GraphPad
http://www.graphpad.com/faq/file/Prism4RegressionBook.pdf
pages 33 and 104, the standard deviation of the residuals is:

Sxy = sqrt(SS/(N-p))

where SS is the sum of squares and N - p is the number of degrees of
freedom. In relax, SS is spin.chi2, and is weighted.
The random scatter added to each R2eff point should be drawn from a
Gaussian distribution with a mean of zero and an SD equal to Sxy.
Additionally, find the 2.5 and 97.5 percentiles for each parameter.
The range between these values is the confidence interval.
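The percentile-based confidence interval can be sketched as follows, assuming the per-simulation parameter values have been collected into a plain array. The kex values here are synthetic, drawn from an invented distribution purely for illustration:

```python
import numpy as np

# Hypothetical kex values from 500 Monte Carlo simulations (synthetic data,
# invented mean and spread).
rng = np.random.default_rng(0)
kex_sims = rng.normal(loc=2111.0, scale=100.0, size=500)

# The 2.5 and 97.5 percentiles bound the 95% confidence interval.
lower, upper = np.percentile(kex_sims, [2.5, 97.5])
print("95%% CI for kex: [%.1f, %.1f]" % (lower, upper))
```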
_______________________________________________________
-------------------------------------------------------
Date: Fri 16 Jan 2015 04:14:30 PM UTC Name: Screenshot-1.png
161kB
By: tlinnet
<http://gna.org/task/download.php?file_id=23527>
_______________________________________________________
<http://gna.org/task/?7882>
_______________________________________________
Message sent via/by Gna!
http://gna.org/
_______________________________________________
relax (http://www.nmr-relax.com)
This is the relax-devel mailing list
To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel
Edward d'Auvergne
2015-01-19 13:38:45 UTC
Hi Troels,

Looking at page 28, note that they are using the SSE value of
sum((Y_data - Y_curve)^2) and not the chi2 value of sum(((Y_data -
Y_curve)/sigma)^2). This is a major difference! The chi-squared value
is the SSE normalised to unit variance. Any statistics or techniques
relying on the SSE value cannot be used when the chi-squared has been
used - i.e. it cannot be mapped to the relax results. Also note that
this page/book is talking about non-linear regression
(https://en.wikipedia.org/wiki/Nonlinear_regression) whereas in relax
we use non-linear least squares fitting
(https://en.wikipedia.org/wiki/Non-linear_least_squares). Although
related, these are different fields, so care must be taken when
using techniques from one in the other. In most cases that would be
theoretically disallowed. This statement from the regression
wikipedia article
https://en.wikipedia.org/wiki/Nonlinear_regression#Ordinary_and_weighted_least_squares
is useful:

"The best-fit curve is often assumed to be that which minimizes
the sum of squared residuals. This is the (ordinary) least squares
(OLS) approach. However, in cases where the dependent variable does
not have constant variance, a sum of weighted squared residuals may be
minimized; see weighted least squares. Each weight should ideally be
equal to the reciprocal of the variance of the observation, but
weights may be recomputed on each iteration, in an iteratively
weighted least squares algorithm."

I.e. the two techniques are assumed to give the same result. This is
often the case if the errors of all points are the same (though not
always, due to the change in curvature of the space if the errors are
not 1.0), that the noise is Gaussian, and there are no outliers.
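The SSE/chi2 distinction can be made concrete with a few invented data points. This is not relax code, just the two definitions side by side:

```python
import numpy as np

# Hypothetical measured data, back-calculated curve, and per-point errors
# (invented numbers for illustration).
y_data = np.array([10.0, 12.0, 15.0, 9.0])
y_curve = np.array([10.2, 11.5, 15.3, 9.1])
sigma = np.array([0.1, 0.5, 0.3, 0.1])

# SSE: the plain sum of squared residuals used in ordinary regression.
sse = np.sum((y_data - y_curve) ** 2)

# chi2: each residual is first normalised by its own error, so points
# with small errors dominate.  This is what relax minimises.
chi2 = np.sum(((y_data - y_curve) / sigma) ** 2)

print("SSE  = %.4f" % sse)
print("chi2 = %.4f" % chi2)
```

The two values rank the same fit very differently, so statistics derived from the SSE cannot simply be re-used on a chi2-optimised result.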

Which part of page 28 is disturbing? We do not make the second
assumption "that the average amount of scatter is the same all the way
along the curve". The chi-squared value takes care of this - each
point has its own independent error. The first assumption is also
fine, unless the NMR experiment has failed in some strange way
affecting the spectral baseline or the spectral processing has
introduced baseline artefacts. As for the weighting, in the
relaxation dispersion analysis, this is not needed. It is used
sometimes in relax when combining RDC and PCS data in the N-state
model, but that is for when errors are not known and it is used to
make sure the two data types have roughly the same amount of weight
(which a full error analysis would have done). I'll continue with the
other sections later.

I would also recommend that you have a read of the introduction
chapters 1 and 2 of "Numerical Optimisation" by Jorge Nocedal and
Stephen J. Wright. This will give a good summary of the least squares
problem in comparison to this GraphPad Prism 4 software book which
covers regression. This book will be in your maths library, or full
PDFs can be found online. This book is a brilliant reference for all
optimisation concepts.

Cheers,

Edward

On 19 January 2015 at 10:53, Troels Emtekær Linnet
Post by Troels Emtekær Linnet
Hi Edward.
I have used this regression book.
http://www.graphpad.com/faq/file/Prism4RegressionBook.pdf
I like the language (fun and humoristic), and it goes into great detail.
For example, there is quite a list of weighting methods.
I find the comments on page 28 a little disturbing.
(See also page 86+87)
The Monte-Carlo simulation is described at page 104.
1) Create an ideal data set.
-> Check. This is done with relax
monte_carlo.create_data(method='back_calc')
2) Add random scatter.
relax now adds random scatter per individual data point. The random scatter
is drawn from the measured error of the data point.
But the book suggest adding errors described by variance of the residuals.
This is described at page 33.
"If you chose to weight the values and minimize the relative distance
squared (or some other weighting function), goodness-of-fit is quantified
with the weighted sum- of-squares."
Sy.x = sqrt(SS/dof) = sqrt(chi2 / dof)
The question is of course whether SS should be the sum of squared errors
for the weighted points or the non-weighted R2eff points.
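The proposed scheme could be sketched as below. The curve, the noise level and the parameter count are all invented for illustration, and this is not how relax's monte_carlo user functions are implemented:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical back-calculated (ideal) R2eff curve and noisy measurements
# (invented numbers and noise level).
r2eff_calc = np.array([15.0, 14.2, 12.8, 11.5, 10.9, 10.6])
r2eff_meas = r2eff_calc + rng.normal(0.0, 0.3, size=r2eff_calc.size)

# Standard deviation of the residuals, Sxy = sqrt(SS / (N - p)), with SS
# the unweighted sum of squares and N - p the degrees of freedom.
n_params = 3  # e.g. dw, pA and kex (an assumption for this sketch).
ss = np.sum((r2eff_meas - r2eff_calc) ** 2)
dof = r2eff_calc.size - n_params
sxy = np.sqrt(ss / dof)

# One Monte Carlo data set: the ideal curve plus scatter drawn from a
# Gaussian with mean zero and SD equal to Sxy.
mc_data = r2eff_calc + rng.normal(0.0, sxy, size=r2eff_calc.size)
print("Sxy = %.4f" % sxy)
```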
Best
Troels
Post by Edward d'Auvergne
Hi Troels,
Do you have a reference for the technique? You mentioned a 'fitting
guide' in one of your commit messages, but without a reference to it.
I would like to study the technique to understand the implementation.
Does it have another name? I would guess so as it breaks the
statistical principles that the Monte Carlo simulation technique uses.
That is why the monte_carlo.create_data user function says on the
first line that the technique is called bootstrapping rather than
Monte Carlo simulations if the 'method' argument is changed. I would
also like to check if it is implemented properly. For example
accessing the chi2, converting it to the reduced chi2 and using this
as the SSE may not be correct, as the former is normalised by the
errors and the latter is not. You may need to recalculate the SSE as
there is no way to convert from the full/reduced chi2 value to an SSE
value. I would also like to see if this implementation is a new
methodology that sits beside Monte Carlo simulations and Bootstrapping
simulations, or if it can be used for both of these.
Cheers,
Edward
Post by Edward d'Auvergne
Hi,
Sorry, I meant sum_i(residual_i / error_i)/N > 1.0. As for
calculating the bias value, it looks like Wikipedia has a reasonable
description (https://en.wikipedia.org/wiki/Bias_of_an_estimator).
Regards,
Edward
Post by Edward d'Auvergne
Hi,
If you plot the R2eff errors from the Monte Carlo simulations of that
model, are they Gaussian? Well, that's assuming you have full
dispersion curves. Theoretically from the white noise in the NMR
spectrum they should be. Anyway, even if it is not claimed that Monte
Carlo simulations have failed, having small errors from the MC
simulations and large errors from another error analysis technique,
and then picking the large errors, implies that the Monte Carlo
simulations have failed. As for using residuals, this is a fall-back
technique when experimental errors have not, or cannot, be measured.
This uses another convergence assumption - if you have infinite data
and the model has a bias value of exactly 0.0, then the residuals
converge to the experimental errors. For clustering, you might
violate this bias == 0.0 condition (that is almost guaranteed). You
should also plot your residuals. If the average residual value is not
0.0, then you have model bias. Or if you see a trend in the residual
plot.
For using the residuals, how do these values compare to the error
values? If the residuals are bigger than the experimental error, then
this also indicates that abs(bias) > 0.0. A scatter plot of R2eff
residuals vs. errors might be quite useful. This should be a linear
plot with a gradient of 1, anything else indicates bias. There might
even be a way of calculating the bias value from the residuals and
errors, though I've forgotten how this is done. Anyway, if you have a
large bias due to the residuals being bigger than the R2eff error,
using the residuals for error analysis is not correct. It will
introduce larger errors, that is guaranteed. So you will have the
result that kex has larger errors. But it is useful to understand the
theoretical reason why. A large component of that kex error is the
modelling bias. So if sum_i(residual_i / error_i) > 1.0, then you
likely have under-fitting. This could be caused by clustering or the
2-site model being insufficient. In any case, using the residuals for
an error analysis to work around kex errors being too small only
indicates a problem with the data modelling.
What about testing the covariance matrix technique? I guess that due
to small amounts of data that single-item-out cross validation is not
an option.
Regards,
Edward
On 16 January 2015 at 18:25, Troels Emtekær Linnet
Post by Troels Emtekær Linnet
Hi Edward.
I do not claim that "Monte Carlo simulations" is not the gold standard.
I am merely trying to investigate the method by which one draws the errors.
In the current case for dispersion, one trusts the R2eff errors to be the
distribution.
These are individual per spin.
Another distribution could be from how well the clustered fit performed.
And this is what I am looking into.
Best
Troels
Post by Edward d'Auvergne
Hi,
Do the R2eff errors look reasonable? Another issue is that in clustered
analyses, certain parameters can be over-constrained by being shared
between multiple data sets. This is the bias introduced by an
under-fitted problem. This can artificially decrease the errors.
Anyway, you should plot the Monte Carlo simulations, a bit like I did in:
d'Auvergne, E. J. and Gooley, P. R. (2006). Model-free model
elimination: A new step in the model-free dynamic analysis of NMR
relaxation data. J. Biomol. NMR, 35(2), 117-135.
(http://dx.doi.org/10.1007/s10858-006-9007-z)
That might indicate if something is wrong - i.e. if optimisation of
certain simulations have failed. However this problem only causes
errors to be bigger than they should be (unless all simulations have
failed). I don't know how Monte Carlo simulations could fail
otherwise. Monte Carlo simulations are the gold standard for error
analysis. All other error analysis techniques are judged based on how
closely they approach this gold standard. Saying that the Monte Carlo
simulations technique failed is about equivalent to claiming the Earth
is flat! I challenge you to test the statement on a statistics
professor at your Uni ;) Anyway, if Monte Carlo failed, using
residuals will not save you as the failure point will be present in
both techniques. What could have failed is the model or the input
data. Under-fitting due to too much R2eff data variability in the
spins of the cluster would be my guess. Do you see similarly small
errors in the non-clustered analysis of the same data?
Regards,
Edward
On 16 January 2015 at 17:48, Troels Emtekær Linnet
Post by Troels Emtekær Linnet
Hi Edward.
At the moment, I am fairly confident that I should investigate the
distribution from which the errors are drawn.
The method in relax draws from a Gauss distribution of the R2eff errors,
but
I should try to draw errors from the
overall residual instead.
It is two different methods.
My PI has earlier analysed the data with the aforementioned
method, and got errors in the hundreds.
Errors are 5-10% of the fitted global parameters.
Having 0.5-1 percent error is way too small, and I see this for 4 of my
datasets.
So, something is fishy.
Best
Troels
2015-01-16 17:30 GMT+01:00 Edward d'Auvergne
Post by Edward d'Auvergne
Hi Troels,
You should be very careful with your interpretation here. The
curvature of the chi-squared space does not correlate with the
parameter errors! Well, in most cases it doesn't. You will see this
if you map the space for different Monte Carlo simulations. Some
extreme edge cases might help in understanding the problem. Let's say
you have a kex value of 100 with a real error of 1000. In this case,
you could still have a small, perfectly quadratic minimum. But this
minimum will jump all over the place with the simulations.
Another
extreme example might be kex of 100 with a real error of 0.00000001.
In this case, the chi-squared space could look similar to the
screenshot you attached to the task ( http://gna.org/task/?7882).
However Monte Carlo simulations may hardly perturb the chi-squared
space. I have observed scenarios similar to these hypothetical cases
with the Lipari and Szabo model-free protein dynamics analysis.
There is one case where the chi-squared space and error space match,
and that is at the limit of the minimum when the chi-squared space
becomes quadratic. This happens when you zoom right into the minimum.
The correlation matrix approach makes this assumption. Monte Carlo
simulations do not. In fact, Monte Carlo simulations are the gold
standard. There is no technique which is better than Monte Carlo
simulations, if you use enough simulations. You can only match it by
deriving exact symbolic error equations.
Therefore you really should investigate how your optimisation space is
perturbed by Monte Carlo simulations to understand the correlation -
or non-correlation - of the chi-squared curvature and the parameter
errors. Try mapping the minimum for the simulations and see if the
distribution of minima matches the chi-squared curvature
(http://gna.org/task/download.php?file_id=23527).
Regards,
Edward
Edward d'Auvergne
2015-01-19 17:48:29 UTC
Hi Troels,
Post by Troels Emtekær Linnet
I have used this regression book.
http://www.graphpad.com/faq/file/Prism4RegressionBook.pdf
I like the language (fun and humoristic), and it goes into great detail.
For example, there is quite a list of weighting methods.
I find the comments on page 28 a little disturbing.
(See also page 86+87)
We already weight by the individual data point error. As we are using
non-linear least squares fitting of the chi2 value, the weights
described on pages 86 and 87 are not relevant. For dispersion data
they are also not relevant as the NMR spectrum is so complicated that
any of these weights will not reliably replicate the field dependent
effects, the peak height effects, etc.
Post by Troels Emtekær Linnet
The Monte-Carlo simulation is described at page 104.
1) Create an ideal data set.
-> Check. This is done with relax
monte_carlo.create_data(method='back_calc')
This is correct.
Post by Troels Emtekær Linnet
2) Add random scatter.
relax now adds random scatter per individual data point. The random scatter
is drawn from the measured error of the data point.
Point 2) is correct, it is how you perform Monte Carlo simulations for
non-linear regression. The reason is twofold - because in regression
you don't know the error sigma_i, and because the residuals can be
used as an estimator of sigma_i. But there is a much better error
estimator. That is to use residuals in bootstrapping to estimate the
data errors (as mentioned on page 108). This would be far superior!
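A minimal sketch of residual bootstrapping as described there, with invented data (this is not a relax user function):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical best-fit curve and measured points (invented numbers).
y_curve = np.array([20.0, 18.5, 16.0, 14.8, 14.1])
y_data = np.array([20.3, 18.1, 16.2, 14.6, 14.3])
residuals = y_data - y_curve

# Residual bootstrap: each synthetic data set is the best-fit curve plus
# residuals resampled with replacement.
n_boot = 1000
boot_sets = y_curve + rng.choice(residuals, size=(n_boot, y_curve.size))
print(boot_sets.shape)
```

Each row of boot_sets would then be refitted, and the spread of the refitted parameters gives the error estimate.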

However we are using non-linear least squares optimisation, not
regression. Therefore we do not need an estimator of the errors, as
we already know the errors. The residuals or error bootstrapping
techniques are only there to estimate the measured error, which we
already have. If the residuals or error bootstrapping have been
successful, these should converge to the measured errors. So this is
unlikely to be the source of your too low kex error.
Post by Troels Emtekær Linnet
But the book suggest adding errors described by variance of the residuals.
This is described at page 33.
"If you chose to weight the values and minimize the relative distance
squared (or some other weighting function), goodness-of-fit is quantified
with the weighted sum- of-squares."
Sy.x = sqrt(SS/dof) = sqrt(chi2 / dof)
The question is of course, if SS should be sum of squared errors for the
weighted points, or the non-weighted R2eff points.
Note that the variance of the residuals is an error estimator. All
estimators also have an error (this is the error of an error, or sigma
of sigma_i). In the case of a single spin system, as there are not
many R2eff points, sigma of sigma_i will be big (you need 500+ points
before sigma of sigma_i is reasonably small). Hence your error
estimates from such methods will be very noisy.

In any case, because the chi-squared value has been optimised, this is
not the same solution as the regression of the SSE. The two minima
are not necessarily the same. They converge under certain strong
conditions in the regression problem (Gaussian errors, no outliers,
and errors for all points for all field strengths are the same).
Because the two solutions are not the same you cannot use the SSE
value, which would have to be calculated from the base data and
back-calculated data, for the error estimate. It can only be used
strictly under the convergence condition.

I don't know of any error estimate for the non-linear least squares
optimisation problem. But one probably has been derived for the cases
when the errors are not known. This would require a different
reference, as the GraphPad Prism 4 book only covers regression and not
least squares and hence cannot give the correct answer. The Numerical
Optimisation book by Nocedal and Wright
(https://books.google.de/books?id=VbHYoSyelFcC&lpg=PP1&dq=numerical%20optimisation%20nocedal%20wright&pg=PP1#v=onepage&q=numerical%20optimisation%20nocedal%20wright&f=false)
also doesn't seem to cover this. Do you know the Numerical Recipes
books (http://www.nr.com/)? Maybe there is something in there. Monte
Carlo simulations are described very clearly in section 15.6 of the
second edition. There might be something in chapter 15, "the
modelling of data" detailing the correct error estimate for this
problem.

As a side note, for the non-linear least squares problem when errors
are unknown and Monte Carlo simulations are not an option, the
covariance matrix might be the best error estimate.
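For a weighted least squares problem, the standard covariance matrix estimate at the minimum is (J^T.W.J)^-1, with J the Jacobian of the model and W the diagonal matrix of reciprocal variances. A sketch for a simple straight-line model with invented data (the same construction applies to any model once its Jacobian is known):

```python
import numpy as np

# Hypothetical straight-line model y = a*x + b fitted to weighted data
# (invented x values and errors).
x = np.array([0.0, 1.0, 2.0, 3.0])
sigma = np.array([0.1, 0.1, 0.2, 0.2])

# Jacobian of the model with respect to the parameters (a, b).
J = np.column_stack([x, np.ones_like(x)])

# Weight matrix: reciprocal variances on the diagonal.
W = np.diag(1.0 / sigma**2)

# Covariance matrix estimate at the minimum: (J^T W J)^-1.
cov = np.linalg.inv(J.T @ W @ J)

# Parameter standard errors are the square roots of the diagonal.
param_errors = np.sqrt(np.diag(cov))
print(param_errors)
```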

Regards,

Edward

Troels E. Linnet
2015-12-01 15:29:50 UTC
Update of task #7882 (project relax):

Open/Closed: Open => Closed


_______________________________________________________

Reply to this item at:

<http://gna.org/task/?7882>
