Error propagation

Proper reporting of experimental measurements requires the calculation of error bars or "confidence intervals". The appropriate calibration of data and a careful analysis of errors are essential to judge the relevance of observed trends. Below, a brief definition of the main concepts and a discussion of generic ways to obtain error estimates are provided.
<ref>[http://www.nrbook.com/ W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes in FORTRAN (Cambridge University Press, 1992), 2nd ed.]</ref>
<ref>P. Bevington and D. Robinson, Data Reduction and Error Analysis for the Physical Sciences (McGraw-Hill, UK, 2003), 3rd ed. ISBN 978-0072472271</ref>
Of course, any particular measuring device generally requires specific techniques.


== The measurement process ==

The measuring device performs [[Data analysis techniques|measurements]] on a physical system ''P''.
As a result, it produces estimates of a set of physical parameters ''{p}''.
One may think of ''p'' as loose numbers (e.g., a confinement time), data along a spatial chord at a single time (e.g., a Thomson scattering profile), data at a point in space with time resolution (e.g., magnetic field fluctuations from a Mirnov coil), or data having both time and space resolution (e.g., tomographic data from Soft X-Ray arrays).
In the linear case of Eq. (1), and even in slightly more complex situations, standard error propagation techniques can be used to compute the error in ''p'' from the error in ''s''.
Standard error propagation proceeds as follows:
:<math>z = f(x, y, \ldots)\,</math>
:<math>(\Delta z)^2 = \left ( \frac{\partial f}{\partial x}\right )^2 (\Delta x)^2 + \left ( \frac{\partial f}{\partial y}\right )^2 (\Delta y)^2 + \ldots</math>
This formula holds exclusively for a Gaussian (normal) distribution of errors (assuming the errors are small and that the variables ''x'', ''y'', ... are indeed independent).
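As a numerical illustration of the quadrature formula above (the function and values below are hypothetical, not taken from any particular diagnostic), the partial derivatives can be estimated by finite differences:

```python
import math

def propagate_error(f, values, errors, h=1e-6):
    """Standard error propagation for independent variables:
    (Delta z)^2 = sum_i (df/dx_i)^2 (Delta x_i)^2.
    Partial derivatives are estimated by central finite differences."""
    var = 0.0
    for i, (v, dv) in enumerate(zip(values, errors)):
        up = list(values)
        dn = list(values)
        up[i] = v + h
        dn[i] = v - h
        dfdx = (f(*up) - f(*dn)) / (2 * h)
        var += (dfdx * dv) ** 2
    return math.sqrt(var)

# Example: z = x * y with x = 3.0 +/- 0.1 and y = 4.0 +/- 0.2.
# Analytically, Delta z = sqrt((y Dx)^2 + (x Dy)^2) = sqrt(0.52).
dz = propagate_error(lambda x, y: x * y, [3.0, 4.0], [0.1, 0.2])
```

For an analytic ''f'' the derivatives would normally be written out by hand; the finite-difference version is merely convenient for quick checks.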
An important topic, and a frequent source of mistakes in error propagation (and parameter fitting), is collinearity (linear dependencies between elements of ''s'' and/or ''p'').
The presence of collinearity may inflate error levels enormously.
A quick check for problems of this kind can be made using the [[:Wikipedia:Monte Carlo method|Monte Carlo approach]] (see below).
Several techniques are available to handle collinearity, such as Principal Component Analysis (basically, orthogonalisation of the correlation matrix of ''s'').
The Monte Carlo approach also provides a simple method of error estimation for the much more difficult problem of a non-linear mapping ''M<sub>p</sub>''.
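The Principal Component Analysis check mentioned above can be sketched as follows (the signals are synthetic, constructed so that two of them are nearly identical): a near-zero eigenvalue of the correlation matrix flags a linear dependency.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical signals; s2 is almost an exact copy of s1,
# i.e. the set is nearly collinear.
s1 = rng.normal(size=500)
s2 = s1 + 0.01 * rng.normal(size=500)
s3 = rng.normal(size=500)

# PCA boils down to diagonalising the correlation matrix:
# a near-zero eigenvalue signals a linear dependency, which
# would inflate propagated errors enormously.
corr = np.corrcoef(np.vstack([s1, s2, s3]))
eigvals = np.linalg.eigvalsh(corr)
nearly_collinear = bool(eigvals.min() < 1e-2)
```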
To compute the error bar of ''p'', the simulated measurements ''s'' are varied randomly within their (known) error bars, using the (known) error distribution, and the standard deviation of the resulting ''p'' is determined; this standard deviation is the error estimate.
This technique also provides a quick method to check for possible problems such as the ill-conditioning cited above.
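A minimal sketch of this Monte Carlo procedure, for a hypothetical non-linear mapping (a simple ratio of two measured quantities, with invented values and error bars):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical non-linear mapping from measurements s = (s1, s2)
# to the derived parameter p:
def mapping(s1, s2):
    return s1 / s2

s1, ds1 = 10.0, 0.1    # measured value and its (known) error bar
s2, ds2 = 2.0, 0.05

# Monte Carlo error propagation: vary the measurements randomly
# within their error bars, using the known (here Gaussian) error
# distribution, and take the standard deviation of the resulting p.
n = 100_000
p = mapping(s1 + ds1 * rng.normal(size=n),
            s2 + ds2 * rng.normal(size=n))
dp = p.std()
# For comparison, linear propagation gives
# sqrt((ds1/s2)^2 + (s1*ds2/s2^2)^2), about 0.135 here.
```

Unlike the analytic formula, this procedure requires no derivatives and works for arbitrary (differentiable or not) mappings, at the cost of computing time.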
When the model relating ''s'' and ''p'' is known, as well as the error distributions (which may be Gaussian or not), a more systematic approach to error propagation is provided by a technique known as the [[:Wikipedia:Maximum likelihood|maximum likelihood method]].
<ref>Particle Data Group, Eur. Phys. J. C 3, 1 (1998)</ref>
This technique is simply the generalisation of standard error propagation to general error distributions (i.e., not limited to Gaussians).
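The principle can be made explicit with a small sketch (measurement values and error bars are invented): for Gaussian errors, minimising the negative log-likelihood reproduces the familiar inverse-variance weighted mean, while for other distributions only the likelihood expression changes.

```python
import numpy as np

# Hypothetical repeated measurements of a single parameter p,
# each with its own (here Gaussian) error bar:
s = np.array([10.2, 9.8, 10.5])
err = np.array([0.1, 0.2, 0.4])

# Maximum likelihood: minimise the negative log-likelihood,
#   -log L(p) = sum_i (s_i - p)^2 / (2 err_i^2) + const,
# done here by a simple grid scan to keep the principle explicit.
grid = np.linspace(9.0, 11.0, 20001)
nll = ((s[None, :] - grid[:, None]) ** 2 / (2 * err ** 2)).sum(axis=1)
p_ml = grid[nll.argmin()]

# For Gaussian errors the maximum likelihood estimate coincides
# with the inverse-variance weighted mean:
w = 1.0 / err ** 2
p_wls = (w * s).sum() / w.sum()
```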
A steady-state discharge can be used for this purpose.
The fluctuation amplitude of the signal ''s'' will then be equal to its error bar.
We note, however, that this poor man's approach to error estimation will always provide an upper limit of the error bars, since the actual (physical) variability of the signal is added to the random error, whereas it provides no indication of the systematic error.
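In code, this poor man's estimate is one line (the steady-state signal below is simulated, with an invented level and noise amplitude):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical steady-state signal: a constant level of 5.0 plus
# random noise of amplitude 0.3 (the quantity to be recovered).
s = 5.0 + 0.3 * rng.normal(size=10_000)

# The fluctuation amplitude in steady state serves as the error-bar
# estimate; it is an upper limit, since any real physical variability
# of the signal is added to the random error.
err_estimate = s.std()
```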


== Test of statistical validity of the model ==


If a model is characterised by a number of free (fit) parameters ''&alpha;<sub>i</sub>, i = 1, ..., n'' and used to predict (or fit) some measurements, then, once error estimates are available, it can (and should) be subjected to a [[:Wikipedia:Chi-square test|&chi;<sup>2</sup>-test]].
The value of &chi;<sup>2</sup> obtained should be close to the number of degrees of freedom (the number of data points minus the number of free parameters ''n''); if it is much larger or smaller, the model or the number of free parameters ''n'' should be revised.
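A small sketch of such a test (the data are synthetic, generated from a straight line with known Gaussian errors):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: a straight line y = 2x + 1 with known Gaussian
# measurement error sigma on each point:
x = np.linspace(0.0, 1.0, 50)
sigma = 0.1
y = 2.0 * x + 1.0 + sigma * rng.normal(size=x.size)

# Fit a model with n = 2 free parameters (slope and intercept):
coeffs = np.polyfit(x, y, 1)
residuals = y - np.polyval(coeffs, x)

# chi^2 should come out close to the number of degrees of freedom,
# N - n = 48, if both the model and the error estimates are adequate.
chi2 = float(((residuals / sigma) ** 2).sum())
dof = x.size - 2
```

A &chi;<sup>2</sup> much larger than the degrees of freedom indicates a poor model or underestimated errors; one much smaller indicates overestimated errors or overfitting.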


== Fluctuations and noise ==

The simplest case is when the physically interesting phenomenon is slowly varying in time.
Random noise is usually characterised by a high frequency, so that a filter in frequency space can then separate signal and noise neatly.
<ref>D. Newland, An Introduction to Random Vibrations, Spectral and Wavelet Analysis (Dover, New York, 1993) ISBN 0486442748</ref>
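Such a frequency-space separation might look as follows (the signal frequency, noise level, and cutoff are all assumptions of this sketch):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical slowly varying signal buried in broadband noise:
t = np.linspace(0.0, 1.0, 1024, endpoint=False)
slow = np.sin(2 * np.pi * 2 * t)              # 2 Hz "physics"
noisy = slow + 0.5 * rng.normal(size=t.size)  # plus random noise

# Low-pass filter in frequency space: zero every Fourier component
# above a cutoff (10 Hz here, an assumed choice) and transform back.
spectrum = np.fft.rfft(noisy)
freqs = np.fft.rfftfreq(t.size, d=t[1] - t[0])
spectrum[freqs > 10.0] = 0.0
filtered = np.fft.irfft(spectrum, n=t.size)
```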
However, when the physically interesting information is itself fluctuating, this signal-noise separation by frequency is not feasible, and much care is needed when analysing data.
The application of a whole set of techniques is required to understand such signals (cross correlation, conditional averaging, spectral analysis, bi-spectral analysis,
<ref>J. van den Berg, ed., Wavelets in Physics (Cambridge University Press, 1999) ISBN 978-0521593113</ref>
determination of fractal dimension, mutual information, reconstruction of chaotic attractors,
<ref>H. Abarbanel, R. Brown, J. Sidorowich, and L. S. Tsimring, Rev. Mod. Phys. 65, 1331 (1993)</ref> ...).
== Integrated data analysis ==

Often, various different diagnostics provide information on the same physical parameter (e.g., in a typical fusion plasma experiment, the electron temperature ''T<sub>e</sub>'' is measured by Thomson Scattering, ECE, and a HIBP, and indirectly also by SXR, although mixed with information on the electron density ''n<sub>e</sub>'' and ''Z<sub>eff</sub>''; the electron density is measured directly by Thomson Scattering, the HIBP, reflectometry, and interferometry, and indirectly by SXR).
Part of this information is local, and part is line-integrated. Instead of merely cross-checking these diagnostics for one or a few discharges, one could decide to make an integrated analysis of the data.
This means using all available information to make the best possible reconstruction of, e.g., the electron density and temperature that is compatible with all diagnostics simultaneously.
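A deliberately tiny sketch of the idea (geometry, values, and error bars are all invented): a two-point density "profile" is reconstructed from two local measurements plus one line integral, by error-weighted least squares.

```python
import numpy as np

# Toy integrated analysis: reconstruct p = (n1, n2) from three
# diagnostics, two local and one line-integrated (n1 + n2).
A = np.array([[1.0, 0.0],     # local diagnostic measuring n1
              [0.0, 1.0],     # local diagnostic measuring n2
              [1.0, 1.0]])    # interferometer-like line integral
s = np.array([2.1, 3.0, 5.0])    # measured values (hypothetical)
err = np.array([0.1, 0.3, 0.2])  # their error bars (hypothetical)

# Error-weighted least squares yields the profile most compatible
# with all diagnostics simultaneously: divide each row of the
# design matrix and each measurement by its error bar, then solve.
p, *_ = np.linalg.lstsq(A / err[:, None], s / err, rcond=None)
```

A full integrated analysis would of course involve realistic forward models of each diagnostic and, typically, a Bayesian or maximum likelihood treatment, but the weighting principle is the same.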