21 comments
Commentator

Join Date: Apr 2008
Posts: 96
Good Answers: 1

Goodness of Fit Between Model and Exp. Data

06/08/2010 2:00 AM

Hello folks,

I am just wondering how I can check the goodness of fit between my model and experimental data.

I have studied some websites and books, but I am still vague on it.

If my model is not a regression model, how can I check how well it fits the data?

In my understanding, "R^2 = 1 - SSerr/SStot" can only be applied to a regression model. SStot = sum (yi - ymean)^2: if the data do not follow a normal distribution, I guess SStot would not work well, but I am not sure.

Any comment would be much appreciated!

thanks.

Guru
Hobbies - Musician - Engineering Fields - Chemical Engineering - New Member Engineering Fields - Control Engineering - New Member Engineering Fields - Instrumentation Engineering - New Member

Join Date: Jan 2007
Location: Moses Lake, WA, USA, Thulcandra - The Silent Planet (C.S. Lewis)
Posts: 4216
Good Answers: 194
#1

Re: goodness of fit

06/08/2010 2:04 AM

Are you using Excel? If so, you can create a trendline for your data. Once you have done this, you can display the R² statistic.

__________________
"Reason is not automatic. Those who deny it cannot be conquered by it. Do not count on them. Leave them alone." - Ayn Rand
Guru

Join Date: Mar 2007
Location: City of Light
Posts: 3943
Good Answers: 183
#2

Re: Goodness of Fit Between Model and Exp. Data

06/08/2010 7:55 AM

R² is a global quantitative estimate of the differences between a set of measured values and a reference. The differences are squared in the computation so that their signs cannot cancel (if the errors are symmetric, their plain sum is zero even though errors are present).

The reference can be the best fit or the model, whichever you want.

If not clear ask and I shall give an example.
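A minimal sketch of such an example, assuming the model values are used directly as the reference in R^2 = 1 - SSerr/SStot. The function name and the numbers are hypothetical and only illustrate the arithmetic.

```python
def r_squared(measured, predicted):
    """R^2 = 1 - SSerr/SStot, with the model values as the reference."""
    n = len(measured)
    mean_m = sum(measured) / n
    # SSerr: squared differences between measurements and model values
    ss_err = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    # SStot: squared differences between measurements and their mean
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_err / ss_tot


# Hypothetical measurements and model values at the same x positions
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(measured, predicted))
```

When the reference is a model that was not fitted to these data, the value can be negative: that simply means the model describes the data worse than their own mean does.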

Commentator

Join Date: Apr 2008
Posts: 96
Good Answers: 1
#5
In reply to #2

Re: Goodness of Fit Between Model and Exp. Data

06/09/2010 1:45 AM

thanks

If my model is not a linear model (i.e. it is a non-linear model), can I use R^2 to check how well my model works?

Guru

Join Date: Mar 2007
Location: City of Light
Posts: 3943
Good Answers: 183
#7
In reply to #5

Re: Goodness of Fit Between Model and Exp. Data

06/09/2010 7:14 AM

Why stay with R² when other solutions exist? Of course R² is NOT tied to a particular degree of fit; in your case the model itself is the fit whose goodness you want to consider!

Have a look at the different papers; the most interesting one for you is the "Elsevier" one, which gives a very detailed analysis of how the methods have to be considered.

The simplest approach is the one using the relative error.

The best approach is to compute the uncertainty band on both sides of the model and pick out the measured values (M-values) that lie far outside it. These may be accidental errors which, if their cause is understood, can be eliminated from the analysis. You can also plot the errors and analyse their probabilistic dispersion.
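A rough sketch of the band idea. How the band should be computed is not stated here, so this assumes a half-width of k times the RMSE of the residuals, with k = 2; both choices, the function name and the data are hypothetical.

```python
import math


def flag_out_of_band(measured, predicted, k=2.0):
    """Return indices of measurements farther than k * RMSE from the model."""
    residuals = [m - p for m, p in zip(measured, predicted)]
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    half_width = k * rmse  # assumed band half-width on each side of the model
    return [i for i, r in enumerate(residuals) if abs(r) > half_width]


# Hypothetical data: the seventh measurement is far from the model value
predicted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
measured = [1.1, 1.9, 3.1, 3.9, 5.1, 5.9, 9.0, 8.1]
print(flag_out_of_band(measured, predicted))
```

A flagged point is only a candidate for review. A large outlier also inflates the RMSE itself, so a median-based width may be more robust when several outliers are suspected.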

I think the most productive way for you is to read the paper first; it will give you good insight, and then you can make a better choice. Forget about statisticians: your problem is not so complex that you need such complicated help.

Anonymous Poster
#3

Re: Goodness of Fit Between Model and Exp. Data

06/08/2010 8:58 AM

You are right that the models do assume the data to be normal.

However, if you are sure the data are not normal (there are statistics to check data for normality), there are techniques for normalisation. This is the preferable method.

However, if you have some problem or constraint with this, you may go ahead with the usual method; the associated errors will usually not be significant.

In fact, in a lot of our experiments where the data are known to be skewed, the estimation errors are within the required value. But then we are in industrial statistics, where small deviations from the optimum are corrected later during fine tuning.

What is it that you are trying to do? Is it curve fitting? In that case you are minimising the RMS error. This may work just fine, depending on how far from normal you are and what prediction accuracy you are interested in.

Check this first, but then you may need a statistician.

Guru

Join Date: Mar 2007
Location: City of Light
Posts: 3943
Good Answers: 183
#4

Re: Goodness of Fit Between Model and Exp. Data

06/08/2010 2:08 PM

If you google "goodness of fit" you get a series of interesting papers.

I used an estimation method based on the squared error and on the squared sum of model values.

If for a value xi the model predicts the value Pi and the measurement gives the value Mi, you can estimate the relative error as: ei^2 = (Pi - Mi)^2 / Pi^2. The squares avoid the influence of sign. You can define a global mean squared relative error as: ε^2 = N^-1 * Σ(ei^2).

Another approach uses the RMSE (root mean square error) and the MAE (mean absolute error):

RMSE = [N^-1 * Σ(Pi - Mi)^2]^0.5
MAE = N^-1 * Σ abs(Pi - Mi)

N being the number of measured values.

Both are sensitive to values out of range which can be accidental. The best is to compute the relative errors and analyse their distribution. If a typical distribution appears then consider them as a probabilistic value and continue the analysis as statistics indicate it.
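The three measures above can be sketched as follows. The function names and the sample values are hypothetical; the relative error assumes that no model value Pi is zero.

```python
import math


def mean_sq_relative_error(predicted, measured):
    """eps^2 = (1/N) * sum(((Pi - Mi) / Pi)^2); requires every Pi != 0."""
    n = len(predicted)
    return sum(((p - m) / p) ** 2 for p, m in zip(predicted, measured)) / n


def rmse(predicted, measured):
    """Root mean square error: sqrt((1/N) * sum((Pi - Mi)^2))."""
    n = len(predicted)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)


def mae(predicted, measured):
    """Mean absolute error: (1/N) * sum(|Pi - Mi|)."""
    n = len(predicted)
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / n


# Hypothetical model values Pi and measurements Mi
P = [2.0, 4.0, 5.0]
M = [1.0, 5.0, 5.0]
print(mean_sq_relative_error(P, M), rmse(P, M), mae(P, M))
```

The first two points have the same absolute error of 1, yet the first contributes four times as much to the relative measure because its model value is smaller. That is why the relative and absolute figures can rank the same data differently.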

Power-User

Join Date: Mar 2007
Posts: 284
Good Answers: 6
#6

Re: Goodness of Fit Between Model and Exp. Data

06/09/2010 2:15 AM

"No fit" can be "qualitative", where your hypothesis can be summarised by, say, a 45 degree line through the origin while your data tend to be scattered around a minus 60 degree line through x=0, y=1. Bad fit can also be "quantitative", where your data points are scattered around your hypothesis line, but in a random pattern and rather far away from it.

Generally speaking, if you have a qualitative misfit, your hypothesis is wrong, and if you see a quantitative misfit, your method of measurement is not good enough.

On top of it all there is the formal question of "goodness of fit" which, very roughly speaking, is measured by the calculated probability of obtaining your results by chance under your hypothesis. This can be done by professional statisticians (who make a living out of such activities) or by precisely following the directions given in a good DIY statistics cookbook.

When you have such calculated probabilities you are supposed to use conventional names for probability levels, such as: 5% or less is called "significant", 1% or less is called "highly significant" etc.

There are several almost-non-technical books which explain all this much better than I can. An old, very good, book is titled "How to Lie with Statistics". A newer one begins with an explanation of how a tall statistician can drown in a puddle with an average depth of 1cm.

__________________
Constant change is here to stay!
Anonymous Poster
#8

Re: Goodness of Fit Between Model and Exp. Data

06/09/2010 10:38 AM

There is an excellent book called Measurement Uncertainty Methods and Applications by Ronald H. Dieck, published by the Instrument Society of America. It approaches measurement uncertainty from the perspective of the experimental data, but there is a section on curve fitting and correlation coefficients. Very readable, i.e., not written by a statistician, with engineering examples.

It is not clear whether you have a model you are fitting to other people's data, or whether you have your own data, and are trying to find a model to match it.

Guru

Join Date: Dec 2007
Location: California
Posts: 2363
Good Answers: 63
#9

Re: Goodness of Fit Between Model and Exp. Data

06/09/2010 1:40 PM

The first thing is to plot the model output for each point against the data itself. In other words, plot Ymodel on the y axis against Ydata on the x axis. For an ideal model this gives a 1:1 line. You can then use the R² of this plot to quantify your errors.

Look at how the data plot and fit a best-fit straight line. If the points fall fairly randomly above and below the 1:1 line, the best-fit line will be 1:1 and the R² represents the random data error. If the line is not 1:1, you have a systematic error.

Plot other lines and check the R². If you get better fits for log or exponential plots, or the data errors get progressively larger, then your error might depend on the parametric value of X (data or model). In that case, try plotting the data corrected for the progressive error accumulation due to the influence of X.

Bear in mind that the data errors might not be normal (if you have enough data compiled, you should be able to test the errors for normality). A more robust analysis based on median values instead of mean values may then be required, or an analysis for the removal of extreme outliers. Removing outliers can be a problem, because outliers could indicate an error in the model rather than some other systematic error; any such removal must be justified and affects the stated precision of the model.
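The 1:1 check can also be done numerically. The sketch below fits an ordinary least-squares line to Ymodel against Ydata and reports slope and intercept; the function name and the data, a model that reads 20% high everywhere, are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical case: the model over-predicts every measurement by 20%
y_data = [1.0, 2.0, 3.0, 4.0]
y_model = [1.2, 2.4, 3.6, 4.8]
slope, intercept = fit_line(y_data, y_model)
print(slope, intercept)
```

A slope near 1 with an intercept near 0 suggests only random scatter. Here the slope comes out at about 1.2, which points to a systematic error even though the points lie exactly on a straight line.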

Guru

Join Date: Mar 2007
Location: City of Light
Posts: 3943
Good Answers: 183
#10

Re: Goodness of Fit Between Model and Exp. Data

06/09/2010 6:34 PM

I wanted to suggest a simpler method based on the correlation coefficient. If you compute the correlation coefficient between the measurement results and the model values, the better the fit, the nearer the result is to 1.
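A small sketch of this computation using the Pearson correlation coefficient, written out by hand so that it needs nothing beyond the standard library. The function name and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical measurements and model values
measured = [1.0, 2.0, 3.0, 4.0]
model = [1.1, 1.9, 3.2, 3.8]
print(pearson_r(measured, model))
```

One caution: the coefficient ignores any constant offset or scale factor, so a model that predicts exactly twice every measured value still scores 1. It is probably best read together with an error measure such as the RMSE.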

Power-User

Join Date: Mar 2007
Posts: 284
Good Answers: 6
#11
In reply to #10

Re: Goodness of Fit Between Model and Exp. Data

06/10/2010 2:58 AM

A correlation coefficient works very well as an indication of linear dependence. It can easily be shown that for some non-linear 'shapes' of dependence a graph shows obvious dependence while the correlation coefficient approaches zero.
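A quick illustration of that point, using the hypothetical case y = x^2 sampled symmetrically around zero: y is completely determined by x, yet the linear correlation coefficient vanishes.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# y depends exactly on x, but not linearly
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [xi ** 2 for xi in x]
r = pearson_r(x, y)
print(r)
```

The positive and negative halves of the parabola cancel in the covariance sum, which is why the coefficient is zero here while a graph of the same points shows an obvious curve.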

The question is not: 'how to do quick and easy calculations?'. It should be: "am I seeing a dependence of one measurement on another?". This can usually be answered by graphing the data and looking at the results. If a suspected dependence is seen and if this dependence has to be formally proved or checked, then some formal statistics can be applied.

It should be remembered that for self-persuasion any device is OK. If others have to be persuaded, then, sadly or not, a generally accepted method should be used.

__________________
Constant change is here to stay!
Guru

Join Date: Mar 2007
Location: City of Light
Posts: 3943
Good Answers: 183
#12
In reply to #11

Re: Goodness of Fit Between Model and Exp. Data

06/10/2010 4:10 AM

A graph is fine for presentations, but for validating results a linear regression of the results has the same effect.

Of course it depends on who is looking at the presentation.

Your idea is not bad, on the contrary. The problem is which expression gives the BEST QUANTITATIVE indicator of the goodness of the results with the lowest effort: maximal value!

One can go deeper into the analysis of the error distribution (whether it is normal or not, why some values fall out of range, and so on), but this is not always needed.

Power-User

Join Date: Mar 2007
Posts: 284
Good Answers: 6
#13
In reply to #12

Re: Goodness of Fit Between Model and Exp. Data

06/10/2010 5:20 AM

There is not much point in back-and-forth bandying of remote specific advice. Especially not when the problem to be solved is not clearly specified. Some problems can (and were) successfully solved by DIY methods, others need an expert. Statistics is a multipurpose tool - necessary for some uses, useless for others. Extracting a wood-screw with a pair of pliers because a pair of pliers is available can be done of course, but it is probably not a very good method for solving a problem.

__________________
Constant change is here to stay!
Guru

Join Date: Mar 2007
Location: City of Light
Posts: 3943
Good Answers: 183
#14
In reply to #13

Re: Goodness of Fit Between Model and Exp. Data

06/10/2010 7:17 AM

Could you please be more explicit? Unfortunately I do not get the point, maybe because of my limited knowledge of English.

For me, again for the same reason as above, the problem was clear: a method to obtain a quantitative figure of merit for the relationship between measurement results and model.

Since part of my activity obliges me to do similar work, I tried to give several opinions according to the way I understood the question. I do not claim to be a "statistician", and when I was obliged to discuss statistics with "specialists" I noticed some arrogance and a lack of capacity to inform. I was thus obliged to take a couple of books and study on my own.

Usually the results are used to build a best-fit function, which is then accepted as the model. In this case it is the other way around: the model exists and the aim is to define the "goodness" of the results against the model. This was the reason for my last proposal to use the correlation function. But according to different papers (and in some respects my own opinion) it is enough to analyse the differences, which could also be called errors. Looking at what was expected, I first thought that the square root of the mean squared relative "errors" would be a good measure, since it gives the width of the global relative uncertainty.

If I was wrong I would appreciate an explanation but in extenso, not in 2 words.

What did you understand from the OP ?

Power-User

Join Date: Mar 2007
Posts: 284
Good Answers: 6
#16
In reply to #14

Re: Goodness of Fit Between Model and Exp. Data

06/10/2010 11:09 AM

Your English is very good and if I have used unclear sentences - I apologise.

Statisticians are professional experts and as such they tend to be arrogant when they are approached for advice by people who know more about the problem at hand but less about the accepted techniques for solving it. Surgeons and lawyers do it too...

Linear models are simple and easy to analyse, but they rarely occur in real life. Therefore three general methods are open:

1. Use knowledge about the problem in order to linearise the observed relationship, then use simple and familiar linear statistics to prove or disprove a hypothesis about the relationship.

2. If there is no knowledge that can help to linearise the observed relationship, use more sophisticated tricks which are known to statisticians but are less easy to understand without specific training and experience.

3. Use intuition and forget about statistics because statistics are used to establish something so that others are convinced. Usually, for self-persuasion, there is no need for formal statistics. Later on, if and when the project develops, mobilise a statistician or a good book. The fact that Excel has good statistical routines does not mean that they should be indiscriminately used.
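As an illustration of method 1, here is a sketch that assumes, purely for the example, that the physics suggests an exponential model y = a * exp(b * x). Taking logarithms gives ln(y) = ln(a) + b * x, which is a straight line and can be fitted with ordinary linear least squares. The function name and the noise-free data are hypothetical.

```python
import math


def fit_exponential(x, y):
    """Fit y = a * exp(b * x) via a straight-line fit of ln(y) against x."""
    ln_y = [math.log(v) for v in y]  # requires every y > 0
    n = len(x)
    mean_x = sum(x) / n
    mean_ln_y = sum(ln_y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_ln_y) for xi, li in zip(x, ln_y))
    b = sxy / sxx
    a = math.exp(mean_ln_y - b * mean_x)
    return a, b


# Hypothetical noise-free data generated with a = 2 and b = 0.5
x = [0.0, 1.0, 2.0, 3.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]
a, b = fit_exponential(x, y)
print(a, b)
```

The transform also changes how the errors are weighted: least squares on ln(y) minimises relative rather than absolute deviations, so the result can differ from a direct non-linear fit on noisy data.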

Some models cannot easily be linearised and you are obliged to use sophisticated statistical methods such as Factor Analysis or Clustering Estimation.

Much trouble is caused (mostly to the researcher himself) when he tries to force the observed data to comply with a statistical technique which happens to be familiar (probably because of having analysed a dissimilar problem long ago).

Statistics is the last technique to use; it is not usually a good exploratory method.

If I have not been clear enough, please ask any question off-list and I'll do my best to answer and/or help.

__________________
Constant change is here to stay!
Guru

Join Date: Mar 2007
Location: City of Light
Posts: 3943
Good Answers: 183
#17
In reply to #16

Re: Goodness of Fit Between Model and Exp. Data

06/10/2010 11:49 AM

I agree if the problem is to obtain a model from measured data. The OP's case is the opposite: the model exists and he only wants to estimate how near the measurements are to the model. This can be limited to the analysis of the differences, and this has nothing to do with statistics.

It is only a quantitative evaluation of whether the errors are small enough with respect to the model. One could of course use the method you suggested, which I consider very good, but there are also other simple possibilities, as I mentioned.

For the first step no assumption of normality has to be made; it is only important to work with sign-free error values, which is done by squaring. As a GLOBAL figure of merit the RMSE is satisfactory. It can be computed either from the error or from the relative error. I chose the second, but no doubt when the model value varies over a wide range both should be used, in order to see the global error as well as the local relative error, which disappears in the global relative figure.

Your approach in fact leads to a correlation of the values, which can also be done without a graph from the [Model] and [Measurement] matrix. In your approach the line (45° at equal scales) is a graphical correlation.

The distribution type test is compulsory when data are not compared with a model and the researcher tries to find a link between two clouds of data that he does not know from the start to be connected.

The first test is, in my opinion, the one which shows whether the distribution assumption is correct. If one works with the normal distribution, the data have to be tested for normality (Kolmogorov or other). The second step is to check whether all data follow the assumption even if the normality test is positive, since there are always "accidental" influences; those have to be recognised as soon as possible, EXPLAINED as well as possible and, of course, eliminated. Only after those filtering efforts can the analysis be continued. But all of this is, in my opinion, not so important for the OP's case.
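A hedged sketch of a Kolmogorov-type normality check, computing only the test statistic D, the largest gap between the empirical distribution of the sample and a normal distribution with the sample's own mean and standard deviation. The function name and the two synthetic samples are assumptions made for illustration. Because the mean and standard deviation are estimated from the same data, D should be compared with Lilliefors-type critical values rather than the standard Kolmogorov-Smirnov tables.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(sample):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    n = len(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    d = 0.0
    for i, x in enumerate(sorted(sample), start=1):
        f = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Two synthetic "residual" samples: one built from normal quantiles,
# one built from exponential quantiles (strongly skewed)
n = 20
normal_like = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
skewed = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]
print(ks_statistic_normal(normal_like))
print(ks_statistic_normal(skewed))
```

The skewed sample gives a clearly larger D than the normal-like one. Whether a given D is large enough to reject normality depends on the sample size and on the table used.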

I again agree when you say that statistics are not to be used for exploratory steps. I think that statistics should be used ONLY if a good understanding of the physical model is present, and only for defining ranges for coefficients, not for defining a theory!

Do not misunderstand me: I defend a point of view with the goal of obtaining indications about its weaknesses.

Power-User

Join Date: Mar 2007
Posts: 284
Good Answers: 6
#18
In reply to #17

Re: Goodness of Fit Between Model and Exp. Data

06/10/2010 12:26 PM

I suggest:

(a) That you remember that Normal variables are very very rare.

(b) That you read a book called 'The Black Swan' by Nassim Nicholas Taleb, available as a paperback.

__________________
Constant change is here to stay!
Guru

Join Date: Dec 2007
Location: California
Posts: 2363
Good Answers: 63
#19
In reply to #18

Re: Goodness of Fit Between Model and Exp. Data

06/10/2010 12:37 PM

However, normality of the random (non-systematic) errors is the standard. When comparing a model for fit against the measured data, the absolute relative differences should be normally distributed for an appropriate model hypothesis, and small corrections to some coefficients in the model can then be made to improve the fit. If the differences between the model values and the measured values are not normally distributed, then the model is inappropriate, and either additional variables are necessary or substantial changes must be made to the variables/coefficients in the model.

Power-User

Join Date: Mar 2007
Posts: 284
Good Answers: 6
#21
In reply to #19

Re: Goodness of Fit Between Model and Exp. Data

06/10/2010 2:35 PM

Many professionals agree that statisticians love the normal probability density function because, they think, it manifests itself as an experimental fact, while experimentalists love it because, they think, it has a mathematical basis.

Other professionals are less strict, they only say that the log-normal is more normal than the normal.

__________________
Constant change is here to stay!
Guru

Join Date: Mar 2007
Location: City of Light
Posts: 3943
Good Answers: 183
#20
In reply to #18

Re: Goodness of Fit Between Model and Exp. Data

06/10/2010 2:27 PM

Thank you but I think that we travel on parallel tracks which do not meet. You follow your ideas but do not answer my questions nor make comments to my explanations.

I am aware, even if it does not seem to be the case, of the limitations of knowledge and of the "power of the improbable". I am aware of the notion of "risk" and I know something about catastrophe theory. I am also aware of how little we know, and I once dared to write what my basic feeling is: "the more you know the more you know that you will never know enough!". The problem of knowledge is more complex, since many times even if the knowledge is present its interpretation is too optimistic and the risk underestimated. If the result is not bad this is called courage; if the result is bad it is called incompetence.

I never wrote that the variable must be normal. I only indicated that IF the assumption of normality is made it has to be checked, because if it is not true the whole analysis based on the assumption has no value. Although "rare", I have quite often met distributions which were "almost" normal, meaning their values were in a rather narrow band around the theoretical normal distribution.

From another point of view, the Gauss distribution is valid for a theoretically infinite sample. Since in real life we deal only with limited sample sizes, the analysis is already incorrect when it uses equations valid for an infinite number. The Student distribution is a help, but it is also based on an uncertainty of the mean value.

Anyway, thank you for the appreciation of my English and for the suggestions.

Chapter closed.

Anonymous Poster
#15

Re: Goodness of Fit Between Model and Exp. Data

06/10/2010 8:42 AM

In my understanding, "R^2 = 1 - SSerr/SStot" can only be applied to a regression model. SStot = sum (yi - ymean)^2: if the data do not follow a normal distribution, I guess SStot would not work well, but I am not sure.

The hypothesis for any fit is, as we know,

Y = Σ a_1i X_i + Σ a_2i X_i^2 + ... + ε.

What we understand from the OP's "if the data do not follow normal distribution" is that ε is not normally distributed, i.e.

ε is not N(μ,σ)

This is the condition under which the basis of the test statistic fails (whether you want to calculate limits or tests for significance), since all of these basically depend on the assumption that the errors are randomly distributed.

The reasons for the errors being non-random (not normal) can be many, but the most common one is missing a significant variable or missing the covariance (i.e. the interdependency of two variables, in simple terms).

Unless a fairly significant one is missed, the usual calculations suffice, and that may be one of the reasons why the particular case posted by the OP does not usually arise. After all, ANOVA gives a fair enough estimate, and so does the R² value.

In either case the data will not look normal. But it is necessary to check the errors (one can call them deviations from the estimate) and then check them for normality.

It must also be noted that a lot of data that seem to be normal are not, and vice versa. In those cases it may be necessary to use a proper statistical tool to convert the data to normal and proceed.


Users who posted comments:

Anonymous Poster (3); dovy (6); Mikerho (1); nick name (8); nzur (1); RCE (2)
