The best way to test a model is to see how well it forecasts, i.e. how well it predicts as yet unobserved data — but sometimes that’s hard. Sometimes the forecasts are very low in quality (skill is the technical term). If the realization is dominated by random noise, say my model explains just 10% of the variance rather than 80% or 90%, then my verification of the forecasting accuracy will be similarly lacking in precision.
What can be done? Well, one thing is to look for consistency in the parameters estimated from independent data sets. This can be done by a linear regression of the parameters from set A onto those from set B (whatever A and B are). And for this regression we have a theory, a prior, a null hypothesis: that the parameters should be the same.
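A minimal sketch of that consistency check, under assumptions of my own (the "parameters" here are hypothetical per-series coefficients, and the noise levels are invented for illustration):

```python
import numpy as np

# Hypothetical setup: the same underlying parameters, estimated
# separately (and noisily) on two independent data sets A and B.
rng = np.random.default_rng(1)
true_params = rng.normal(size=50)
params_a = true_params + 0.2 * rng.normal(size=50)  # estimates from set A
params_b = true_params + 0.2 * rng.normal(size=50)  # estimates from set B

# Ordinary least squares of B on A; under the null hypothesis the
# slope is 1 and the intercept is 0.
slope, intercept = np.polyfit(params_a, params_b, 1)
print(slope, intercept)
```

Note that even in this toy case the OLS slope tends to come out a little below 1, because the regressor `params_a` is itself noisy — which is exactly the problem discussed next.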
So now we have a well defined, and statistically tractable, test. Are the parameter sets the same?
But there’s a problem. The problem is that this is not the system that ordinary linear regression was designed to solve. This is what is called the “errors in variables” problem because the parameter estimates are uncertain — they have errors in them — and the solution is…
[some Googling, some Wikipedia’ing, finding papers, finding the institutions where the authors have home pages which contain PDFs of their work because I’ll be damned if I’ll pay $27 to Springer-Verlag or Blackwell to download a two page PDF file, reading the papers, checking the references, more Google, more Wikipedia]
Deming Regression — popularized by W. Edwards Deming in 1943 and such a good idea that it was named after him.
And that’s what I learned today. To me, a totally new piece of information and a new tool I can use to make my job better.
