Wednesday, June 27, 2012

Which score to assess probabilistic forecasts?


Ever since probabilistic forecasts have been produced, one question has directly followed: how should we assess them?
In the deterministic case this is relatively simple: least squares. Yet Bettina Schaefli asked herself whether "Nash values have value", and Hoshin Gupta showed an interesting decomposition that makes us think it is not always so simple.
Still, in the deterministic case you have a residual, the difference between two points (the forecasted and the observed value), and your score is some metric of that residual. For most practical uses, squared errors have been an accepted solution since Gauss.
When the forecast is probabilistic, the problem is different: you no longer have two points, but one forecast distribution and a single realization (the observed value).
There are some tools, like the Q-Q plot and others, for a synthetic assessment of a probabilistic forecast. However, a plot, even if more informative than a score, does not give a univocal indication that can be used, for example, for parameter optimization. Clearly, we need a score that yields a single number; but which one?
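To make the Q-Q plot idea concrete, here is a minimal sketch in Python, with purely hypothetical synthetic data: it uses the probability integral transform (PIT), i.e. if the forecast distributions are well calibrated, the values F_t(y_t) should be uniformly distributed.

```python
import numpy as np
from scipy import stats

# Hypothetical setup: each forecast is a Gaussian with its own mean;
# the "observations" are synthetic draws from those same forecasts.
rng = np.random.default_rng(42)
n = 200
mu = rng.normal(size=n)                 # forecast means
sigma = np.full(n, 1.0)                 # forecast standard deviations
obs = rng.normal(mu, sigma)             # synthetic observations

pit = stats.norm.cdf(obs, loc=mu, scale=sigma)   # u_t = F_t(y_t)
empirical = np.sort(pit)
theoretical = (np.arange(1, n + 1) - 0.5) / n    # uniform quantiles

# For a well-calibrated forecast the Q-Q plot of these two vectors
# hugs the 1:1 line; here we just print the largest deviation.
print(np.max(np.abs(empirical - theoretical)))
```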

One widely used, although questionable, score is the Continuous Ranked Probability Score, or CRPS. The CRPS has interesting properties: it is a generalization of the Brier score, so it can be decomposed into (un)reliability - resolution + uncertainty, and in the deterministic case it reduces to the mean absolute error (MAE). The CRPS thus correctly penalizes a forecast more when the observation falls outside the "mass" of the pdf.
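In formulas, the CRPS is the integrated squared difference between the forecast cdf F and the step function at the observation y: CRPS(F, y) = ∫ (F(x) - 1{x ≥ y})² dx. For a Gaussian forecast there is a well-known closed form (see e.g. Gneiting and co-authors); a minimal sketch, assuming that form:

```python
import numpy as np
from scipy import stats

def crps_gaussian(y, mu, sigma):
    """CRPS of a Gaussian forecast N(mu, sigma^2) against observation y.

    Closed form: sigma * ( z*(2*Phi(z) - 1) + 2*phi(z) - 1/sqrt(pi) ),
    with z = (y - mu) / sigma.
    """
    z = (y - mu) / sigma
    return sigma * (z * (2 * stats.norm.cdf(z) - 1)
                    + 2 * stats.norm.pdf(z)
                    - 1 / np.sqrt(np.pi))

# In the deterministic limit (sigma -> 0) the CRPS tends to |y - mu|,
# i.e. the mean absolute error mentioned above.
print(crps_gaussian(y=1.0, mu=0.0, sigma=1.0))   # observation in the mass
print(crps_gaussian(y=5.0, mu=0.0, sigma=1.0))   # observation in the tail
```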

Is the CRPS then a good solution, and is the problem solved?
My opinion: definitely not. There are two main drawbacks connected to this score.
The first, apparently more philosophical, is its non-locality. This is pointed out in a brilliant paper by Steven Weijs, my ex-colleague now working at the École Polytechnique Fédérale de Lausanne. He points out the contradiction of inferring information from something that you did not observe: the entire pdf, and not just its value at the observed point, contributes to the score.

The second reason is more practical. In the region where the predicted probability is zero, the CRPS penalty increases only linearly with the "error". But if we use probabilistic forecasts, we want information on the less probable events, to which we are particularly sensitive. Those errors should therefore be penalized more than just linearly.
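A quick numerical sketch of this point (reusing the Gaussian closed form above, with a standard normal forecast, as in the figure below): the CRPS grows roughly linearly as the observation drifts into the tail, while the negative log-likelihood, the "ignorance" score, grows quadratically.

```python
import numpy as np
from scipy import stats

# Observation drifting ever deeper into the tail of a N(0, 1) forecast.
errors = np.array([2.0, 4.0, 8.0, 16.0])
mu, sigma = 0.0, 1.0

z = (errors - mu) / sigma
crps = sigma * (z * (2 * stats.norm.cdf(z) - 1)
                + 2 * stats.norm.pdf(z) - 1 / np.sqrt(np.pi))
ign = -stats.norm.logpdf(errors, loc=mu, scale=sigma)   # ignorance score

# CRPS roughly doubles each time the error doubles (linear growth);
# the ignorance score grows like z^2 / 2 (quadratic growth).
for e, c, i in zip(errors, crps, ign):
    print(f"error={e:5.1f}  CRPS={c:7.2f}  -log L={i:8.2f}")
```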

What to use, then?
The likelihood principle has not yet received the attention it deserves, at least within the statistical hydrology community, but it is enlightening. It was suggested to me, again, by Steven.
The likelihood principle states that:
"within the context of the specified model, the likelihood function L(parameter) from data D contains all the information about the parameter that is contained in D"
The point is that this is a principle, and thus it is self-evident, "vanzelfsprekend", as the Dutch would say. And if you think about it, it sounds logical: if you want to look at the probability of a parameter, you just have to look at the likelihood function. Simple, isn't it?
It is local, and it rewards good forecasts correctly: in expectation, the score is maximized by the true distribution.
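As a minimal sketch of what "local" means in practice (assuming Gaussian forecasts and purely hypothetical synthetic data): the mean log-likelihood only touches the forecast density at the observed values, and it can serve directly as the objective for parameter optimization mentioned above.

```python
import numpy as np
from scipy import stats

def mean_log_likelihood(obs, mu, sigma):
    # Local score: only the density *at the observed values* enters;
    # the rest of the pdf is irrelevant, as the likelihood principle demands.
    return np.mean(stats.norm.logpdf(obs, loc=mu, scale=sigma))

# Hypothetical use as an optimization objective: pick the forecast
# spread that maximizes the mean log-likelihood.
rng = np.random.default_rng(0)
obs = rng.normal(0.0, 1.5, size=500)
for sigma in (0.5, 1.0, 1.5, 2.0):
    print(sigma, mean_log_likelihood(obs, mu=0.0, sigma=sigma))
# On average, the true spread (1.5) scores best.
```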

One difficulty in the practical use of the likelihood principle can be the users' discomfort with the rigidity of a score that penalizes outliers so heavily.
But the choice of a score, in a correct scientific procedure, should be independent of the data. We cannot just use the score that fits our intuition:
the right way to proceed is to use a score to evaluate the data, not the data to evaluate a score.

[Figure: standard normal distribution, with the CRPS and (log-)likelihood scores. The y-axis is not labelled because the relative difference, not the absolute value, matters.]
