Wednesday, June 27, 2012

Which score to assess probabilistic forecasts?


Ever since probabilistic forecasts have been produced, a question has followed directly: how should we assess them?
In the deterministic case this is relatively simple: least squares. Yet Bettina Schaefli asked herself whether “Nash values have value”, and Hoshin Gupta showed an interesting decomposition suggesting that it is not always so simple.
In any case, in the deterministic setting you have a residual, the difference between two points (the forecasted and the observed value), and your score is some metric of this residual. For most practical uses, squared errors have been an accepted solution since Gauss.
When the forecast is probabilistic, the problem is different: you no longer have two points, but one forecast distribution and one realization (the observed value).
There are some instruments, like the QQ plot, for a synthetic assessment of a probabilistic forecast. However a plot, even if more informative than a score, does not give a univocal indication, which is needed, for example, for parameter optimization. Clearly, we need a score to get a single number, but which one?

One widely used, although questionable, score is the Continuous Ranked Probability Score, or just CRPS. The CRPS has interesting properties: it is a generalization of the Brier score, so it can be decomposed into (un)reliability − resolution + uncertainty, and, in the deterministic case, it reduces to the mean absolute error (MAE). The CRPS thus correctly penalizes more when the observation falls outside the "mass" of the pdf.
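
To make this concrete, here is a minimal Python sketch (the function name and the numbers are mine) of the CRPS of a Gaussian forecast, using the well-known closed form; as sigma shrinks to zero it tends to the absolute error.

```python
# A minimal sketch: CRPS of a Gaussian forecast N(mu, sigma^2) against an
# observation y, via the closed form for the normal distribution.
# In the deterministic limit (sigma -> 0) it reduces to |y - mu|, i.e. the MAE.
import math

def crps_gaussian(mu: float, sigma: float, y: float) -> float:
    """CRPS of the forecast N(mu, sigma^2) for the observed value y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal cdf
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

# The further the observation falls outside the mass of the pdf,
# the larger the penalty:
for y in (0.0, 1.0, 3.0):
    print(f"y = {y}: CRPS = {crps_gaussian(0.0, 1.0, y):.3f}")
```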

Is the CRPS then a good solution, and is the problem solved?
My opinion: definitely not. There are two main drawbacks connected to this score.
One, apparently more philosophical, is its non-locality. This is pointed out in a brilliant paper by Steven Weijs, my former colleague now working at the Polytechnic of Lausanne (EPFL). He points out the contradiction of inferring information from something that you did not observe: the entire pdf, and not just its value at the observed point, contributes to the score.

The second reason is more practical. In the region where the predicted probability is zero, the CRPS penalty increases only linearly with the "error". But if we use probabilistic forecasts, we want information on the less probable events, to which we are particularly sensitive; those errors should therefore be penalized more than just linearly.

What to use, then?
The likelihood principle has not yet received the attention it deserves, at least within the statistical hydrology community, but it is enlightening. It was suggested to me, again, by Steven.
The likelihood principle states that:
"within the context of the specified model, the likelihood function L(parameter) from data D contains all the information about the parameter that is contained in D"
The point is that this is a principle, thus self-evident, “vanzelfsprekend”, as the Dutch would say. And if you think about it, it sounds logical: if you want to assess the probability of a parameter, you just have to look at the likelihood function. Simple, isn't it?
The resulting score is local, and it correctly rewards good forecasts, being proportional to their expectation.
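
Steven's locality point can be illustrated numerically. In the sketch below (all names and numbers are mine), two forecast densities are constructed to have the same density at the observed value, so the (log-)likelihood score cannot distinguish them, while the CRPS does, precisely because it also uses the parts of the pdf that were never observed.

```python
# Two forecasts with the SAME density at the observed value y = 0:
# a standard normal, and a uniform on [-c, c] with c = sqrt(pi/2), so that
# both densities equal 1/sqrt(2*pi) at y. A local score treats them alike;
# the (non-local) CRPS does not.
import numpy as np

y = 0.0
x = np.linspace(-8.0, 8.0, 200_001)  # integration grid
dx = x[1] - x[0]

c = np.sqrt(np.pi / 2.0)
pdf_normal = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
pdf_uniform = np.where(np.abs(x) <= c, 1.0 / (2.0 * c), 0.0)

for name, pdf in (("normal", pdf_normal), ("uniform", pdf_uniform)):
    cdf = np.cumsum(pdf) * dx                    # numerical cdf
    crps = np.sum((cdf - (x >= y)) ** 2) * dx    # CRPS by its definition
    log_score = -np.log(pdf[np.argmin(np.abs(x - y))])
    print(f"{name:8s} log score = {log_score:.4f}  CRPS = {crps:.4f}")

# Both get log score ~0.919, but CRPS ~0.234 (normal) vs ~0.209 (uniform).
```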

One difficulty in the practical use of the likelihood principle can be the users' uneasiness with the rigidness of a score that penalizes outliers so heavily.
But in a correct scientific procedure, the choice of a score should be independent of the data. We cannot simply pick the score that fits our intuition.
The right way to proceed is to use a score to evaluate data, and not data to evaluate a score.

[Figure: standard normal distribution, with the CRPS and (log-)likelihood scores as a function of the observed value. The y-axis is left unlabelled because the relative difference between the scores, not their absolute value, is what matters.]
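
For the curious, a rough Python sketch that reproduces the figure (assuming numpy, scipy and matplotlib are available): for a standard normal forecast, the CRPS grows only linearly in the tails, while the negative log-likelihood grows quadratically.

```python
# Standard normal forecast: CRPS (linear tails) vs -log likelihood
# (quadratic tails) as a function of the observed value.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

y = np.linspace(-5.0, 5.0, 501)  # hypothetical observed values
pdf, cdf = norm.pdf(y), norm.cdf(y)
crps = y * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / np.sqrt(np.pi)  # closed form, N(0, 1)
log_score = -np.log(pdf)  # the "ignorance" score

plt.plot(y, pdf, label="forecast pdf N(0, 1)")
plt.plot(y, crps, label="CRPS (linear in the tails)")
plt.plot(y, log_score, label="-log likelihood (quadratic in the tails)")
plt.xlabel("observed value")
plt.yticks([])  # only the relative difference matters
plt.legend()
plt.show()
```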

Tuesday, June 26, 2012

Science and policy: participatory research or “war of reports”?


Sandra Junier, one of my colleagues here at TU Delft, introduced during our habitual lunch meeting the issue of the role of expertise in policy, which, specifically for water management, is the main topic of her PhD research.
She used a case to illustrate the relations between what I would call "knowledge" and "power": the events preceding the implementation of the European Water Framework Directive (WFD) in the Netherlands.

The core requirement of the WFD is the achievement of “good status” for all water bodies. “Good” is a relative concept, depending on the level of naturalness of the water system, and the competence to define it was left to the individual member states: every country had to classify its own water systems into the categories “natural”, “heavily modified”, or “artificial”.
In 2003 a Dutch research institute published a report stating that full WFD implementation would imply setting aside two thirds of Dutch agriculture.
Agriculture is a very important sector in Holland, playing an important role in the Dutch economy. Even if you would not guess it from the taste of its tomatoes!
Coming out just before the discussion, the report heavily influenced the debate, leading to concern and resistance within the agricultural sector.
The national government instructed the responsible authorities to be "pragmatic", meaning to take existing economic interests into account.
The result was that the WFD implementation in the Netherlands was very protective of agriculture:
most of the water bodies were classified as “artificial” or “heavily modified” (98% in total). Even considering that the Dutch landscape is strongly urbanized and modified, this figure stands out when compared to the German value (52%) and, even more, to the French one, around ten percent.
In this case the policy-making process was influenced mostly by a single interest: the debate was in fact dominated by the information provided by that one report.

Leaving the specific case aside, it is interesting to analyze the different models of interaction between policy makers and experts.
Sandra summarizes them in three models, plus one.
1) Science provides the knowledge, politics decides
I call this "the ideal case", but it is more a wish than reality: a situation in which science holds the truth, enlightening politics, intended here broadly as the representatives of organized interests. Unfortunately, reality more often shows the second and third models:
2) Politics uses the science that best fits its needs
In this model politics goes "knowledge shopping" according to its needs, recognizing that different research centers can come to different conclusions. Reality is complex and can be framed in different ways, and different framings can give completely different policy indications.
Science is not as monolithic as it appears to be. Divergences exist, and the best strategy for an organized interest group is to sponsor the “stream” that best fits its needs.
A third model is:
3) Every stakeholder uses science strategically
Organized interests have their own research centers. Every group trusts its own, leading the political discussion to a so-called “dialogue of the deaf” or, using another image, a “war of reports”.

The fourth model, in which Sandra seems to believe, is the “coproduction of science”, in which knowledge is produced together with, and eventually accepted by, (possibly) all the stakeholders.
This seems a better solution than the others, but there are, in my opinion, two issues.
Of course "participatory" sounds better, suggesting a more democratic and more inclusive process or, in water management terms, a more integrated one. However, scientific research is specialist work, often involving few experts, so participation must be structured and organized. Non-experts can add no information to a technical discussion, and their presence risks adding no value. In my opinion, a more effective model is the involvement of the stakeholders' technical representatives: experts who stand up for their interests in the scientific debate. The idea is, in a way, that organized interests, each having their own research centers, join in a participatory production of science.
Another question about the coproduction-of-science model, in my opinion, is whether it is only possible in certain political contexts. The Dutch debate and society are marked by a strong sense of harmony and a tendency to compromise. The model requires a level of trust among the actors that elsewhere may already be worn out. I remember in fact that at Enel, where I worked some years ago, a colleague referred to stakeholder involvement as "having a viper in one’s bosom”.

Transparency about the funding sources of a research institute can add information to assess its credibility, and thus add information to the debate.
In conclusion, one of the propositions that will be part of my final PhD thesis: "the capital of a researcher (or, in this case, of a research institute) is its authoritativeness". Like any capital, authoritativeness is both a positive value, a form of power, and self-reproducing: it can be used to gain even more of it. It is however a "soft power", as it is based on the researcher's capacity to convince.

Saturday, May 26, 2012

An innovative approach to dealing with heteroscedastic processes

My first paper has been published in Water Resources Research:
Dynamic modelling of predictive uncertainty by regression on absolute errors


This work is the follow-up of my master's thesis, in which I developed, under the supervision of Francesca Pianosi, a model for the predictive uncertainty of the inflow to Lake Verbano.
In fact, the inflow forecast error is very often a heteroscedastic process, which means that its variance changes over time. The key idea is to consider the variance itself as a hydrological process; then you can build a dynamic model of the variance.
The innovation, worth a publication, is the use of absolute errors to estimate the variance, instead of squared errors, as is typically done.
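
The gist can be sketched as follows. This is a toy illustration under my own assumptions (synthetic data, a simple AR(1)-type regression on absolute errors), not the exact model of the paper.

```python
# Toy sketch: treat the forecast variance as a dynamic process, estimated by
# regressing the ABSOLUTE point-forecast errors on their own past values,
# then rescaling (for Gaussian errors, E|e_t| = sigma_t * sqrt(2/pi)).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic heteroscedastic residuals: the variance itself evolves in time.
n = 1000
sigma_true = 1.0 + 0.8 * np.sin(np.linspace(0.0, 20.0, n)) ** 2
e = rng.normal(0.0, sigma_true)

# AR(1)-type regression on absolute errors: |e_t| ~ a + b * |e_{t-1}|
abs_e = np.abs(e)
X = np.column_stack([np.ones(n - 1), abs_e[:-1]])
a, b = np.linalg.lstsq(X, abs_e[1:], rcond=None)[0]

# One-step-ahead estimate of the conditional standard deviation.
sigma_hat = (a + b * abs_e[:-1]) * np.sqrt(np.pi / 2.0)
print(f"a = {a:.3f}, b = {b:.3f}, mean sigma_hat = {sigma_hat.mean():.3f}")
```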
I will present this research at the 3rd Workshop on Statistical Methods for Hydrology and Water Resources Management, during the "Non-stationarity and reservoir management" session, on the 1st of October, in Tunisia.

That day I will also turn 30!

Wednesday, May 16, 2012

Hydroinformatics conference, 14th-18th July

From the 14th to the 18th of July, the Hydroinformatics conference will be held in Hamburg, Germany.

I will be there with two presentations. The first is an application of MPC to Salto Grande, a reservoir for energy production located in Uruguay; this test case uses real weather forecasts for anticipatory water management. If we get the results in time, I will also show how to use ensemble forecasts for control, with the Tree-Based MPC algorithm. The second presentation will be an application of a Stochastic Approximation Algorithm (SAA). SAA is a promising technique to solve stochastic optimization problems in a short time, which is a requirement of real-time control applications such as MPC.
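
As a purely illustrative aside (this is not the algorithm of the presentation, just the general idea of the family): a Robbins-Monro-type stochastic-approximation update optimizes an expected cost using only noisy gradient samples, one scenario at a time, which is what makes it fast.

```python
# Robbins-Monro-type stochastic approximation on a toy problem:
# minimize E[(u - w)^2] over a release decision u, with w ~ N(2, 1)
# observed only through random samples. The optimum is u* = E[w] = 2.
import numpy as np

rng = np.random.default_rng(1)

def noisy_grad(u: float) -> float:
    """One sampled gradient of the cost (u - w)^2 for a random scenario w."""
    w = rng.normal(2.0, 1.0)  # one scenario of the uncertain inflow
    return 2.0 * (u - w)

u = 0.0
for k in range(1, 2001):
    u -= (1.0 / k) * noisy_grad(u)  # diminishing steps: sum a_k = inf, sum a_k^2 < inf
print(f"u ~ {u:.3f} (true optimum: 2.0)")
```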
The conference will be a good occasion to meet people working on topics close to the theme of this blog, which is: "how to use information for decisions in the water sector". I will be there together with Dirk Schwanenberg, my supervisor at Deltares, P.J. van Overloop, my daily supervisor at TU Delft, and many others, among whom Jan Talsma from Deltares and Francesca Pianosi from Politecnico di Milano. The list of colleagues and friends joining the conference will probably grow...