Validation protocol



This page describes the validation protocol adopted to compare a dataset against a reference. In our case, the protocol concerns the comparison of satellite-based irradiation values against high quality ground station measurements, and is in line with ISO guidance and the WMO recommendations (References: ISO Guide 1995, WMO 1981, 2008).

We assume here that the ground station measurements are of high quality. To qualify ground measurements, it is important to apply a quality check procedure. See the quality check procedure adopted by MINES ParisTech and Transvalor, developed within the framework of the European FP7 ENDORSE project.

NB concerning the comparison between the Direct Normal Irradiance (DNI) measured by a ground station and satellite estimations: DNI is measured with a pyrheliometer, and pyrheliometers have a larger half opening angle than the one assumed by the satellite-based estimations (HelioClim-3, MACC-RAD...), which by construction follow the definition used by the radiative transfer modelling community, where the sun is closer to a point source. This has a very limited influence in clear-sky conditions, but in overcast conditions the contribution of the circumsolar irradiance to the pyrheliometer measurements may exceed 50% due to specific effects of clouds, especially cirrus clouds. No correction is presently applied to the HelioClim-3 or MACC-RAD outputs.


NB: the third column gives the statistical results for the comparison of HelioClim-3 version 5 against the measurements of the BSRN station of Carpentras (44.083°, 5.059°), France ("Quality assessment results of GHI values for Carpentras - 15 min irradiation values").

Quantity | Formula | Carpentras
Observation at instant k | xk | 15 min ground station measurements: GRDk
Estimation (model) at instant k | yk | 15 min HelioClim-3 version 5 data: HC3v5k
Number of samples (number of coincident values (xk, yk)) | N | 145349
Mean observed value | | 105.7 Wh/m²
Standard deviation of the observation | | -
Deviation at instant k | δk | -
Bias (mean deviation) | b | 1.2 Wh/m²
Relative bias in percent (×100) | | 1.2%
Root Mean Square Error | RMSE | 15.9 Wh/m²
Relative RMSE in percent (×100) | | 15.1%
Standard deviation of δk | s | -
Relative standard deviation | | -
Relation between b, RMSE and s | | -
Covariance of x and y | |
Correlation coefficient | | 0.971
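
For reference, the indices named in the table are usually defined as sketched below. This is a reconstruction under assumptions, not a copy of the page's own formulas: the sums use a 1/N normalisation (population form), the deviation is taken as estimation minus observation so that a positive bias means overestimation, and the symbols \bar{x}, \bar{y}, \sigma_x, \sigma_y, b_rel, RMSE_rel and s_rel are introduced here for illustration.

```latex
\begin{align*}
\bar{x} &= \frac{1}{N}\sum_{k=1}^{N} x_k, &
\sigma_x &= \sqrt{\frac{1}{N}\sum_{k=1}^{N} (x_k - \bar{x})^2} \\
\delta_k &= y_k - x_k, &
b &= \frac{1}{N}\sum_{k=1}^{N} \delta_k \\
b_{\mathrm{rel}} &= 100\,\frac{b}{\bar{x}}, &
\mathrm{RMSE} &= \sqrt{\frac{1}{N}\sum_{k=1}^{N} \delta_k^2} \\
\mathrm{RMSE}_{\mathrm{rel}} &= 100\,\frac{\mathrm{RMSE}}{\bar{x}}, &
s &= \sqrt{\frac{1}{N}\sum_{k=1}^{N} (\delta_k - b)^2} \\
s_{\mathrm{rel}} &= 100\,\frac{s}{\bar{x}}, &
\mathrm{RMSE}^2 &= b^2 + s^2 \\
\mathrm{cov}(x, y) &= \frac{1}{N}\sum_{k=1}^{N} (x_k - \bar{x})(y_k - \bar{y}), &
r &= \frac{\mathrm{cov}(x, y)}{\sigma_x\,\sigma_y}
\end{align*}
```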


For a better understanding of the indices...

Most publications report a few indices to summarize the discrepancy between two datasets.

We chose the following indices:

  • The number of coincident values (NDATA in the graph below)
  • The mean observed value (MREF in the graph below)
  • The bias and the relative bias in percent (MBE in the graph below)
  • The RMSE and the relative RMSE in percent
  • The correlation coefficient
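
The indices listed above can be computed in a few lines. The following is a minimal sketch with NumPy, not the code used to produce the Carpentras results: the function name `validation_indices`, the dictionary keys and the sample values are illustrative assumptions, and the deviation is taken as estimation minus observation.

```python
import numpy as np

def validation_indices(obs, est):
    """Summary indices for estimates `est` against reference observations `obs`."""
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    # Keep only coincident values: both series must be valid at instant k.
    valid = np.isfinite(obs) & np.isfinite(est)
    obs, est = obs[valid], est[valid]

    deviation = est - obs                      # estimation minus observation
    mean_obs = obs.mean()
    bias = deviation.mean()
    rmse = np.sqrt(np.mean(deviation ** 2))
    return {
        "n": int(obs.size),
        "mean_obs": mean_obs,
        "bias": bias,
        "rel_bias_pct": 100.0 * bias / mean_obs,
        "rmse": rmse,
        "rel_rmse_pct": 100.0 * rmse / mean_obs,
        "correlation": np.corrcoef(obs, est)[0, 1],
    }

# Made-up irradiation values in Wh/m2, for illustration only.
ground = [100.0, 200.0, 300.0, np.nan]
satellite = [110.0, 190.0, 330.0, 50.0]
indices = validation_indices(ground, satellite)
```

In this made-up example the last pair is discarded because the ground value is missing, so only three coincident values contribute to the indices.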

The graph below is a 2-D histogram of the 10 min HelioClim-3 data versus the corresponding 10 min data from the Carpentras BSRN station after quality check, for the almost 6 years of data available between 2004 and 2009.

Interpretation of the graph

The relative bias is 1.7%, which indicates a slight overall overestimation by HelioClim-3 version 5 compared to the in-situ data over the 6 years of available data. The value corresponds to the distance between the black dashed and the red dashed lines (small double arrow). The slight overestimation is due to the orange cloud of points at low irradiation values, i.e. mainly the winter period. The ideal value is zero.

The standard deviation (STDE on this graph) gives an idea of the spread of the cloud of points (large double arrow). However, most of the time the RMSE is reported instead. Because the RMSE combines the bias and the standard deviation, it is close to, or slightly above, the standard deviation when the bias is low. The ideal value is zero.
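
The link between RMSE, bias and standard deviation can be checked numerically. This is a small sketch on made-up deviations, assuming the standard deviation is the population form (division by N); with a sample standard deviation (division by N-1) the identity holds only approximately.

```python
import numpy as np

# Made-up deviations (estimation minus observation) in Wh/m2.
deviation = np.array([10.0, -10.0, 30.0, 5.0, -15.0])

bias = deviation.mean()
std_dev = deviation.std()                  # population form, divides by N
rmse = np.sqrt(np.mean(deviation ** 2))

# With the population standard deviation, RMSE^2 = bias^2 + std_dev^2,
# so the RMSE can never be smaller than the standard deviation.
gap = rmse ** 2 - (bias ** 2 + std_dev ** 2)
```

Here `gap` is zero up to floating-point rounding, and the RMSE exceeds the standard deviation only by the contribution of the bias.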

The correlation coefficient is an important index that cannot easily be represented on the graph. The closer it is to the ideal value of 1, the more "in phase" the two datasets are; in other words, the observation (in-situ data) and the estimation (HelioClim-3 version 5) should capture the weather transitions at the same instant.

The monthly relative bias

The relative bias above is computed over the whole period of available data. A value of 1.7% is very low, and many users expect similar mean deviations when retrieving data over shorter periods, for instance monthly. However, this is not the case: the relative bias computed over the whole period hides error trends that can occur, for instance, within each season. The following graph depicts the relative bias computed for each month of the year over the whole period of data.

One may see that the relative bias is positive and close to 4% in winter and autumn, and decreases in spring and summer, becoming slightly negative in July. Please note that the relative bias may be greater in winter because it results from a division by a smaller mean observed value than in summer.
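
A per-month relative bias of this kind can be sketched as follows. This is an illustrative implementation, not the one behind the graph: the function name `monthly_relative_bias` and the sample values are assumptions, and each calendar month pools the data of all years. The made-up example uses the same absolute bias in January and July to show how a smaller mean observed value inflates the relative bias.

```python
import numpy as np

def monthly_relative_bias(timestamps, obs, est):
    """Relative bias in percent for each calendar month, pooled over all years."""
    timestamps = np.asarray(timestamps, dtype="datetime64[m]")
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    # Calendar month (1..12) of each timestamp.
    months = timestamps.astype("datetime64[M]").astype(int) % 12 + 1

    result = {}
    for month in range(1, 13):
        mask = (months == month) & np.isfinite(obs) & np.isfinite(est)
        if not mask.any():
            continue                       # no coincident values for this month
        bias = np.mean(est[mask] - obs[mask])
        result[month] = 100.0 * bias / obs[mask].mean()
    return result

# Made-up values in Wh/m2: the absolute bias is 4 Wh/m2 in both months.
times = ["2005-01-15T12:00", "2006-01-15T12:00",
         "2005-07-15T12:00", "2006-07-15T12:00"]
ground = [40.0, 60.0, 180.0, 220.0]
satellite = [44.0, 64.0, 184.0, 224.0]
rel_bias = monthly_relative_bias(times, ground, satellite)
```

With these values the January relative bias is four times the July one, although the absolute bias is identical, because the January mean observed value is four times smaller.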

Please also note that calibrating the HelioClim-3 version 5 data with these measurements leads to a relative bias of 0.4% over the whole period, and decreases the relative bias per month by a factor of approximately 2.
