[CF-metadata] CF and multi-forecast system ensemble data from Francisco Doblas-Reyes on 2006-10-12 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Francisco Doblas-Reyes <Francisco.Doblas-Reyes>
Date: Thu, 12 Oct 2006 14:27:01 +0100

Hi,

The EU-funded ENSEMBLES project is generating a large set of
seasonal-to-decadal (s2d) multi-forecast system ensemble hindcasts.
Multi-forecast systems include multi-models and perturbed-parameter
ensembles. The ENSEMBLES s2d hindcasts mimic the European multi-model
seasonal ensemble operational forecasts. The data are written in GRIB,
but we intend to improve their dissemination by making the data also
available in NetCDF format.

We found that the current standard names do not allow us to describe the
structure of multi-forecast system ensemble forecasts using the CF
convention. Therefore, we would like to propose some additional CF
standard names to avoid ambiguities when coding multi-forecast system
ensemble data:

1) experiment_identifier (STRING). The producing centre is responsible
for assigning unique experiment identifiers for the different
experiments created, and should (ideally) provide documentation of each
experiment. It is possible for common experiment identifiers to be
agreed between different centres, if they are carrying out a common
experiment or creating a multi-model forecast system. There is no a
priori guarantee that identical identifiers from different centres
refer to scientifically equivalent experiments.
2) originating_centre (STRING). Institution with scientific
responsibility for the forecast system.
3) forecast_system_version_number (INTEGER, units=1). This number should
be used to distinguish between different prediction systems used by the
same institution. For instance, the Met Office will have to choose a
system number for the GloSea model and a different one for the DePreSys
system (both based on the HadCM3 coupled model). It is assigned by the
producing centre and gives scientific details of the models used. A
table online should provide the corresponding information.
4) forecast_method_number (INTEGER, units=1). This variable
distinguishes forecasts made with the same underlying forecasting
system, but where variations have been introduced such that the
different integrations have different properties, most importantly
different climate drift. An example is given by the several members of a
perturbed parameter ensemble forecast, which should share the
"forecast_system_version_number" but have different values of the
"forecast_method_number". As for "forecast_system_version_number", a
table online should provide the corresponding information.
5) ensemble_member_number (INTEGER, units=1). This number distinguishes
integrations made with the same origin, experiment identifier, method
and system number that differ only in their initial-condition
perturbations, and which therefore form a homogeneous and a priori
statistically indistinguishable ensemble.

A single multi-forecast system experiment includes data from multiple
forecast systems, either from a single centre or from several. The
variables 1-4 make a natural tuple to define a particular homogeneous
multi-forecast system ensemble forecast. The ensemble is then spanned by
the ensemble_member_number variable. For instance, a multi-model
ensemble forecast or a perturbed-parameter ensemble is made of a
collection of such tuples.
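
As a rough illustration of the grouping described above, the following
Python sketch treats variables 1-4 as a key and collects the
ensemble_member_number values found under each key. The class name, the
helper function, the centre names, the experiment identifier and all
numbers are invented placeholders, not values assigned by any centre.

```python
from collections import defaultdict
from typing import NamedTuple


class ForecastSystemKey(NamedTuple):
    """Variables 1-4 of the proposal, identifying one homogeneous ensemble."""
    experiment_identifier: str
    originating_centre: str
    forecast_system_version_number: int
    forecast_method_number: int


# (key, ensemble_member_number) pairs; every value here is a made-up placeholder.
records = [
    (ForecastSystemKey("exp1", "centre_a", 1, 0), 0),
    (ForecastSystemKey("exp1", "centre_a", 1, 0), 1),
    (ForecastSystemKey("exp1", "centre_a", 1, 1), 0),  # perturbed-parameter variant
    (ForecastSystemKey("exp1", "centre_b", 3, 0), 0),  # system from a second centre
]


def group_members(pairs):
    """Collect the ensemble_member_number values found under each key."""
    ensembles = defaultdict(list)
    for key, member in pairs:
        ensembles[key].append(member)
    return dict(ensembles)


ensembles = group_members(records)
for key, members in ensembles.items():
    print(key, "members:", sorted(members))
```

Each distinct key corresponds to one homogeneous ensemble, and the list
stored under it is what ensemble_member_number spans.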

Although not actually needed for distribution and archive purposes,
additional variables with the same dimension as the variable
experiment_identifier are also suggested:

1) original_distributor (STRING). Centre with responsibility for
distribution of data, i.e. the centre that first made the data publicly
available, and to which queries about data integrity should be sent.
2) production_status (STRING). Operational, research or a user-defined
project identifier. The value "research" should be used for general
research at a specific centre, while project_id should be used for
specified international research projects.
3) sst_specification (STRING). It describes the use of the SSTs in the
specific experiment and can take values such as "coupled", "observed",
"predicted", "persisted anomaly" or "persisted absolute".
4) real_time (CHARACTER). It takes the values "true" or "false",
according to whether or not the forecast was made in real time. It is an
attribute of the individual forecasts.
5) archive_date (INTEGER, units=days since a specific date). Describes when
the data were archived or published. The aim is to provide an approximate
timestamp, to easily distinguish between recent experiments and much
older ones. Also, in the case that data need to be corrected in a
globally distributed data system, the archive_date could be used to
distinguish between the older, original data and the newer, corrected
data. This is an attribute of the individual forecast.
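
A minimal sketch of how archive_date could be encoded as an integer
number of days and used to tell original from corrected data. The
reference date and the two archive dates below are assumptions chosen
for illustration only; the proposal leaves the "specific date" open.

```python
from datetime import date

# Hypothetical reference date; the proposal does not fix one.
REFERENCE_DATE = date(1950, 1, 1)


def encode_archive_date(day, reference=REFERENCE_DATE):
    """Return the number of whole days elapsed from reference to day."""
    return (day - reference).days


# Made-up archive dates for an original dataset and its later correction.
original = encode_archive_date(date(2006, 10, 12))
corrected = encode_archive_date(date(2006, 11, 2))

# The larger integer marks the more recently archived (corrected) data.
newest = max(original, corrected)
print(original, corrected, newest == corrected)
```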

Some relevant issues for the encoding of multi-forecast system ensembles
data are as follows:

- We use the variables "forecast_period" and "forecast_reference_time"
as independent time variables employed to define the two time axes of a
forecast dataset with several start dates, i.e. both "forecast_period"
and "forecast_reference_time" are multivalued. We believe that
"forecast_period" cannot have time units referenced to a specific date
as "forecast_reference_time" does. This is to prevent having in the file
forecasts with the same verifying date but produced from a different
start date (and, hence, intrinsically different). An alternative would
be to introduce an index dimension and make two one-dimensional
auxiliary time coordinate variables with this dimension, as suggested by
Jonathan Gregory in the thread "file with both run time and forecast
(valid) time coordinates". Any thoughts about this?
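
To make the two-axis layout concrete, here is a small Python sketch in
which "forecast_reference_time" holds dates and "forecast_period" holds
elapsed times with no reference date. The start dates, lead times and
the helper name valid_times are assumptions for illustration. The lead
times are chosen so that two different start dates reach the same
verifying date, and the two forecasts still stay distinct because each
is addressed by its (start, lead) index pair.

```python
from datetime import datetime, timedelta

# Hypothetical start dates (forecast_reference_time) and lead times
# (forecast_period, expressed as elapsed days with no reference date).
forecast_reference_time = [datetime(1991, 5, 1), datetime(1991, 6, 1)]
forecast_period = [timedelta(days=30), timedelta(days=61)]


def valid_times(reference_times, periods):
    """Return {(start index, lead index): verifying date} for every pair."""
    return {
        (i, j): ref + lead
        for i, ref in enumerate(reference_times)
        for j, lead in enumerate(periods)
    }


valid = valid_times(forecast_reference_time, forecast_period)
for (i, j), when in sorted(valid.items()):
    print(forecast_reference_time[i].date(), forecast_period[j].days, when.date())
```

Here the entries (0, 1) and (1, 0) share a verifying date but come from
different start dates, so they occupy different positions on the two
axes.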

- It has to be mentioned that although "realization" is an existing
standard name to handle ensembles, it can be used to identify either a
forecast_system_version_number (a member of a multi-model ensemble from
the same institution), a forecast_method_number (a member of a
perturbed-parameter ensemble) or an ensemble_member_number (a member of
an initial-condition ensemble). This is problematic as a multi-forecast
system ensemble dataset might have to use those three variables
dimensioned independently. Therefore, the use of these three variables
is suggested instead to distinguish the elements of an experiment
carried out with a multi-forecast system.
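
The following Python sketch illustrates the concern, under assumed
dimension sizes. If the three variables are flattened onto a single
"realization" axis, the axis carries only a running index, and
recovering which system, method and member a value belongs to requires a
separate lookup table. The sizes and variable names are hypothetical.

```python
from itertools import product

# Hypothetical values along the three independent dimensions.
system_numbers = [1, 2]          # forecast_system_version_number
method_numbers = [0, 1, 2]       # forecast_method_number
member_numbers = [0, 1, 2, 3]    # ensemble_member_number

# Flattening everything onto one "realization" axis keeps only a running index.
combinations = list(product(system_numbers, method_numbers, member_numbers))
realization = list(range(len(combinations)))

# Mapping a realization value back to its origin needs an external table.
lookup = dict(zip(realization, combinations))
print(len(realization), lookup[0], lookup[len(realization) - 1])
```

Keeping the three variables as separate dimensions avoids the need for
such a table.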

The proposed names take account of established practice at operational
centres and usual practice in the research community of climate
variability at different time scales. These names are part of a more
general proposal to unambiguously define the appropriate metadata for
multi-forecast system ensembles, which is based upon a more general
proposal under discussion by WCRP. The proposal is available from:
http://www.ecmwf.int/research/EU_projects/ENSEMBLES/data/index.html

The names and data structure suggested in this message are likely to be
relevant for other operational multi-forecast system ensemble forecast
activities such as EUROSIP or TIGGE.

Apologies for the long message.
Best regards,
Paco

-- 
________________________________________
Francisco J. Doblas-Reyes
European Centre for Medium-Range
Weather Forecasting (ECMWF)
Shinfield Park, RG2 9AX
Reading, UK
Tel: +44 (0)118 9499 655
Fax: +44 (0)118 9869 450
f.doblas-reyes at ecmwf.int
_______________________________________

Received on Thu Oct 12 2006 - 07:27:01 BST
