
[CF-metadata] CF and multi-forecast system ensemble data

From: Francisco Doblas-Reyes <Francisco.Doblas-Reyes>
Date: Wed, 18 Oct 2006 16:47:36 +0100

Dear Karl,

The information you mention is very useful.

We are familiar with the importance of writing model output according to
strict standards. ECMWF produces all its operational output in GRIB,
according to WMO recommendations and publicly available extensions:

http://www.ecmwf.int/publications/manuals/libraries/gribex/localGRIBUsage.html

Our operational activities, in particular the daily dissemination of
data to the different member states, imply that the metadata need to
have a unique and clear interpretation.

These standards are not yet available (to my knowledge) in CF for
multi-forecast system ensemble predictions, so NetCDF cannot be
used for any dissemination of operational products.

As I see it, there are two main issues here:

1) Setting standards to be able to produce CF-compliant multi-forecast
system ensemble prediction NetCDF files. That's what we proposed in the
first part of our message. CMOR could be used as a template, but we
still need to find a solution to include ensemble forecasts from several
systems in the same NetCDF file.

2) Finding a way to disseminate the data from these files (or from those
in native format) with OPeNDAP retaining the metadata information. As
the data to be served with the files mentioned above will have more than
4 dimensions, GDS won't be an option.
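
To make the dimension count in 2) concrete, here is a minimal Python
sketch. It is only an illustration under stated assumptions: the
dimension names follow the variables proposed in the quoted message
below, while the sizes, the latitude/longitude grid and the helper
name exceeds_limit are hypothetical and not taken from any ENSEMBLES
file.

```python
# Hypothetical layout of one field in a multi-forecast system ensemble file.
# Only the dimension names come from the proposal; the sizes are made up.
dims = {
    "forecast_reference_time": 4,         # start dates
    "forecast_period": 6,                 # lead times
    "forecast_system_version_number": 2,  # systems from one centre
    "forecast_method_number": 3,          # e.g. perturbed-parameter variants
    "ensemble_member_number": 9,          # initial-condition perturbations
    "latitude": 73,                       # assumed grid
    "longitude": 144,                     # assumed grid
}

# Limit taken from the statement above that data with more than
# 4 dimensions cannot be served through GDS.
MAX_SERVABLE_DIMS = 4


def exceeds_limit(dimensions, limit=MAX_SERVABLE_DIMS):
    """Return True if the field has more dimensions than the limit allows."""
    return len(dimensions) > limit


rank = len(dims)
print(f"rank = {rank}, exceeds limit of {MAX_SERVABLE_DIMS}: {exceeds_limit(dims)}")
```

Even without a vertical level, a field laid out this way already has
seven dimensions if the system, method and member variables are
dimensioned independently.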

As you suggest, one option is to use a translator similar to the
one used in the ECMWF public data server (which, unfortunately, is not
CF compliant). However, this option requires a prior solution to 1)
and does not solve the problem in 2) if OPeNDAP is chosen.

I'd like to point out that all this work relates not only to the
work carried out at ECMWF or even in Europe, but to the whole
operational forecasting community at any time scale, especially given
the increasing importance of multi-model systems.

Your comments on these issues will be most welcome.
Best regards,
Paco


Karl Taylor wrote:
> Hi Francisco,
>
> Steve is quite correct that various servers can supply data stored in
> various formats via OPeNDAP. It is also true that several plotting
> packages can obtain such data and produce plots and enable some types of
> analysis.
>
> The value of storing data in CF-compliant files *and* adhering to CF
> "best practices", which encourage inclusion of certain types of useful
> metadata (e.g., cell bounds or areas, and various other useful
> "attributes") follows from the following:
>
> When analysts want to perform certain types of precise calculations and
> try out many different variants of some complex (or not so complex)
> processing procedure (e.g., various types of EOF analysis, different
> "masking" options), they usually want to obtain the data and store it
> locally with all its metadata intact, as long as all the model output
> is written according to strict standards. A simple transfer of files all
> written in similar structure and format makes this easy. The IPCC
> experience taught us this, and analysts have generally reported that it
> has been easy for them to analyze results across models.
>
> An alternative is to store data in "native" formats, but write a
> "translator" which can then be invoked before delivering the data to the
> user. ECMWF has adopted this method: they translate their ERA40
> (reanalysis) data when users request it, delivering
> CF-compliant netCDF files in place of the native GRIB files.
>
> One set of "best practices" is specified in the following document:
> http://www-pcmdi.llnl.gov/ipcc/IPCC_output_requirements.htm
> A code is available (CMOR), which makes it easier to meet these
> requirements, and also provides error messages if one attempts to write
> obviously incorrect or inadequate metadata. CMOR is available through
> http://www-pcmdi.llnl.gov/software-portal/cmor
> All but a couple of the modeling groups that provided data to the IPCC
> archive rewrote their data through CMOR (and the others were able to
> mimic CMOR's output), so someone at each of these modeling groups should
> already be familiar with this way of doing things.
>
> Having said all of this, I (like Steve) am not necessarily recommending
> this way of doing things. The best approach for your project will
> depend to a certain extent on both who the contributing groups are and
> who the analysts are. There may, of course, be other practical or
> political considerations you'll need to contend with.
>
> I'd be happy to discuss this further with you, if you think my input
> will be useful.
>
> Best regards,
> Karl
>
>
>
> Steve Hankin wrote:
>> Hi Francisco,
>>
>> This is not a recommendation, but just an "aside" note -- something to
>> consider. Namely, various groups in the community (e.g. the NOMADS
>> project at NOAA/NCDC) have quite a bit of experience with serving GRIB
>> files directly into OPeNDAP without the need to reformat them. This
>> can be done by either the "GDS" server (Grads-based) or the "TDS"
>> server (part of the THREDDS project at Unidata). The resulting Web
>> access to the data, while it does not look identical to a CF dataset,
>> is close enough so that many software applications can work with it.
>> The OPeNDAP vision of Web services is that it should not be necessary
>> to reformat (duplicate) your data.
>>
>> fyi - Steve
>>
>> ==============================
>>
>> Francisco Doblas-Reyes wrote:
>>> Hi,
>>>
>>> The EU-funded ENSEMBLES project is generating a large set of
>>> seasonal-to-decadal (s2d) multi-forecast system ensemble hindcasts.
>>> Multi-forecast systems include multi-models and perturbed-parameter
>>> ensembles. The ENSEMBLES s2d hindcasts mimic the European multi-model
>>> seasonal ensemble operational forecasts. The data are written in
>>> GRIB, but we intend to improve their dissemination by making the data
>>> also available in NetCDF format.
>>>
>>> We found that the current standard names do not allow us to describe
>>> the structure of multi-forecast system ensemble forecasts using the CF
>>> convention. Therefore, we would like to propose some additional CF
>>> standard names to avoid ambiguities when coding multi-forecast system
>>> ensemble data:
>>>
>>> 1) experiment_identifier (STRING). The producing centre is
>>> responsible for assigning unique experiment identifiers for the
>>> different experiments created, and should (ideally) provide
>>> documentation of each experiment. It is possible for common
>>> experiment identifiers to be agreed between different centres, if
>>> they are carrying out a common experiment or creating a multi-model
>>> forecast system. There is no a priori guarantee that identical
>>> identifiers from different centres refer to scientifically
>>> equivalent experiments.
>>> 2) originating_centre (STRING). Institution with scientific
>>> responsibility for the forecast system.
>>> 3) forecast_system_version_number (INTEGER, units=1). This number
>>> should be used to distinguish between different prediction systems
>>> used by the same institution. For instance, the Met Office will have
>>> to choose a system number for the GloSea model and a different one
>>> for the DePreSys system (both based on the HadCM3 coupled model). It
>>> is assigned by the producing centre and gives scientific details of
>>> the models used. A table online should provide the corresponding
>>> information.
>>> 4) forecast_method_number (INTEGER, units=1). This variable
>>> distinguishes forecasts made with the same underlying forecasting
>>> system, but where variations have been introduced such that the
>>> different integrations have different properties, most importantly
>>> different climate drift. An example is given by the several members
>>> of a perturbed parameter ensemble forecast, which should share the
>>> "forecast_system_version_number" but have different values of the
>>> "forecast_method_number". As for "forecast_system_version_number", a
>>> table online should provide the corresponding information.
>>> 5) ensemble_member_number (INTEGER, units=1). Different integrations
>>> made with the same origin, experiment identifier, method and system
>>> number created using initial-condition perturbations, which form a
>>> homogeneous and a priori statistically indistinguishable ensemble.
>>>
>>> A single multi-forecast system experiment includes data from multiple
>>> forecast systems, either from a single centre or from several. The
>>> variables 1-4 make a natural tuple to define a particular homogeneous
>>> multi-forecast system ensemble forecast. The ensemble is then spanned
>>> by the ensemble_member_number variable. For instance, a multi-model
>>> ensemble forecast or a perturbed-parameter ensemble is made of a
>>> collection of such tuples.
>>>
>>> Although not actually needed for distribution and archive purposes,
>>> additional variables with the same dimension as the variable
>>> experiment_identifier are also suggested:
>>>
>>> 1) original_distributor (STRING). Centre with responsibility for
>>> distribution of data, i.e. the centre that first made the data publicly
>>> available, and to whom queries about data integrity should be sent.
>>> 2) production_status (STRING). Operational, research or a
>>> user-defined project identifier. The value "research" should be used for
>>> general research at a specific centre, while project_id should be
>>> used for specified international research projects.
>>> 3) sst_specification (STRING). It describes the use of the SSTs in
>>> the specific experiment and can take values such as "coupled",
>>> "observed", "predicted", "persisted anomaly" or "persisted absolute".
>>> 4) real_time (CHARACTER). It takes the values "true" or "false",
>>> according to whether or not the forecast was made in real time. It is
>>> an attribute of the individual forecasts.
>>> 5) archive_date (INTEGER, units=days from specific date). Describes
>>> when the data was archived or published. The aim is to provide an
>>> approximate timestamp, to easily distinguish between recent
>>> experiments and much older ones. Also, in the case that data need to
>>> be corrected in a globally distributed data system, the archive_date
>>> could be used to distinguish between the older, original data and the
>>> newer, corrected data. This is an attribute of the individual forecast.
>>>
>>> Some relevant issues for the encoding of multi-forecast system
>>> ensembles data are as follows:
>>>
>>> - We use the variables "forecast_period" and
>>> "forecast_reference_time" as independent time variables employed to
>>> define the two time axes of a forecast dataset with several start
>>> dates, i.e., both "forecast_period" and "forecast_reference_time" are
>>> multivalued. We believe that "forecast_period" cannot have time units
>>> referenced to a specific date as "forecast_reference_time" does. This
>>> is to prevent having in the file forecasts with the same verifying
>>> date but produced from a different start date (and, hence,
>>> intrinsically different). An alternative would be to introduce
>>> an index dimension and make two one-dimensional auxiliary time
>>> coordinate variables with this dimension, as suggested by Jonathan
>>> Gregory in the thread "file with both run time and forecast (valid)
>>> time coordinates". Any thoughts about this?
>>>
>>> - It has to be mentioned that although "realization" is an existing
>>> standard name to handle ensembles, it can be used to identify either
>>> a forecast_system_version_number (a member of a multi-model ensemble
>>> from the same institution), a forecast_method_number (a member of a
>>> perturbed-parameter ensemble) or an ensemble_member_number (a member
>>> of an initial-condition ensemble). This is problematic as a
>>> multi-forecast system ensemble dataset might have to use those three
>>> variables dimensioned independently. Therefore, the use of these
>>> three variables is suggested instead to distinguish the elements of
>>> an experiment carried out with a multi-forecast system.
>>>
>>> The proposed names take account of established practice at
>>> operational centres and usual practice in the research community of
>>> climate variability at different time scales. These names are part of
>>> a more general proposal to unambiguously define the appropriate
>>> metadata for multi-forecast system ensembles, which is based upon a
>>> more general proposal under discussion by WCRP. The proposal is
>>> available from:
>>> http://www.ecmwf.int/research/EU_projects/ENSEMBLES/data/index.html
>>>
>>> The names and data structure suggested in this message are likely to
>>> be relevant for other operational multi-forecast system ensemble
>>> forecast activities such as EUROSIP or TIGGE.
>>>
>>> Apologies for the long message.
>>> Best regards,
>>> Paco

-- 
________________________________________
Francisco J. Doblas-Reyes
European Centre for Medium-Range
Weather Forecasts (ECMWF)
Shinfield Park, RG2 9AX
Reading, UK
Tel: +44 (0)118 9499 655
Fax: +44 (0)118 9869 450
f.doblas-reyes at ecmwf.int
_______________________________________
Received on Wed Oct 18 2006 - 09:47:36 BST
