[CF-metadata] CF and multi-forecast system ensemble data from Karl Taylor on 2006-10-18 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Karl Taylor <taylor13>
Date: Tue, 17 Oct 2006 16:55:02 -0700

Hi Francisco,

Steve is quite correct that various servers can supply data stored in
various formats via OPeNDAP. It is also true that several plotting
packages can obtain such data and produce plots and enable some types of
analysis.

The value of storing data in CF-compliant files *and* adhering to CF
"best practices", which encourage inclusion of certain types of useful
metadata (e.g., cell bounds or areas, and various other useful
"attributes"), is as follows:

When analysts want to perform certain types of precise calculations and
try out many different variants of some complex (or not so complex)
processing procedure (e.g., various types of EOF analysis, different
"masking" options), they usually want to obtain the data and store it
locally with all its metadata intact. As long as all the model output
is written according to strict standards, a simple transfer of files all
written in similar structure and format makes this easy. The IPCC
experience taught us this, and analysts have generally reported that it
has been easy for them to analyze results across models.
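
As one illustration of why cell bounds or areas matter for precise
calculations, here is a minimal Python sketch of an area-weighted mean.
It assumes a spherical grid of latitude bands, where the relative area
of a band is the difference of the sines of its bounds. The function
names and the data values are invented for the example.

```python
import math

def band_weights(lat_bounds_deg):
    """Relative areas of latitude bands on a sphere, from their bounds."""
    return [
        math.sin(math.radians(hi)) - math.sin(math.radians(lo))
        for lo, hi in lat_bounds_deg
    ]

def weighted_mean(values, weights):
    """Mean of values, each weighted by the matching entry in weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Invented example: three bands of unequal area, one value per band.
lat_bounds = [(-90.0, 0.0), (0.0, 30.0), (30.0, 90.0)]
values = [0.0, 10.0, 20.0]

area_mean = weighted_mean(values, band_weights(lat_bounds))
naive_mean = sum(values) / len(values)
```

Here the area-weighted mean comes out near 7.5 while the unweighted one
is 10, and without the bounds in the file an analyst would have to guess
the weights.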

An alternative is to store data in "native" formats, but write a
"translator" which can then be invoked before delivering the data to the
user. The ECMWF has adopted this method: they translate their ERA40
(reanalysis) data when it is requested, delivering CF-compliant netCDF
files to users in place of the native GRIB files.

One set of "best practices" is specified in the following document:
http://www-pcmdi.llnl.gov/ipcc/IPCC_output_requirements.htm
A code (CMOR) is available that makes it easier to meet these
requirements and also provides error messages if one attempts to write
obviously incorrect or inadequate metadata. CMOR is available through
http://www-pcmdi.llnl.gov/software-portal/cmor
All but a couple of the modeling groups that provided data to the IPCC
archive rewrote their data through CMOR (and the others were able to
mimic CMOR's output), so someone at each of these modeling groups should
already be familiar with this way of doing things.
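
To give a flavour of the kind of check meant here, the following is a
minimal Python sketch. It is not CMOR's interface; the function
`check_variable`, the list of required attributes, and the messages are
all assumptions made up for illustration.

```python
# Hypothetical checker in the spirit of the metadata checks described
# above. This is NOT CMOR's API; names and rules are illustrative only.
REQUIRED_ATTRS = ("standard_name", "units")

def check_variable(name, attrs, has_bounds):
    """Return a list of human-readable problems for one variable."""
    problems = []
    for key in REQUIRED_ATTRS:
        # A missing key and an empty string are both treated as absent.
        if not attrs.get(key):
            problems.append(f"{name}: missing required attribute '{key}'")
    if not has_bounds:
        problems.append(f"{name}: no cell bounds supplied")
    return problems

good = check_variable(
    "tas", {"standard_name": "air_temperature", "units": "K"}, True
)
bad = check_variable("pr", {"units": ""}, False)
```

In this sketch `good` is an empty list, while `bad` collects one message
per problem so that all of them can be reported before anything is
written.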

Having said all of this, I (like Steve) am not necessarily recommending
this way of doing things. The best approach for your project will
depend to a certain extent on both who the contributing groups are and
who the analysts are. There may, of course, be other practical or
political considerations you'll need to contend with.

I'd be happy to discuss this further with you, if you think my input
will be useful.

Best regards,
Karl

Steve Hankin wrote:
> Hi Francisco,
>
> This is not a recommendation, but just an "aside" note -- something to
> consider. Namely, various groups in the community (e.g. the NOMADS
> project at NOAA/NCDC) have quite a bit of experience with serving GRIB
> files directly into OPeNDAP without the need to reformat them. This can
> be done by either the "GDS" server (GrADS-based) or the "TDS" server
> (part of the THREDDS project at Unidata). The resulting Web access to
> the data, while it does not look identical to a CF dataset, is close
> enough so that many software applications can work with it. The OPeNDAP
> vision of Web services is that it should not be necessary to reformat
> (duplicate) your data.
>
> fyi - Steve
>
> ==============================
>
> Francisco Doblas-Reyes wrote:
>> Hi,
>>
>> The EU-funded ENSEMBLES project is generating a large set of
>> seasonal-to-decadal (s2d) multi-forecast system ensemble hindcasts.
>> Multi-forecast systems include multi-models and perturbed-parameter
>> ensembles. The ENSEMBLES s2d hindcasts mimic the European multi-model
>> seasonal ensemble operational forecasts. The data are written in GRIB,
>> but we intend to improve their dissemination by making the data also
>> available in NetCDF format.
>>
>> We found that the current standard names do not allow us to describe the
>> structure of multi-forecast system ensemble forecasts using the CF
>> convention. Therefore, we would like to propose some additional CF
>> standard names to avoid ambiguities when coding multi-forecast system
>> ensemble data:
>>
>> 1) experiment_identifier (STRING). The producing centre is responsible
>> for assigning unique experiment identifiers for the different
>> experiments created, and should (ideally) provide documentation of
>> each experiment. It is possible for common experiment identifiers to
>> be agreed between different centres, if they are carrying out a common
>> experiment or creating a multi-model forecast system. There is no a
>> priori guarantee that identical identifiers from different centres
>> refer to scientifically equivalent experiments.
>> 2) originating_centre (STRING). Institution with scientific
>> responsibility for the forecast system.
>> 3) forecast_system_version_number (INTEGER, units=1). This number
>> should be used to distinguish between different prediction systems
>> used by the same institution. For instance, the Met Office will have
>> to choose a system number for the GloSea model and a different one for
>> the DePreSys system (both based on the HadCM3 coupled model). It is
>> assigned by the producing centre and gives scientific details of the
>> models used. A table online should provide the corresponding information.
>> 4) forecast_method_number (INTEGER, units=1). This variable
>> distinguishes forecasts made with the same underlying forecasting
>> system, but where variations have been introduced such that the
>> different integrations have different properties, most importantly
>> different climate drift. An example is given by the several members of
>> a perturbed parameter ensemble forecast, which should share the
>> "forecast_system_version_number" but have different values of the
>> "forecast_method_number". As for "forecast_system_version_number", a
>> table online should provide the corresponding information.
>> 5) ensemble_member_number (INTEGER, units=1). Distinguishes different
>> integrations made with the same origin, experiment identifier, method
>> and system number, created using initial-condition perturbations,
>> which form a homogeneous and a priori statistically indistinguishable
>> ensemble.
>>
>> A single multi-forecast system experiment includes data from multiple
>> forecast systems, either from a single centre or from several. The
>> variables 1-4 make a natural tuple to define a particular homogeneous
>> multi-forecast system ensemble forecast. The ensemble is then spanned
>> by the ensemble_member_number variable. For instance, a multi-model
>> ensemble forecast or a perturbed-parameter ensemble is made of a
>> collection of such tuples.
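
The tuple idea can be sketched in Python as follows. This assumes each
forecast is described by a plain dictionary keyed by the proposed names;
the `SystemKey` type, the `group_members` helper, and all values are
hypothetical and are not part of the proposal.

```python
from collections import defaultdict
from typing import NamedTuple

class SystemKey(NamedTuple):
    """Variables 1-4 of the proposal, used together as one key."""
    experiment_identifier: str
    originating_centre: str
    forecast_system_version_number: int
    forecast_method_number: int

def group_members(records):
    """Map each key to the sorted ensemble member numbers it spans."""
    groups = defaultdict(list)
    for rec in records:
        key = SystemKey(
            rec["experiment_identifier"],
            rec["originating_centre"],
            rec["forecast_system_version_number"],
            rec["forecast_method_number"],
        )
        groups[key].append(rec["ensemble_member_number"])
    return {key: sorted(members) for key, members in groups.items()}

def make_record(centre, system, method, member):
    """Build one invented forecast record for the example."""
    return {
        "experiment_identifier": "exp1",
        "originating_centre": centre,
        "forecast_system_version_number": system,
        "forecast_method_number": method,
        "ensemble_member_number": member,
    }

records = [
    make_record("centre_a", 1, 0, 1),
    make_record("centre_a", 1, 0, 0),
    make_record("centre_b", 2, 0, 0),
]
groups = group_members(records)
```

Each key in `groups` stands for one homogeneous ensemble, and the list
stored under it holds the member numbers that span that ensemble.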
>>
>> Although not actually needed for distribution and archive purposes,
>> additional variables with the same dimension as the variable
>> experiment_identifier are also suggested:
>>
>> 1) original_distributor (STRING). Centre with responsibility for
>> distribution of data, i.e. the centre that first made the data publicly
>> available, and to which queries about data integrity should be sent.
>> 2) production_status (STRING). Operational, research or a user-defined
>> project identifier. The value "research" should be used for general
>> research at a specific centre, while project_id should be used for
>> specified international research projects.
>> 3) sst_specification (STRING). It describes the use of the SSTs in the
>> specific experiment and can take values such as "coupled", "observed",
>> "predicted", "persisted anomaly" or "persisted absolute".
>> 4) real_time (CHARACTER). It takes the values "true" or "false",
>> according to whether or not the forecast was made in real time. It is
>> an attribute of the individual forecasts.
>> 5) archive_date (INTEGER, units=days from specific date). Describes
>> when the data was archived or published. The aim is to provide an
>> approximate timestamp, to easily distinguish between recent
>> experiments and much older ones. Also, in the case that data need to
>> be corrected in a globally distributed data system, the archive_date
>> could be used to distinguish between the older, original data and the
>> newer, corrected data. This is an attribute of the individual forecast.
>>
>> Some relevant issues for the encoding of multi-forecast system
>> ensemble data are as follows:
>>
>> - We use the variables "forecast_period" and "forecast_reference_time"
>> as independent time variables employed to define the two time axes of
>> a forecast dataset with several start dates, i.e., both
>> "forecast_period" and "forecast_reference_time" are multivalued. We
>> believe that "forecast_period" cannot have time units referenced to a
>> specific date as "forecast_reference_time" does. This is to avoid
>> confusing forecasts in the file that share the same verifying date but
>> were produced from different start dates (and are, hence,
>> intrinsically different). An alternative would be to introduce an
>> index dimension and make
>> two one-dimensional auxiliary time coordinate variables with this
>> dimension, as suggested by Jonathan Gregory in the thread "file with
>> both run time and forecast (valid) time coordinates". Any thoughts
>> about this?
>>
>> - It has to be mentioned that although "realization" is an existing
>> standard name to handle ensembles, it can be used to identify either a
>> forecast_system_version_number (a member of a multi-model ensemble
>> from the same institution), a forecast_method_number (a member of a
>> perturbed-parameter ensemble) or an ensemble_member_number (a member
>> of an initial-condition ensemble). This is problematic as a
>> multi-forecast system ensemble dataset might have to use those three
>> variables dimensioned independently. Therefore, the use of these three
>> variables is suggested instead to distinguish the elements of an
>> experiment carried out with a multi-forecast system.
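
The point about the two time axes can be illustrated with a short Python
sketch. It assumes "forecast_reference_time" holds absolute dates and
"forecast_period" holds plain durations; the dates, the lead times, and
the `verifying_times` helper are invented for the example.

```python
from datetime import datetime, timedelta

# Reference times are absolute dates; forecast periods are durations
# with no reference date of their own.
forecast_reference_time = [datetime(1991, 5, 1), datetime(1991, 6, 1)]
forecast_period = [timedelta(days=30), timedelta(days=61)]

def verifying_times(reference_times, periods):
    """2-D table of verifying dates indexed by [start date][lead time]."""
    return [[ref + lead for lead in periods] for ref in reference_times]

valid = verifying_times(forecast_reference_time, forecast_period)

# The 61-day forecast from 1 May and the 30-day forecast from 1 June
# verify on the same date, yet occupy different cells of the table.
same_date = valid[0][1] == valid[1][0]
```

Indexing by both axes keeps these two forecasts apart even though their
verifying date coincides, which a single time axis of verifying dates
could not do.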
>>
>> The proposed names take account of established practice at operational
>> centres and usual practice in the research community of climate
>> variability at different time scales. These names are part of a more
>> general proposal to unambiguously define the appropriate metadata for
>> multi-forecast system ensembles, which is based upon a more general
>> proposal under discussion by WCRP. The proposal is available from:
>> http://www.ecmwf.int/research/EU_projects/ENSEMBLES/data/index.html
>>
>> The names and data structure suggested in this message are likely to
>> be relevant for other operational multi-forecast system ensemble
>> forecast activities such as EUROSIP or TIGGE.
>>
>> Apologies for the long message.
>> Best regards,
>> Paco
Received on Tue Oct 17 2006 - 17:55:02 BST
