Hi Francisco,
This is not a recommendation, but just an "aside" note -- something to
consider. Namely, various groups in the community (e.g. the NOMADS
project at NOAA/NCDC) have quite a bit of experience with serving GRIB
files directly into OPeNDAP without the need to reformat them. This can
be done by either the "GDS" server (GrADS-based) or the "TDS" server
(part of the THREDDS project at Unidata). The resulting Web access to
the data, while it does not look identical to a CF dataset, is close
enough so that many software applications can work with it. The OPeNDAP
vision of Web services is that it should not be necessary to reformat
(duplicate) your data.
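
As a rough illustration of what such Web access can look like from the client side, here is a minimal Python sketch that builds a DAP2-style request URL for a subset of one variable. The server address, dataset path and variable name are made up for illustration and do not refer to a real NOMADS, GDS or TDS endpoint; the helper name dap2_url is likewise hypothetical.

```python
def dap2_url(dataset_url, variable, slices, suffix="ascii"):
    """Build a DAP2 request URL for a hyperslab of one variable.

    slices holds one (start, stop) index pair per dimension; stop is
    inclusive, as in the DAP2 constraint-expression syntax.
    """
    hyperslab = "".join(f"[{start}:1:{stop}]" for start, stop in slices)
    return f"{dataset_url}.{suffix}?{variable}{hyperslab}"


# Hypothetical server, dataset and variable name, for illustration only.
url = dap2_url(
    "http://example.org/dods/hindcast",
    "tmp2m",
    [(0, 0), (10, 20), (30, 40)],
)
print(url)
```

The point of the sketch is only that the client asks for a slice by index and never needs to know that the file behind the server is GRIB.
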
fyi - Steve
==============================
Francisco Doblas-Reyes wrote:
> Hi,
>
> The EU-funded ENSEMBLES project is generating a large set of
> seasonal-to-decadal (s2d) multi-forecast system ensemble hindcasts.
> Multi-forecast systems include multi-models and perturbed-parameter
> ensembles. The ENSEMBLES s2d hindcasts mimic the European multi-model
> seasonal ensemble operational forecasts. The data are written in GRIB,
> but we intend to improve their dissemination by making the data also
> available in NetCDF format.
>
> We found that the current standard names do not allow the structure
> of multi-forecast system ensemble forecasts to be described using the
> CF convention. Therefore, we would like to propose some additional CF
> standard names to avoid ambiguities when coding multi-forecast system
> ensemble data:
>
> 1) experiment_identifier (STRING). The producing centre is responsible
> for assigning unique experiment identifiers for the different
> experiments created, and should (ideally) provide documentation of
> each experiment. It is possible for common experiment identifiers to
> be agreed between different centres, if they are carrying out a common
> experiment or creating a multi-model forecast system. There is no a
> priori guarantee that identical identifiers from different centres
> refer to scientifically equivalent experiments.
> 2) originating_centre (STRING). Institution with scientific
> responsibility for the forecast system.
> 3) forecast_system_version_number (INTEGER, units=1). This number
> should be used to distinguish between different prediction systems
> used by the same institution. For instance, the Met Office will have
> to choose a system number for the GloSea model and a different one for
> the DePreSys system (both based on the HadCM3 coupled model). It is
> assigned by the producing centre and gives scientific details of the
> models used. A table online should provide the corresponding information.
> 4) forecast_method_number (INTEGER, units=1). This variable
> distinguishes forecasts made with the same underlying forecasting
> system, but where variations have been introduced such that the
> different integrations have different properties, most importantly
> different climate drift. An example is given by the several members of
> a perturbed parameter ensemble forecast, which should share the
> "forecast_system_version_number" but have different values of the
> "forecast_method_number". As for "forecast_system_version_number", a
> table online should provide the corresponding information.
> 5) ensemble_member_number (INTEGER, units=1). This number
> distinguishes different integrations made with the same origin,
> experiment identifier, method and system number, created using
> initial-condition perturbations, which form a homogeneous and a priori
> statistically indistinguishable ensemble.
>
> A single multi-forecast system experiment includes data from multiple
> forecast systems, either from a single centre or from several. The
> variables 1-4 make a natural tuple to define a particular homogeneous
> multi-forecast system ensemble forecast. The ensemble is then spanned
> by the ensemble_member_number variable. For instance, a multi-model
> ensemble forecast or a perturbed-parameter ensemble is made of a
> collection of such tuples.
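
The grouping described in the paragraph above can be sketched in a few lines of Python. This is only an illustration of the idea that variables 1-4 form a key and ensemble_member_number spans the ensemble under each key; the class name ForecastSystemKey and all record values are made up, not part of the proposal.

```python
from collections import defaultdict
from typing import NamedTuple


class ForecastSystemKey(NamedTuple):
    """Variables 1-4 of the proposal, identifying one homogeneous ensemble."""
    experiment_identifier: str
    originating_centre: str
    forecast_system_version_number: int
    forecast_method_number: int


# Made-up records: the four key fields followed by ensemble_member_number.
records = [
    ("exp1", "centre_a", 1, 0, 0),
    ("exp1", "centre_a", 1, 0, 1),
    ("exp1", "centre_a", 1, 1, 0),
    ("exp1", "centre_b", 1, 0, 0),
]

# Collect the member numbers that belong to each key.
ensembles = defaultdict(list)
for *key_fields, member in records:
    ensembles[ForecastSystemKey(*key_fields)].append(member)

for key, members in ensembles.items():
    print(key, "members:", members)
```

Here the multi-forecast system ensemble is the whole dictionary, and each entry is one homogeneous ensemble.
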
>
> Although not actually needed for distribution and archive purposes,
> additional variables with the same dimension as the variable
> experiment_identifier are also suggested:
>
> 1) original_distributor (STRING). Centre with responsibility for
> distribution of data, i.e. the centre that first made the data
> publicly available, and to which queries about data integrity should
> be sent.
> 2) production_status (STRING). Operational, research or a user-defined
> project identifier. The value "research" should be used for general
> research at a specific centre, while project_id should be used for
> specified international research projects.
> 3) sst_specification (STRING). It describes the use of the SSTs in the
> specific experiment and can take values such as "coupled", "observed",
> "predicted", "persisted anomaly" or "persisted absolute".
> 4) real_time (CHARACTER). It takes the values "true" or "false",
> according to whether or not the forecast was made in real time. It is
> an attribute of the individual forecasts.
> 5) archive_date (INTEGER, units=days from specific date). Describes
> when the data was archived or published. The aim is to provide an
> approximate timestamp, to easily distinguish between recent
> experiments and much older ones. Also, in the case that data need to
> be corrected in a globally distributed data system, the archive_date
> could be used to distinguish between the older, original data and the
> newer, corrected data. This is an attribute of the individual forecast.
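
To make the archive_date idea concrete, here is a small Python sketch that encodes a calendar date as an integer day count and uses it to tell original data from corrected data. The reference date below is an assumption chosen for the example (the proposal does not fix one), and the function name to_archive_date is hypothetical.

```python
from datetime import date

# Assumed reference date; the proposal leaves the specific date open.
REFERENCE_DATE = date(1950, 1, 1)


def to_archive_date(archived_on, reference=REFERENCE_DATE):
    """Encode a calendar date as an integer number of days from the reference."""
    return (archived_on - reference).days


# Made-up archive dates for an original and a corrected version of the data.
original = to_archive_date(date(2006, 10, 16))
corrected = to_archive_date(date(2006, 11, 1))

# The larger value marks the newer, corrected data.
print(original, corrected, corrected > original)
```
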
>
> Some relevant issues for the encoding of multi-forecast system
> ensemble data are as follows:
>
> - We use the variables "forecast_period" and "forecast_reference_time"
> as independent time variables to define the two time axes of a
> forecast dataset with several start dates, i.e. both
> "forecast_period" and "forecast_reference_time" are multivalued. We
> believe that "forecast_period" cannot have time units referenced to a
> specific date as "forecast_reference_time" does. This is to avoid
> having forecasts in the file with the same verifying date but produced
> from different start dates (and, hence, intrinsically different). An
> alternative would be to introduce an index dimension and make two
> one-dimensional auxiliary time coordinate variables with this
> dimension, as suggested by Jonathan Gregory in the thread "file with
> both run time and forecast (valid) time coordinates". Any thoughts
> about this?
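
The concern about verifying dates can be illustrated with a short Python sketch. It combines a few start dates (forecast_reference_time) with lead times (forecast_period, kept relative to the start date) and shows that two different forecasts can share the same verifying date. All dates and lead times are made up for the example.

```python
from datetime import date, timedelta

# Made-up start dates (forecast_reference_time) and lead times
# (forecast_period, expressed relative to each start date).
forecast_reference_times = [date(2000, 5, 1), date(2000, 5, 31)]
forecast_periods = [timedelta(days=30), timedelta(days=60)]

# Verifying date for every (start date, lead time) pair.
valid = {
    (ref, period): ref + period
    for ref in forecast_reference_times
    for period in forecast_periods
}

# Group the pairs by verifying date to expose collisions.
by_valid_date = {}
for pair, when in valid.items():
    by_valid_date.setdefault(when, []).append(pair)

for when, pairs in sorted(by_valid_date.items()):
    print(when, "<-", [(ref.isoformat(), period.days) for ref, period in pairs])
```

With the two axes kept separate, every forecast has a unique (start date, lead time) pair even when the verifying dates coincide.
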
>
> - It has to be mentioned that although "realization" is an existing
> standard name to handle ensembles, it can be used to identify either a
> forecast_system_version_number (a member of a multi-model ensemble
> from the same institution), a forecast_method_number (a member of a
> perturbed-parameter ensemble) or an ensemble_member_number (a member
> of an initial-condition ensemble). This is problematic as a
> multi-forecast system ensemble dataset might have to use those three
> variables dimensioned independently. Therefore, the use of these three
> variables is suggested instead to distinguish the elements of an
> experiment carried out with a multi-forecast system.
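
A small Python sketch of why a single "realization" index is awkward here: once three independent dimensions are flattened into one index, a separate lookup table is needed to recover which system, method and member each value stands for. The dimension sizes and values are made up for the example.

```python
from itertools import product

# Made-up values for the three independent dimensions.
systems = [1, 2]           # forecast_system_version_number
methods = [0, 1, 2]        # forecast_method_number
members = [0, 1, 2, 3]     # ensemble_member_number

# Flattening into one "realization" index requires a lookup table to
# recover the (system, method, member) combination behind each value.
realization_table = dict(enumerate(product(systems, methods, members)))

print(len(realization_table))
print(realization_table[13])
```

Keeping the three variables as separate dimensions makes that table unnecessary.
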
>
> The proposed names take account of established practice at operational
> centres and usual practice in the research community of climate
> variability at different time scales. These names are part of a more
> general proposal to unambiguously define the appropriate metadata for
> multi-forecast system ensembles, which is based upon a more general
> proposal under discussion by WCRP. The proposal is available from:
> http://www.ecmwf.int/research/EU_projects/ENSEMBLES/data/index.html
>
> The names and data structure suggested in this message are likely to
> be relevant for other operational multi-forecast system ensemble
> forecast activities such as EUROSIP or TIGGE.
>
> Apologies for the long message.
> Best regards,
> Paco
Received on Mon Oct 16 2006 - 21:55:50 BST