
[CF-metadata] CF and multi-forecast system ensemble data

From: Steve Hankin <Steven.C.Hankin>
Date: Tue, 31 Oct 2006 09:32:56 -0800

All,

Like others I've watched this email thread grow like Topsy, wondering if
I would find the time to read it. At last I did this morning. So here
is yet another message that has to begin with an apology that this
discussion is a lot to get one's arms around, and with bowing and
scraping for the mistakes of interpretation that I will undoubtedly
make. I'll doubly apologize for wading in as a late-arriving
trouble-maker. But the discussion topic seems big enough to warrant a
fair amount of rough and tumble.

I'd like to ask that we step back for a moment to talk about the long
term evolution and stability of CF. There is pretty wide agreement that
in the past, when thinking of how models might spit out their outputs,
we have often not given adequate weight to the needs of the software that
reads the data. We want to guard the simplicity and stability of the CF
standard to the degree that we can. The quotation that Russ Rew brought
into our discussions at June's GO-ESSP meeting is a valuable reference
point:

    To create quality software [or standards], the ability to say "no"
    is usually far more important than the ability to say "yes."

OK. Enuf preamble. In a nutshell, the newly proposed structures seem to
capture the semantics of the various collections of model outputs --
ensembles and forecast collections -- through the addition of new
dimensions. In the most extreme case this dimension list might become
(realization,forecast_reference_time,forecast_period,lev,lat,lon).

I'd pose two questions:

   1. Will this approach break existing CF applications? If yes, is
      that a red flag to consider other options?
   2. Is this the same approach that we would take if we already had
      netCDF 4? If no, is that a red flag that we should give more
      thought to the long-term stability of the standard?

Thinking in terms of netCDF 4: would we solve this question by adding
new dimensions to the basic CF 4-dimensional outlook? Or would we
instead use the new concept of "groups"? I suspect that we would use
groups. Doing so would preserve intact the basic 4D CF structure (the
"CF nut") that encapsulates the output of a single geophysical model
run. It would define a smoother incremental growth path for the
software that reads the data. And it would provide a bigger space than
just additional dimensions in which to capture the growing complexity of
the models -- time dependent coordinates, unstructured grids, etc.
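To make the contrast concrete, here is a minimal sketch in plain Python (no netCDF library involved) that models the two layouts as nested dictionaries. The variable name `tas`, the group names, the sizes and the `legacy_reader_accepts` check are all hypothetical; the check is only a crude stand-in for an application written against the 4D CF structure, not a description of any real tool.

```python
from itertools import product
from math import prod

# Hypothetical sizes, chosen only for illustration.
N_REALIZATION = 3
N_REFERENCE_TIME = 2
CF_NUT_DIMS = ("time", "lev", "lat", "lon")
CF_NUT_SHAPE = (4, 5, 10, 20)


def flat_layout():
    """Option A: one variable carrying the ensemble axes as extra dimensions (6D)."""
    dims = ("realization", "forecast_reference_time") + CF_NUT_DIMS
    shape = (N_REALIZATION, N_REFERENCE_TIME) + CF_NUT_SHAPE
    return {"tas": {"dims": dims, "shape": shape}}


def grouped_layout():
    """Option B: one group per (member, reference time), each an intact 4D 'CF nut'."""
    groups = {}
    for r, t in product(range(N_REALIZATION), range(N_REFERENCE_TIME)):
        groups[f"member{r}/ref{t}"] = {"tas": {"dims": CF_NUT_DIMS, "shape": CF_NUT_SHAPE}}
    return groups


def legacy_reader_accepts(var):
    """Crude stand-in for existing software that only knows (time, lev, lat, lon)."""
    return set(var["dims"]) <= set(CF_NUT_DIMS)


def total_values(shape):
    """Number of data values implied by a shape tuple."""
    return prod(shape)


if __name__ == "__main__":
    print(legacy_reader_accepts(flat_layout()["tas"]))  # False: unknown extra dimensions
    print(all(legacy_reader_accepts(g["tas"]) for g in grouped_layout().values()))  # True
```

Both layouts hold the same number of values; the difference is that in the grouped form every piece a legacy reader sees is still an ordinary 4D dataset.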

How could we achieve the equivalent effect within the more limited tool
set of netCDF 3? I think John Caron's "Forecast aggregation server"
illustrates a solution. Namely, use the THREDDS catalog as the place to
embed the new, higher-level structures -- membership in a forecast
aggregation and/or an ensemble. This approach preserves the "CF nut" --
the basic 4D structure that describes a dataset. It preserves backwards
compatibility for existing applications. It gives full access to the
ensemble and forecast dimensions for newly written or modified
software. And it segues smoothly into next-generation netCDF 4 structures.

Admittedly, this approach represents a compromise -- it does not provide
a way to capture all of the complexity in a single, local file in the
short term. But this objection, if a major barrier, could probably be
largely overcome through the creation of some fairly modest code
libraries (building on TDS) that expose THREDDS-like navigation
capabilities over collections of local files.
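As a rough illustration of the kind of modest library meant here, the sketch below indexes a listing of local files by model, ensemble member and forecast reference time, so that each file stays an ordinary 4D CF dataset while the collection can still be navigated along the ensemble and forecast axes. It is plain Python and does not use TDS or any THREDDS API; the filename convention and all function names are hypothetical assumptions.

```python
import re
from datetime import datetime

# Hypothetical naming convention, assumed purely for illustration:
#   <model>_r<member>_<YYYYMMDDHH>.nc   e.g. "ecmwf_r03_2006103100.nc"
FILENAME_PATTERN = re.compile(r"^(?P<model>\w+?)_r(?P<member>\d+)_(?P<ref>\d{10})\.nc$")


def build_index(filenames):
    """Map (model, member, reference_time) to the file holding that single 4D run."""
    index = {}
    for name in filenames:
        match = FILENAME_PATTERN.match(name)
        if match is None:
            continue  # ignore files that do not follow the convention
        ref = datetime.strptime(match["ref"], "%Y%m%d%H")
        index[(match["model"], int(match["member"]), ref)] = name
    return index


def reference_times(index):
    """List the distinct forecast reference times present in the collection."""
    return sorted({ref for (_model, _member, ref) in index})


def ensemble_for(index, reference_time):
    """Return every file (all models, all members) for one forecast reference time."""
    return sorted(name for (_model, _member, ref), name in index.items()
                  if ref == reference_time)
```

Only the index knows about ensemble membership; the files themselves remain readable by existing CF tools, which mirrors the division of labour described above between the catalog and the "CF nut".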

    - Steve

===============================================================================

Francisco Doblas-Reyes wrote:
> Dear all,
>
> I have to admit that I was a bit confused by the huge number of
> messages received in the last few days. After reading them
> carefully, here are some comments, organized as well as I
> could manage:
>
> - I agree that the whole issue of dealing with ensembles,
> multi-models, multi-forecast systems, operational systems, etc. is far
> from obvious. I guess that is why the job has not been done before.
>
> - Please bear in mind that if I used the ENSEMBLES project as an
> example, or Jamie used the IPCC, it was only as an illustration. From
> the beginning, my intention has been to propose a solution that makes
> NetCDF more useful and attractive to **operational forecasting
> centres**. The word operational is key here. It is no accident that
> institutions such as NCEP, the UK Met Office, JMA or ECMWF do not
> disseminate their daily operational ensemble products in NetCDF format.
>
> - My original intention when I started this discussion was to find a
> clear way of writing files with multi-forecast system ensemble
> integrations. In response to John's message, this means being able to
> write all the members of an ensemble (coming from either several
> forecast systems or a single one) in a single file and being able
> to identify them. Of course, the type of file may vary with the
> user: in a multi-model context, the contributors might prefer
> to prepare files with the ensemble forecasts produced by their own
> individual models if they run them at separate institutions, while in
> a perturbed-physics context a single institution will prefer to write
> the forecasts from the different model versions in a single file.
>
> - Bryan wonders whether it is appropriate to define specific variable
> metadata in multi-forecast system files to distinguish ensemble
> member, model, system and so on, and compares this problem with the
> value of having additional informative metadata in a file
> containing station data. In my opinion, the essential difference is
> that a file with multi-forecast system ensemble simulations is not a
> simple collection of predictions from different sources or with
> different initial conditions, but a complete forecast in itself. The
> additional metadata is not a luxury but a necessity for describing that
> entity. Without the appropriate metadata the file is not self-describing
> and won't be operationally disseminated, in the same way that operational
> centres don't disseminate deterministic forecasts in any format
> without clearly specifying the forecast system.
>
> - I understood at the beginning that the main appeal of NetCDF is
> that one doesn't need to rely on external tables to identify the data.
> While the use of external vocabularies might offer a simple way of
> avoiding the creation of additional metadata, it relies on the very
> strategy that I presumed distinguished NetCDF from GRIB: the use of
> external tables and extensions. In my opinion, Bryan is right: the
> core of the discussion is issue D. However, from my experience with
> operational ensembles, forecasters may move from GRIB1 to GRIB2
> without paying much attention to NetCDF, due to the lack of rules
> for encoding operational forecasts in NetCDF. Maybe WMO/IPCC
> should take the lead on this.
>
> - Of course, an immediate issue, as John and Bryan have identified, is
> how to disseminate these files using a THREDDS Data Server. At
> present, the largest file I can imagine would have the dimensions
> (realization,forecast_reference_time,forecast_period,lev,lat,lon). In
> my opinion, the most likely aggregations would be along the
> realization and forecast_reference_time dimensions. In ENSEMBLES, we
> plan to work with NetCDF, although we are trying to learn from what is
> being done with GRIB2 in TIGGE.
>
> - If the option of adding standard names (instead of using
> external vocabularies) is accepted, I'd like to propose the standard
> name "ensemble_member" in lieu of the suggested "initial_condition".
> It could be used as an integer to tag the members of the ensemble
> identified with particular values of "experiment_id", "source",
> "institution", regardless of the way the initial conditions of the
> ensemble have been generated.
>
> Apologies for the long message.
> Best regards,
> Paco

-- 
--
Steve Hankin, NOAA/PMEL -- Steven.C.Hankin at noaa.gov
7600 Sand Point Way NE, Seattle, WA 98115-0070
ph. (206) 526-6080, FAX (206) 526-6744
Received on Tue Oct 31 2006 - 10:32:56 GMT

This archive was generated by hypermail 2.3.0 : Tue Sep 13 2022 - 23:02:40 BST
