
[CF-metadata] CF and multi-forecast system ensemble data

From: Steve Hankin <Steven.C.Hankin>
Date: Wed, 01 Nov 2006 09:26:31 -0800

Bryan Lawrence wrote:
> On Tue, 2006-10-31 at 09:32 -0800, Steve Hankin wrote:
>
>
>> Like others I've watched this email thread grow like Topsy, wondering
>> if I would find the time to read it ... But the discussion topic seems
>> big enough to warrant a fair amount of rough and tumble.
>>
>
> Absolutely. Bring it on :-) :-)
>
> With regard to the stability of CF and long term evolution, that's
> what's behind my wanting to separate maintaining descriptions of the
> quantities measured/predicted from the descriptions of how/why it was
> done. It's also why I want just one way of doing it, not one that is
> specially optimized for numerical models per se, and not of wider
> applicability.
>
>
>
>> OK. Enuf preamble. In a nutshell, the new proposed structures seem
>> to capture the semantics of the various collections of model outputs
>> -- ensembles and forecast collections -- through the addition of new
>> dimensions. In the most extreme case this dimension list might become
>> (realization,forecast_reference_time,forecast_period,lev,lat,lon).
>>
>> I'd pose two questions:
>> 1. Will this approach break existing CF applications? If yes, is
>> that a red flag to consider other options?
>>
>
> I don't believe there is any restriction on number of dimensions in CF,
> so while it may not be pretty, it seems ok to me.
>
>> 2. Is this the same approach that we would take if we already had
>> netCDF 4? If no, is that a red flag that we should give more
>> thought to the long-term stability of the standard?
>>
>
> Let's not think about netCDF4 immediately :-) One of the other things we
> discussed was trying to divorce the content standard from the
> implementation standard ...
>
>
Bryan,

Let me start by saying with full sincerity that I am not arguing for any
particular conclusion. I see major trade-offs to both options -- adding
new dimensions and juggling external metadata. But ...

With very little discussion (just above -- granting there are elements
added below) you have waved away the type of questions that I'd argue we
need to ask ourselves continuously. Standards are _always_ about making
compromises. It is right to begin with the approach you are insisting
on -- divorce the abstract understanding of the problem from the messy
technology. But the next step has to be to ask what the undesirable
impacts of your choices might be. And then to make changes to your
thinking and accept compromises. Without stability CF is not a standard
at all.

Regarding no. 1 -- the choice to utilize a 5- or 6-D encoding -- the
consequence will be that the great majority of existing applications will
be unable to read the new files at all without significant
modifications. Inside those new files will be 4D subsets that
represent the current "CF nut" -- objects that the applications can
currently read. So we have a large net loss of interoperability until
additional investments are made in software across our community.
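
To make the interoperability point concrete, here is a minimal
pure-Python sketch of the proposed 6-D encoding; the dimension sizes are
hypothetical, chosen only for illustration:

```python
# Hypothetical sizes for the proposed 6-D encoding
dims = {
    "realization": 5,
    "forecast_reference_time": 4,
    "forecast_period": 10,
    "lev": 17,
    "lat": 73,
    "lon": 144,
}
shape = tuple(dims.values())  # shape of the full 6-D variable

# Fixing one realization and one forecast_reference_time leaves the 4-D
# (forecast_period, lev, lat, lon) object -- the "CF nut" -- that
# today's applications can already read:
cf_nut_shape = shape[2:]
print(cf_nut_shape)  # (10, 17, 73, 144)
```

An application that only understands 4-D objects would have to learn to
locate and extract these subsets before it could read the new files at all.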

Regarding no. 2 -- the CF community is no more than two years from the
point where discussions of netCDF 4 will occupy a significant part of our
time and energy (an off-the-cuff estimate). So by proposing concrete
changes today without fully considering how you would handle them in
netCDF 4, you are inviting two _major_ sets of changes in as many years.
That's a very low standard of stability.

So what are the compromises to be looked at? How would your data
modeling of ensembles and forecasts look different if you think in terms
of dimensions (an array of identical objects) versus "groups" (unordered
lists which may have heterogeneous contents)? The former is a more
perfect fit to the concepts -- the latter is a more general structure
that fully embraces the needed concepts.
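
The contrast between the two data models can be sketched in plain Python
(shapes and member names are hypothetical, not any netCDF API):

```python
# Dimension view: an ordered array of identical members. Every slot must
# hold an object with the same shape and coordinates.
members_as_dimension = [("t", "z17", "y", "x")] * 5  # 5 identical 4-D members

# Group view: an unordered, possibly heterogeneous collection. Members
# may differ in resolution, or even in which variables they carry.
members_as_groups = {
    "member_01": {"temperature": ("t", "z17", "y", "x")},
    "member_02": {"temperature": ("t", "z31", "y", "x"),   # finer vertical grid
                  "humidity":    ("t", "z31", "y", "x")},  # extra variable
}
```

The dimension view buys ordered, uniform access at the cost of forcing
homogeneity; the group view tolerates ragged collections but gives
applications no guaranteed structure to index into.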

The question is how great are the negative impacts in stability and
interoperability that you are willing to accept in order to work with
the more perfect data model? This is a balancing question where the
"abstract data modeler" view represents one of the polarized positions.
The entire substance of your arguments so far seems to come from this
viewpoint. We acknowledge that CF discussions have not found a balance
between provider and user viewpoints in the past. How do we improve
upon that?

    - Steve

> But your substantive point is fair enough, is there a cleaner way to
> think about this ...?
>
> Particularly from the point of view that a set of simulations comprising
> a forecast IS THE forecast (singular), then I think at least
> (realization,t,z,y,x) is inescapable. I don't like the other example
> (realization,ref_time,period,z,y,x), but accept that it may well be
> the natural output of a set of aggregations. Is it not what Thredds
> would deliver anyway from an aggregation? Which brings me to my last
> point:
>
> As far as Thredds as a stop-gap solution goes. I think we need to
> divorce how we interact with the data from how we manage it
> (particularly for posterity). There is no way that I'm going to rely on
> ANY *interface* to preserve information content, so what you're really
> saying is that you want to rely on metadata held externally from the
> files (whether Thredds or not). To some extent that's inescapable (which
> was one of my points in an earlier email), but we want to stick to the
> requirement that CF can differentiate data (if not fully describe all
> the ancillary information) ... and for forecast ensembles (and station
> data) there is a need for some extra information over and above the
> index value in a "special" dimension. It's that information we need to
> get into the CF content standard.
>
> Also, from an operational perspective, ok, so maybe I can use Thredds or
> any interface to get some data, but then I've got it. What then? The
> content standard has to tell me what's in it, so we're back to CF and
> possibly pointers to external metadata.
>
> (Wrt netcdf4: I see groups as being more useful for aggregating things
> that don't share common dimensionality)
>
> Cheers
> Bryan
>
>

-- 
Steve Hankin, NOAA/PMEL -- Steven.C.Hankin at noaa.gov
7600 Sand Point Way NE, Seattle, WA 98115-0070
ph. (206) 526-6080, FAX (206) 526-6744
Received on Wed Nov 01 2006 - 10:26:31 GMT
