
[CF-metadata] some concerns about the "ensemble axis" proposal

From: Steve Hankin <Steven.C.Hankin>
Date: Wed, 07 Mar 2007 09:31:02 -0800

Bryan Lawrence wrote:
> Hi Folks, especially Balaji and Steve
>
> I'll make some general comments, and then take Balaji's questions.
>
> Firstly, Thredds has nothing to do with this issue, and that's my point
> from the November email, and which I was restating in reply to Steve's
> point. If we have to appeal to *any* external *software* package to
> define our metadata, then our convention is broken. (However, I have no
> problem with appealing to external *definitions* of internal
> identifiers.)
>
Hi Bryan et al.,

    [Please accept my bowing and scraping in advance here both for being
    long and wordy and for any appearance of being preachy. Assertions
    like "THREDDS has nothing to do with this issue" and "Aggregation
    servers are a red herring" illustrate such a fundamental divergence
    in perceptions that I feel compelled to go back to the history of CF
    and our fundamentals.]

It is wise that we have chosen to schedule a full day on CF for our
GO-ESSP meeting this summer, because we have some fundamental (and
probably difficult) issues to reach agreement on. For a very large
fraction of the CF community "files" were supplanted by aggregations as
the foundation of netCDF data management a very long time ago. In
Ferret this happened in 1995. GrADS I think was a little earlier. CDAT
and NCL probably around this time, too. I'm sure there have been
plenty of other application-specific aggregation solutions developed,
too ... also probably a long time ago.

The fact that so many applications separately developed solutions is a
clear indication that aggregation is a fundamental need. I have to
assume, Bryan, that when you referred to aggregation servers as a red
herring your emphasis was on the "server" part (which I will get to),
because the need for aggregation with CF data seems inescapable.
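
To make "aggregation" concrete, here is a minimal Python sketch of the idea: several files are presented as one logical dataset along the time axis, and a global time index is mapped back to a file and a local index. It is an illustration only and is not taken from Ferret, GrADS, CDAT, NCL, or NcML; the class name `TimeAggregation`, the file names, and the step counts are all hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


class TimeAggregation:
    """Present several files as one logical dataset along the time axis."""

    def __init__(self, files):
        # files: list of (filename, number_of_time_steps), already in time order
        self.names = [name for name, _ in files]
        # offsets[i] is the global index of the first time step held in file i;
        # the final entry is the total number of time steps.
        self.offsets = [0] + list(accumulate(n for _, n in files))

    def __len__(self):
        return self.offsets[-1]

    def locate(self, t):
        """Map a global time index to (filename, local index in that file)."""
        if not 0 <= t < len(self):
            raise IndexError(t)
        i = bisect_right(self.offsets, t) - 1
        return self.names[i], t - self.offsets[i]


# Hypothetical monthly files, 12 time steps each.
agg = TimeAggregation([("tas_1990.nc", 12), ("tas_1991.nc", 12), ("tas_1992.nc", 12)])
print(len(agg))        # 36
print(agg.locate(13))  # ('tas_1991.nc', 1)
```

The caller works only with the aggregate; which file holds a given time step is a detail hidden behind `locate`.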

The fact that so many applications separately developed solutions is
also a clear indication of inefficiency. Group after group developing
the same capability .... That's why it made so much sense when Unidata
released a fully integrated aggregation solution in 2001 ... and many
enhancements to it since then (the entire NcML framework). For those
lucky enough to be programming in Java, this was a transparent solution
for aggregating local files. For those using C or C-dependent code the
aggregations were available only through OPeNDAP connections. Here we
see a hole and the beginning of a parting of the ways, since some parts
of the community found OPeNDAP to be excellent. Other parts found it
unacceptable. Arguably it is that parting of the ways that lies at the
heart of our differing outlooks on ensembles today. I would note,
however, that at the last GO-ESSP meeting we made a group commitment to
the development of a C library that will capture the richness of what
Java access to netCDF offers -- hopefully including the entire NcML
framework for aggregation.

These dates -- 1995, 2001 -- are ancient history. Since then "service
oriented" concepts have come to dominate our discussions. (E.g.
server-side transformation capabilities on CF files are now bread and
butter discussion topics.) THREDDS is one example of a service-oriented
approach. Just as aggregation allowed us to replace files with a higher
order abstraction -- the dataset -- the service oriented abstractions
allow us to handle collections of datasets as single entities. Some may
like THREDDS. Others may not. Discussion of the trade-offs ought to be
happening. But it is an extreme rhetorical stand to say that it has
"nothing to do with the problem".

The power of the service-oriented approach _has everything to do_ with
the options available for handling ensembles. If we accept that
THREDDS-like metadata catalogs are within the scope of CF discussions,
then we open a huge and fertile domain for solving problems in ways that
are 100% harmonious with the current CF usage. If we reject the
discussion of catalogs, then we are forced to pile greater and greater
complexity into the body of our CF files, muddying fundamental CF
concepts and ultimately providing only a partial solution to the
ensemble problem.

Conclusion: I reject the assertion that discussion of catalog level
metadata is out of bounds. One could as well argue that the ensemble
problem itself is out of bounds. When we confront fundamental new
problems in CF, we may need to introduce fundamental new tools. _The
catalog is the natural place in the CF data model for the concept of an
ensemble to exist_. Currently there has been no proposal placed on the
table to handle multi-grid ensembles. (Clearly THREDDS offers some
obvious directions.) We're treating the multi-grid problem like the
elephant in the living room. Let's address it and see where that leads
our thinking.

    - Steve
> Secondly, netcdf4 is also a red herring, because folk have to use
> netcdf3 now, and will have to do so for a while to come. (Further, I
> can't seriously believe that on the one hand we have an argument that
> adding another axis is an engineering problem, but using a different API
> to the persistence format is not ... both ways, software will need
> adjustment, but in the former case we are working on top of a known and
> reliable persistence format. I can tell you for a fact that we won't be
> accepting netcdf4 data in 2007 for the BADC ... not because I don't
> like it, but because it has not yet got a track record!).
>
> So where does that leave us?
> * It leaves us with certain classes of ensemble data, that we have
> available a priori (i.e. at file writing time), and that can be stored
> in files in a certain way, and these are the ones that we are proposing
> a solution for. These work fine with the proposed solution.
> * there are also classes of ensemble data that we might want to
> aggregate a posteriori (i.e. we don't have them at file write time, or that
> cannot be stored into an array which has the same underlying coordinate
> system). Well frankly, how is that different from *any* other existing
> situation? (The Unified Model 4.5 had P and UV on different grids, but
> we can still put them in the same file, I could even put them in the
> same file with an ensemble axis for each). I can always find an example
> where I want to add more data to a file later which isn't in the time
> dimension (so I rewrite the file). Ensembles are simply not special in
> this regard!
>
> (Aggregation servers are a red herring, in the final analysis, what I
> get from aggregation servers are files, so let's care about the
> persistence format, not the interface definition).
>
>
>> Is the ensemble axis static (i.e. not UNLIMITED)? What happens if I want
>> to increase the size of an ensemble later? (We recently added 2 members
>> to a 3-member initial-condition ensemble we've submitted to IPCC AR4).
>>
>
> So rewrite the data, but nobody is saying you *have* to have ensembles
> all in one file, just as you don't rely on having all the variables in
> one file. In the latter case, for sure one needs external information to
> make the links, but let's not appeal to any *specific* software to do
> it.
>
>
>> For the kinds of ensembles we have in mind, can we stay within the file
>> size limits?
>>
>
> No one is arguing that 1 file = 1 dataset.
>
>
>> I certainly wasn't meaning to suggest, or even imply, any software
>> choices or aggregation methods to go along with this. This is a comment
>> about metadata only.
>>
>
> Fair enough, I agree with your perspective.
>
> Cheers
> Bryan
>
>

-- 
Steve Hankin, NOAA/PMEL -- Steven.C.Hankin at noaa.gov
7600 Sand Point Way NE, Seattle, WA 98115-0070
ph. (206) 526-6080, FAX (206) 526-6744
Received on Wed Mar 07 2007 - 10:31:02 GMT
