[CF-metadata] some concerns about the "ensemble axis" proposal from Bryan Lawrence on 2007-03-07 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Bryan Lawrence <b.n.lawrence>
Date: Wed, 07 Mar 2007 17:35:20 +0000

Steve

I'm obviously not getting my main point across. Fine: build an
aggregation server. What does it serve up? A file? A sequence of files? I
think the former. So regardless of what you do server side, as a client
I'm going to get a file, and it may well have aggregated a number of
ensemble members. I want that file to be CF compliant!

So imagine Thredds serves me a temperature field for timestep 0 from ten
ensemble members, which are from a multi-model ensemble - but
fortunately they're on the same grid. How do we represent that? CF ought
to be able to do that!
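To make the question concrete, here is a minimal sketch (Python, standard library only) that writes out the CDL header such a served file might carry: one air temperature variable with an extra, fixed-size ensemble dimension, plus a character variable recording which model produced each member, since netCDF-3 has no string type. The dimension, variable and attribute names (`ensemble`, `source_model`, `strlen`, `tas`) and the time units are hypothetical choices for illustration, not part of the proposal under discussion.

```python
"""Sketch: a CDL header for a multi-model ensemble that shares one grid.

All dimension, variable and attribute names are hypothetical, chosen
only to illustrate an extra "ensemble" axis.
"""


def ensemble_cdl(models, nlat, nlon, strlen=32):
    """Return a CDL header with one fixed-size ensemble dimension.

    models     -- one model name per ensemble member
    nlat, nlon -- size of the horizontal grid shared by every member
    strlen     -- length of the char dimension holding model names
    """
    if not models:
        raise ValueError("need at least one ensemble member")
    if any(len(name) > strlen for name in models):
        raise ValueError("a model name is longer than strlen")
    lines = [
        "netcdf ensemble_example {",
        "dimensions:",
        "    time = UNLIMITED ;",
        f"    ensemble = {len(models)} ;",  # fixed size, one slot per member
        f"    lat = {nlat} ;",
        f"    lon = {nlon} ;",
        f"    strlen = {strlen} ;",  # netCDF-3 stores text as char arrays
        "variables:",
        "    double time(time) ;",
        '        time:units = "days since 2000-01-01" ;',
        "    int ensemble(ensemble) ;",
        '        ensemble:long_name = "ensemble member" ;',
        "    char source_model(ensemble, strlen) ;",
        '        source_model:long_name = "model that produced each member" ;',
        "    float lat(lat) ;",
        '        lat:units = "degrees_north" ;',
        "    float lon(lon) ;",
        '        lon:units = "degrees_east" ;',
        # the UNLIMITED (record) dimension must come first in netCDF-3
        "    float tas(time, ensemble, lat, lon) ;",
        '        tas:standard_name = "air_temperature" ;',
        '        tas:units = "K" ;',
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Ten members from different models, as in the example above.
    names = [f"model_{i}" for i in range(10)]
    print(ensemble_cdl(names, nlat=73, nlon=96))
```

Because `ensemble` is a fixed-size dimension here (netCDF-3 allows only one UNLIMITED dimension, conventionally time), adding members later means regenerating the file with a larger `ensemble` size.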

Bryan

On Wed, 2007-03-07 at 09:31 -0800, Steve Hankin wrote:
>
>
> Bryan Lawrence wrote:
> > Hi Folks, especially Balaji and Steve
> >
> > I'll make some general comments, and then take Balaji's questions.
> >
> > Firstly, Thredds has nothing to do with this issue; that was my point
> > in the November email, and I was restating it in reply to Steve's
> > point. If we have to appeal to *any* external *software* package to
> > define our metadata, then our convention is broken. (However, I have no
> > problem with appealing to external *definitions* of internal
> > identifiers.)
> >
> Hi Bryan et al.,
> [Please accept my bowing and scraping in advance here both for
> being long and wordy and for any appearance of being preachy.
> Assertions like "THREDDS has nothing to do with this issue"
> and "Aggregation servers are a red herring" illustrate such a
> fundamental divergence in perceptions that I feel compelled to
> go back to the history of CF and our fundamentals.]
> It is wise that we have chosen to schedule a full day on CF for our
> GO-ESSP meeting this summer, because we have some fundamental (and
> probably difficult) issues to reach agreement on. For a very large
> fraction of the CF community, "files" were supplanted by aggregations
> as the foundation of netCDF data management a very long time ago. In
> Ferret this happened in 1995. GrADS I think was a little earlier.
> CDAT and NCL probably around this time, too. I'm sure there have
> been plenty of other application-specific aggregation solutions
> developed, too ... also probably a long time ago.
>
> The fact that so many applications separately developed solutions is a
> clear indication that aggregation is a fundamental need. I have to
> assume, Bryan, that when you referred to aggregation servers as a red
> herring, your emphasis was on the "server" part (which I will get to),
> because the need for aggregation with CF data seems inescapable.
>
> The fact that so many applications separately developed solutions is
> also a clear indication of inefficiency. Group after group developing
> the same capability .... That's why it made so much sense when
> Unidata released a fully integrated aggregation solution in 2001 ...
> and many enhancements to it since then (the entire NcML framework).
> For those lucky enough to be programming in Java, this was a
> transparent solution for aggregating local files. For those using C
> or C-dependent code the aggregations were available only through
> OPeNDAP connections. Here we see a hole and the beginning of a
> parting of the ways, since some parts of the community found OPeNDAP
> to be excellent. Other parts found it unacceptable. Arguably it is
> that parting of the ways that lies at the heart of our differing
> outlooks on ensembles today. I would note, however, that at the last
> GO-ESSP meeting we made a group commitment to the development of a C
> library that will capture the richness of what Java access to netCDF
> offers -- hopefully including the entire NcML framework for
> aggregation.
>
> These dates -- 1995, 2001 -- are ancient history. Since then
> "service-oriented" concepts have come to dominate our discussions. (E.g.
> server-side transformation capabilities on CF files are now bread and
> butter discussion topics.) THREDDS is one example of a
> service-oriented approach. Just as aggregation allowed us to replace
> files with a higher-order abstraction -- the dataset -- service-oriented
> abstractions allow us to handle collections of datasets as
> single entities. Some may like THREDDS. Others may not. Discussion
> of the trade-offs ought to be happening. But it is an extreme
> rhetorical stand to say that it has "nothing to do with the problem".
>
> The power of the service-oriented approach has everything to do with
> the options available for handling ensembles. If we accept that
> THREDDS-like metadata catalogs are within the scope of CF discussions,
> then we open a huge and fertile domain for solving problems in ways
> that are 100% harmonious with the current CF usage. If we reject the
> discussion of catalogs, then we are forced to pile greater and greater
> complexity into the body of our CF files, muddying fundamental CF
> concepts and ultimately providing only a partial solution to the
> ensemble problem.
>
> Conclusion: I reject the assertion that discussion of catalog level
> metadata is out of bounds. One could as well argue that the ensemble
> problem itself is out of bounds. When we confront fundamental new
> problems in CF, we may need to introduce fundamental new tools. The
> catalog is the natural place in the CF data model for the concept of
> an ensemble to exist. Currently there has been no proposal placed on
> the table to handle multi-grid ensembles. (Clearly THREDDS offers
> some obvious directions.) We're treating the multi-grid problem like
> the elephant in the living room. Let's address it and see where that
> leads our thinking.
>
> - Steve
> > Secondly, netcdf4 is also a red herring, because folk have to use
> > netcdf3 now, and will have to do so for a while to come. (Further, I
> > can't seriously believe that on the one hand we have an argument that
> > adding another axis is an engineering problem, but using a different API
> > to the persistence format is not ... both ways, software will need
> > adjustment, but in the former case we are working on top of a known and
> > reliable persistence format. I can tell you for a fact that we won't be
> > accepting netcdf4 data in 2007 for the BADC ... not because I don't
> > like it, but because it has not yet got a track record!).
> >
> > So where does that leave us?
> > * It leaves us with certain classes of ensemble data, that we have
> > available a priori (i.e. at file writing time), and that can be stored
> > in files in a certain way, and these are the ones that we are proposing
> > a solution for. These work fine with the proposed solution.
> > * There are also classes of ensemble data that we might want to
> > aggregate a posteriori (i.e. we don't have them at file write time, or that
> > cannot be stored into an array which has the same underlying coordinate
> > system). Well frankly, how is that different from *any* other existing
> > situation? (The Unified Model 4.5 had P and UV on different grids, but
> > we can still put them in the same file, I could even put them in the
> > same file with an ensemble axis for each). I can always find an example
> > where I want to add more data to a file later which isn't in the time
> > dimension (so I rewrite the file). Ensembles are simply not special in
> > this regard!
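The dividing line Bryan draws, between members that share an underlying coordinate system and those that do not, can be sketched as a grouping step: members with identical grids could share one variable with an ensemble axis, while each other grid would need its own variable (and its own ensemble axis), much as P and UV were kept on separate grids. This is a minimal sketch assuming each member is described by a hypothetical `(name, latitudes, longitudes)` tuple and that two grids match only when their coordinate values are exactly equal; a real check would likely compare more coordinates and allow a tolerance.

```python
"""Sketch: group ensemble members by grid so that members on the same
grid could share a single ensemble axis.

The (name, lats, lons) member description is a hypothetical stand-in
for reading coordinate variables from real files.
"""


def group_by_grid(members):
    """Return lists of member names, one list per distinct grid.

    Grids are compared by exact equality of their latitude and
    longitude values; groups appear in the order first encountered.
    """
    groups = {}  # dicts keep insertion order (Python 3.7+)
    for name, lats, lons in members:
        key = (tuple(lats), tuple(lons))
        groups.setdefault(key, []).append(name)
    return list(groups.values())


if __name__ == "__main__":
    coarse = ([-45.0, 0.0, 45.0], [0.0, 120.0, 240.0])
    fine = ([-60.0, -30.0, 0.0, 30.0, 60.0], [0.0, 90.0, 180.0, 270.0])
    members = [
        ("run_a", *coarse),
        ("run_b", *fine),
        ("run_c", *coarse),
    ]
    for group in group_by_grid(members):
        print(group)
```

Each returned group corresponds to one array that could be written with its own ensemble dimension; the multi-grid ensembles Steve raises are the case where more than one group comes back.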
> >
> > (Aggregation servers are a red herring: in the final analysis, what I
> > get from aggregation servers are files, so let's care about the
> > persistence format, not the interface definition).
> >
> >
> > > Is the ensemble axis static (i.e. not UNLIMITED)? What happens if I want
> > > to increase the size of an ensemble later? (We recently added 2 members
> > > to a 3-member initial-condition ensemble we've submitted to IPCC AR4).
> > >
> >
> > So rewrite the data, but nobody is saying you *have* to have ensembles
> > all in one file, just as you don't rely on having all the variables in
> > one file. In the latter case, for sure one needs external information to
> > make the links, but let's not appeal to any *specific* software to do
> > it.
> >
> >
> > > For the kinds of ensembles we have in mind, can we stay within the file
> > > size limits?
> > >
> >
> > No one is arguing that 1 file = 1 dataset.
> >
> >
> > > I certainly wasn't meaning to suggest, or even imply, any software
> > > choices or aggregation methods to go along with this. This is a comment
> > > about metadata only.
> > >
> >
> > Fair enough, I agree with your perspective.
> >
> > Cheers
> > Bryan
> >
> > _______________________________________________
> > CF-metadata mailing list
> > CF-metadata at cgd.ucar.edu
> > http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata
> >
>
> --
>
> Steve Hankin, NOAA/PMEL -- Steven.C.Hankin at noaa.gov
> 7600 Sand Point Way NE, Seattle, WA 98115-0070
> ph. (206) 526-6080, FAX (206) 526-6744
Received on Wed Mar 07 2007 - 10:35:20 GMT

This archive was generated by hypermail 2.3.0 : Tue Sep 13 2022 - 23:02:40 BST