Dear all,
I very much welcome the initiative by Charlie, Ted and Peter on hierarchical data structures. Without wanting to offend anyone, the arguments against this brought forward by Steve and Richard sound a bit cowardly to me - but of course they do have a point in trying to ensure backward compatibility. I don't believe that we should tie CF to netCDF-3 files for all eternity, because this would inevitably render it meaningless at some point in time. And, as Charlie "threatened", there may be (are?) other groups defining conventions which would certainly step in.
We have already had a couple of discussions which probably could have been resolved more easily if there were a group concept in CF. Specifically, I remember a discussion on the data model about inheritance of "global" attributes when data from different files were merged. In a group concept, you could simply define the content of each file as one group and re-define the global attributes from all files as group attributes, just as Charlie discussed in the paragraph about "scope". So, I am definitely all for at least this aspect of hierarchical structures. And, as long as we are only concerned about scoping, it doesn't sound overly complicated to map hierarchical structures onto flat files (break groups apart into individual files), so that interoperable applications could flexibly deal with one or the other.
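To make the scoping idea above a little more concrete, here is a minimal sketch in Python. It is purely illustrative: the nested-dict layout and the names make_group, merge_files_as_groups, resolve_attr and split_groups_into_files are assumptions of mine, not part of CF, netCDF or NCO. Each "file" becomes one group, its global attributes become group attributes, attribute lookup falls back to the enclosing scope, and the groups can be broken apart into flat files again.

```python
from typing import Any, Dict, List

# Hypothetical in-memory stand-in for a file or group (not a netCDF API):
# {"attrs": {...}, "vars": {...}, "groups": {...}}
Group = Dict[str, Any]


def make_group(attrs=None, variables=None, groups=None) -> Group:
    """Build a group from copies of the given attributes, variables, subgroups."""
    return {
        "attrs": dict(attrs or {}),
        "vars": dict(variables or {}),
        "groups": dict(groups or {}),
    }


def merge_files_as_groups(files: Dict[str, Group]) -> Group:
    """Each flat file becomes one group; its global attributes become
    attributes of that group, so nothing has to be reconciled at the root."""
    children = {
        name: make_group(f["attrs"], f["vars"], f["groups"])
        for name, f in files.items()
    }
    return make_group(groups=children)


def resolve_attr(root: Group, path: List[str], name: str) -> Any:
    """Look up an attribute in the group at `path`, falling back to its
    ancestors (innermost scope wins). This inheritance rule is an assumption."""
    chain = [root]
    node = root
    for part in path:
        node = node["groups"][part]
        chain.append(node)
    for group in reversed(chain):
        if name in group["attrs"]:
            return group["attrs"][name]
    raise KeyError(name)


def split_groups_into_files(root: Group) -> Dict[str, Group]:
    """Break the top-level groups apart into flat files. Root attributes are
    copied down and overridden by the group's own attributes."""
    files = {}
    for name, group in root["groups"].items():
        attrs = {**root["attrs"], **group["attrs"]}
        files[name] = make_group(attrs, group["vars"], group["groups"])
    return files
```

For example, merging two files with different "institution" attributes keeps both values, each scoped to its own group, and splitting the result gives back one flat file per group with the inherited attributes filled in.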
The real issue is indeed the definition of some sort of relation between or among the groups. Then again, this problem seems to be the same as defining relations between/among files if you intend to somehow connect them (be it concatenating, (ensemble) averaging, or whatever). From a practical point of view, I would think that following the concept of the NCOs could be a useful starting point. By this I mean let's first worry about frequently used operations such as those that have been implemented as NCO operators. As this appears to be a limited set of relations, it should be possible to map those onto CF (and maybe this would indeed be the time to think about CF 2.0?). Nevertheless, a little bit of "philosophical" discussion up front may also not hurt in order to see where possible limitations may be hidden. In fact, as indicated above, I believe that such a discussion could also help the CF data model become robust for future extensions, and it may provide some insights for good implementations of the data model - because hierarchies are a reality.
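As an illustration of what "a limited set of relations" could look like, here is a small Python sketch of two such operations applied across groups: an ensemble average and a concatenation. It is not how NCO implements its operators. The function names, and the representation of a group as a plain dict mapping variable names to lists of numbers, are assumptions made for this example only.

```python
from typing import Dict, List

# Hypothetical representation: group name -> {variable name -> list of values}
Groups = Dict[str, Dict[str, List[float]]]


def _members(groups: Groups, var_name: str) -> List[List[float]]:
    """Collect the named variable from every group that contains it."""
    found = [g[var_name] for g in groups.values() if var_name in g]
    if not found:
        raise KeyError(var_name)
    return found


def ensemble_mean(groups: Groups, var_name: str) -> List[float]:
    """Element-wise mean of one variable over all groups (ensemble members)."""
    members = _members(groups, var_name)
    length = len(members[0])
    if any(len(m) != length for m in members):
        raise ValueError("ensemble members must have the same length")
    return [sum(values) / len(values) for values in zip(*members)]


def concatenate(groups: Groups, var_name: str) -> List[float]:
    """Join one variable end to end, in the order the groups are listed."""
    joined: List[float] = []
    for member in _members(groups, var_name):
        joined.extend(member)
    return joined
```

Both functions need only one piece of relational information, namely which groups count as members and in which order. That is exactly the kind of relation a convention would have to make explicit.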
On a somewhat different note, I would just like to point out that hierarchies also cause tremendous problems in the linking of data catalogues. Who says what goes on top and what follows next? Unfortunately, at some point, this structure (hierarchy) needs to be modelled in XML, attributes, or some other structure, and then it becomes quite complicated to translate it into another hierarchy in order to harmonize finding and accessing datasets from multiple sources. Some would argue that this is what conventions are made for. However, my response to this would be that you will then see as many conventions sprouting up as there are (practical) hierarchies. So, perhaps we need to think even beyond HDF and netCDF4?
Best regards,
Martin
> ------------------------------
>
> Message: 3
> Date: Mon, 16 Sep 2013 13:52:50 -0400
> From: "Signell, Richard" <rsignell at usgs.gov>
> To: Steve Hankin <steven.c.hankin at noaa.gov>
> Cc: CF Metadata Mail List <cf-metadata at cgd.ucar.edu>
> Subject: Re: [CF-metadata] Towards recognizing and exploiting
> hierarchical groups
> Message-ID:
> <CAFhraUz6Zw78PheYxsb4mcHSujB5N+iZeMyWGnq0Jco7k=RLMA_at_
> mail.gmail.com>
> Content-Type: text/plain; charset="ISO-8859-1"
>
> Charlie & Co,
>
> Also, regardless of whether these hierarchical structures are stored
> in NetCDF4 or flattened NetCDF3, we get a big boost in
> interoperability when we write datasets with known featureTypes
> (profile, time series collection, swath, etc), because then workflows
> that have performed a catalog search and returned dataset endpoints
> know what to do. If the dataset endpoints contain ad hoc hierarchies
> it will be a lot more difficult.
>
> So if we create hierarchical datasets, I hope we create known
> featureTypes to accompany them (like gridEnsembleStructure).
>
> Thanks,
> Rich
>
> On Mon, Sep 16, 2013 at 12:53 PM, Steve Hankin
> <steven.c.hankin at noaa.gov> wrote:
> > Hi Charlie,
> >
> > Great that you have opened the door onto this discussion topic. Total
> > agreement from my pov that "group-awareness" in CF is an area that is
> > crying to be explored and solved. Your analysis of technical details --
> > e.g. attribute scope and inheritance by group descendants, etc. --
> > sounds natural and sensible.
> >
> > The principal barrier to moving forward along this path lies in the fact
> > that CF is heavily committed to interoperability. Arguably interoperability
> > is the raison d'être of CF. The style of "backwards compatibility" that
> > one gets through a headlong switch from the netCDF3 API into the
> > group-oriented elements of the netCDF4 API is the most extreme sort of
> > 1-way trap door. It leads to next generation files that are utterly
> > inaccessible to previous generation applications. This style of
> > advancement, which heavily degrades interoperability from a
> > community-wide perspective, should only be undertaken IMHO if all
> > reasonable alternatives are exhausted. I welcome discussion on this
> > "philosophical" point.
> >
> > So are there reasonable alternatives to these negative impacts on
> > interoperability? It is common practice to flatten groups by dot-appending
> > the name hierarchy: group.subgroup.child. One could certainly envision
> > utilities (in the style of your nco) that could convert a netCDF4-CF file
> > into a netCDF-3 CF file. Could such a translation layer be made available
> > as a Web service? Not the worst answer .... A question I'd like to see
> > discussed (primarily to Unidata, I guess): how difficult would it be to
> > make accommodations in the netCDF API, itself, so that netCDF4 groups
> > were accessible through the netCDF-3 API? If such enhancements could be
> > baked into the netCDF code the character of the interoperability impacts
> > through adding group-aware elements to CF would be utterly changed. This
> > would open the door wide to group-aware CF. Has an analysis of this been
> > done?
> >
> > - Steve
> >
> > ==============================================
> >
> >
> > On 9/15/2013 6:53 PM, Charlie Zender wrote:
> >>
> >> NASA has recently convened an Earth Science Data System Working Group to
> >> explore existing conventions for data and products stored in HDF and to
> >> make recommendations for future developments. The CF Conventions are an
> >> important element in this work, as many scientists and users are
> >> interested in data products that comply with CF. Many members of the
> >> working group are familiar with CF and have been involved in attempts to
> >> apply the CF Conventions to a variety of Earth Science data products.
> >> [...]
Received on Mon Sep 16 2013 - 12:38:29 BST