
[CF-metadata] Towards recognizing and exploiting hierarchical groups (Charlie Zender - Steve Hankin - Richard Signell)

From: Charlie Zender <zender>
Date: Mon, 16 Sep 2013 22:19:56 -0700

Hi Martin,

When NCO's ncecat aggregates multiple files into a single file, it
does as you describe: it puts each file into its own group, and each
file's global metadata becomes group metadata. What should be the new
global metadata of the group file? ncecat duplicates the first file's
global metadata for that, and just updates the history attribute:

http://nco.sf.net/nco.html#ncecat
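A toy sketch of the aggregation behavior just described, assuming files are modeled as plain Python dictionaries (a hypothetical stand-in, not the netCDF API and not NCO's implementation). Each input's global attributes become the attributes of its own group, and the new root attributes are a copy of the first file's globals with a timestamped line prepended to `history` (newest first). The function name `aggregate_into_groups` and the `command` label are made up for illustration.

```python
from datetime import datetime, timezone


def aggregate_into_groups(files, command="ncecat"):
    """Toy model of group aggregation (hypothetical, not NCO source code).

    `files` is a list of (name, contents) pairs; each `contents` dict holds
    'attrs' (global attributes) and 'vars' (variables).
    """
    if not files:
        raise ValueError("need at least one input file")
    # New root-level metadata: a copy of the first file's global attributes
    root_attrs = dict(files[0][1]["attrs"])
    # Record the operation by prepending a timestamped line to `history`
    stamp = datetime.now(timezone.utc).strftime("%a %b %d %H:%M:%S %Y")
    entry = f"{stamp}: {command} " + " ".join(name for name, _ in files)
    previous = root_attrs.get("history")
    root_attrs["history"] = entry + ("\n" + previous if previous else "")
    # Each input becomes one group; its former globals become group attributes
    groups = {
        name: {"attrs": dict(contents["attrs"]), "vars": dict(contents["vars"])}
        for name, contents in files
    }
    return {"attrs": root_attrs, "groups": groups}
```

Copying only the first file's globals is the simplest choice; a more metadata-minimal alternative would keep at the root only those attributes whose values are identical across all inputs.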

I didn't put much thought into it, and there was no convention to
follow, so the ncecat behavior may offend those who wish to minimize
metadata (whose side I am usually on, but not in this case).

Glad you like the proposed scope rule. Curious to hear if other viable
scoping rules will be proposed.
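One way to picture a scope rule is as an upward search through the group hierarchy. The sketch below assumes the rule takes this form: an attribute defined in a group is visible in all of its descendants, and the nearest definition wins. The group tree is modeled as a dictionary from absolute group paths to attribute dictionaries; `resolve_attribute` and the example paths and values are hypothetical.

```python
def resolve_attribute(tree, path, name):
    """Return (value, defining_group) for attribute `name` as seen from `path`.

    `tree` maps absolute group paths ('/', '/a', '/a/b') to attribute dicts.
    Assumed scope rule: search the group itself, then each ancestor up to
    the root '/'; the nearest definition wins.
    """
    parts = [p for p in path.strip("/").split("/") if p]
    # Try scopes from innermost (the group itself) to outermost (the root)
    for depth in range(len(parts), -1, -1):
        scope = "/" + "/".join(parts[:depth])
        attrs = tree.get(scope, {})
        if name in attrs:
            return attrs[name], scope
    raise KeyError(f"attribute {name!r} is not in scope at {path!r}")


tree = {
    "/": {"Conventions": "CF-1.6", "institution": "root value"},
    "/RCP85": {"experiment": "rcp85"},
    "/RCP85/CESM": {"institution": "model value"},
}
print(resolve_attribute(tree, "/RCP85/CESM", "institution"))  # ('model value', '/RCP85/CESM')
print(resolve_attribute(tree, "/RCP85/CESM/r1i1p1", "Conventions"))  # ('CF-1.6', '/')
```

Under this rule, metadata shared by a whole subtree is stored once at the subtree's root, and a descendant overrides it simply by redefining the attribute.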

Hierarchies are, as you say, a reality. NASA loves hierarchies--by
which I mean data hierarchies of course! HIRDLS L3 data are 4 groups
deep. ICESat GLAS datasets are 10 groups deep. By comparison, CMIP5
datasets are (by agreement) flat, flat, flat.

But IMHO the best reason to make "group-aware" CF is to facilitate
more commonsense storage of intrinsically hierarchical datasets, e.g.,
CMIP5 scenario->model->ensemble = RCP850->CESM->Historical.
Why flatten that? (I use CMIP5 only as a well-known example of a
dataset type, not to pick on it. All hail CMIP5!). Imagine if
researchers could encapsulate all the data for their research question
into one or a few self-describing hierarchical group files...

The effects of group hierarchies on XML representation and web services
are beyond my ken.

c

On 16/09/2013 11:38, Schultz, Martin wrote:
> Dear all,
>
> I very much welcome the initiative by Charlie, Ted and Peter on hierarchical data structures. Without wanting to offend anyone, the arguments against this brought forward by Steve and Richard sound to me a bit cowardly - but of course they do have a point in trying to ensure backward compatibility. I don't believe that we should pin CF to netCDF-3 files for all eternity, because this would inevitably render it meaningless at some point in time. And, as Charlie "threatened", there may be (are?) other groups defining conventions which would certainly step in. We have already had a couple of discussions which probably could have been resolved more easily if there were a group concept in CF. Specifically, I remember a discussion on the data model about inheritance of "global" attributes when data from different files were merged. In a group concept, you could simply define the content of each file as one group and redefine the global attributes from all files as group attributes, just as Charlie discussed in the paragraph about "scope". So, I am definitely all for at least this aspect of hierarchical structures. And, as long as we are only concerned about scoping, it doesn't sound overly complicated to map hierarchical structures onto flat files (break groups apart into individual files), so that interoperable applications could flexibly deal with one or the other.
>
> The real issue is indeed the definition of some sort of relation between or among the groups. Then again, this problem seems to be the same as defining relations between/among files if you intend to somehow connect them (be it concatenating, (ensemble) averaging, or whatever). From a practical point of view, I would think that following the concept of the NCO tools could be a useful starting point. By this I mean: let's first worry about frequently used operations such as those that have been implemented as NCO operators. As this appears to be a limited set of relations, it should be possible to map those onto CF (and maybe this would indeed be the time to think about CF 2.0?). Nevertheless, a little bit of "philosophical" discussion up front may also not hurt, in order to see where possible limitations may be hidden. In fact, as indicated above, I believe that such a discussion could also help the CF data model become robust for future extensions, and it may provide some insights for good implementations of the data model - because hierarchies are a reality.
>
> On a somewhat different note, I would just like to point out that hierarchies also cause tremendous problems in the linking of data catalogues. Who says what goes on top and what follows next? Unfortunately, at some point this structure (hierarchy) needs to be modeled in XML, attributes, or some other structure, and then it becomes quite complicated to translate it into another hierarchy in order to harmonize finding and accessing datasets from multiple sources. Some would argue that this is what conventions are made for. However, my response would be that you will then see as many conventions sprouting up as there are (practical) hierarchies. So, perhaps we need to think even beyond HDF and netCDF-4?
>
> Best regards,
>
> Martin
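Martin's point that, for scoping alone, a hierarchy can be broken apart into flat files can be sketched as follows. This assumes the same nearest-definition-wins scope rule and a hypothetical dictionary model of groups (not the netCDF API): each leaf group becomes one flat "file" whose global attributes merge everything from the root down to the leaf, and ancestor variables (for example shared coordinates) are copied in. The function name `flatten_groups` and the underscore-joined output names are made up for illustration.

```python
def flatten_groups(tree):
    """Split a group hierarchy into flat 'files', one per leaf group.

    `tree` maps absolute group paths ('/', '/a', '/a/b') to dicts with
    'attrs' and 'vars'. Hypothetical data model, not the netCDF API.
    """
    paths = list(tree)
    # A leaf group is one that no other group path lies underneath
    leaves = [
        p for p in paths
        if not any(q != p and q.startswith(p.rstrip("/") + "/") for q in paths)
    ]
    flat = {}
    for leaf in sorted(leaves):
        parts = [s for s in leaf.strip("/").split("/") if s]
        attrs, variables = {}, {}
        # Walk root -> leaf so that inner definitions overwrite outer ones
        for depth in range(len(parts) + 1):
            group = tree.get("/" + "/".join(parts[:depth]), {})
            attrs.update(group.get("attrs", {}))
            variables.update(group.get("vars", {}))
        name = "_".join(parts) + ".nc" if parts else "root.nc"
        flat[name] = {"attrs": attrs, "vars": variables}
    return flat
```

The merge discards which level originally defined each attribute, so going back from flat files to groups is not automatic; that reverse direction is where the relations between groups that Martin raises would have to be specified.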


-- 
Charlie Zender, Earth System Sci. & Computer Sci.
University of California, Irvine 949-891-2429 )'(
Received on Mon Sep 16 2013 - 23:19:56 BST

