
[CF-metadata] Towards recognizing and exploiting hierarchical groups

From: Jim Biard <jim.biard>
Date: Tue, 17 Sep 2013 09:51:51 -0400

Hi.

I strongly support the idea of adding groups to CF. As a data producer and consumer, I vastly prefer to have collections of similar items grouped together rather than lying about in a single large bin. (I also make extensive use of folders on my computer!) I am currently building netCDF-4 files that use groups, which allows me to produce single files instead of sets of 35 files. As attractive as the thought of "containerless data" (clouds of fully self-described, individual variables floating in cyberspace) is, I find that a significant database system is required to make that functional. When such a system is available, variables can be presented as easily from files containing groups as from files that don't. When such a system isn't available, groups help unaided human brains grasp the organization of the data. That's why the hierarchical file system has been such a success. (And I admit, this is colored by my own particular biases. I like to sort my email into folders, even sub-folders. I use search when I need to, but I find it much quicker to go to the folder where I am most likely to find what I'm looking for.)

I also agree that we should take a gradualist approach. If we conceptually treat groups as files, allow for the concept of inheritance of dimensions (which is native to netCDF-4) and attributes (which would be a CF convention), and stop there for now, I think we can then wrestle with more complex topics as they come along.
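The inheritance described above can be sketched as a lookup that starts in the current group and walks up through its ancestors until it finds a definition. This is a minimal illustration only: `Group` and `inherited` are hypothetical names (not the netCDF4-python or netCDF-C API), and the attribute half models the proposed CF convention by reusing the scoping netCDF-4 already applies to dimensions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """Hypothetical in-memory stand-in for a netCDF-4 group (not a real library type)."""
    name: str
    parent: Optional[Group] = None
    dimensions: dict[str, int] = field(default_factory=dict)
    attributes: dict[str, object] = field(default_factory=dict)


def inherited(group: Group, kind: str, name: str) -> Optional[object]:
    """Look up `name` in `group`, then in each ancestor up to the root.

    `kind` is "dimensions" (native netCDF-4 scoping) or "attributes"
    (the proposed CF convention, modelled here the same way).
    The nearest definition wins; None means it is not defined on the path to the root.
    """
    g: Optional[Group] = group
    while g is not None:
        table = getattr(g, kind)
        if name in table:
            return table[name]
        g = g.parent
    return None


# Illustrative file: a root-level dimension and attribute, plus one child group.
root = Group("/", dimensions={"time": 12}, attributes={"institution": "NOAA NCDC"})
sat = Group("satellite_a", parent=root, attributes={"platform": "satellite A"})

print(inherited(sat, "dimensions", "time"))         # 12, defined in the root
print(inherited(sat, "attributes", "institution"))  # NOAA NCDC, inherited from the root
print(inherited(root, "attributes", "platform"))    # None, child attributes do not leak upward
```

Inheritance only flows downward here, which keeps each lookup a simple walk to the root.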

Grace and peace,

Jim

Jim Biard
Research Scholar
Cooperative Institute for Climate and Satellites NC
North Carolina State University
NOAA's National Climatic Data Center
151 Patton Ave, Asheville, NC 28801
e: jim.biard at noaa.gov
o: +1 828 271 4900



On Sep 17, 2013, at 7:56 AM, <stephen.pascoe at stfc.ac.uk> wrote:

> Bryan has beaten me to the points I would have made. I think hierarchies are overrated at the interface level. Examples abound of where they have been abandoned: hierarchical vs. relational DBs, XML databases and tools (save us from XQuery for netCDF!).
>
> Under the hood, hierarchies are often necessary for scalability, and we all use them as a crutch when no better tools exist.
>
> I would advocate keeping support for groups very simple. CF could treat any netCDF file containing groups as if it were a directory of netCDF files with attached metadata. IMO, complex rules about inter-group relationships should be avoided. I guess attribute inheritance must be an exception here, but I would urge caution. One of the CF data model tickets has a detailed debate on the interpretation of the current standard regarding variable attributes overriding global attributes. Lessons from that should be learned.
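Treating a grouped file "as if it were a directory of netCDF files" can be sketched as flattening the group tree into pseudo-files keyed by group path. A minimal sketch, assuming a hypothetical `Group` type and a hypothetical `as_directory` helper; the precedence rule (an attribute in a nearer group overrides the same attribute in an ancestor) is exactly the kind of choice the message urges caution about, so it is an assumption here, not a settled convention.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical stand-in for a netCDF-4 group (not a real library type)."""
    name: str
    attributes: dict[str, object] = field(default_factory=dict)
    variables: list[str] = field(default_factory=list)
    children: list[Group] = field(default_factory=list)


def as_directory(root: Group) -> dict[str, dict]:
    """Present a grouped file as a 'directory' of pseudo-files keyed by group path.

    Each pseudo-file lists its own variables plus the global attributes it would
    carry if written out as a standalone file. ASSUMPTION: an attribute set in a
    nearer group overrides the same attribute in an ancestor.
    """
    files: dict[str, dict] = {}

    def walk(group: Group, path: str, inherited: dict[str, object]) -> None:
        effective = {**inherited, **group.attributes}  # nearer group wins
        files[path] = {"attributes": effective, "variables": list(group.variables)}
        for child in group.children:
            walk(child, f"{path.rstrip('/')}/{child.name}", effective)

    walk(root, "/", {})
    return files


# Illustrative two-member ensemble stored as groups in one file.
root = Group(
    "/",
    attributes={"Conventions": "CF-1.6", "title": "ensemble"},
    children=[
        Group("run1", attributes={"title": "run 1"}, variables=["tas"]),
        Group("run2", variables=["tas"]),
    ],
)
for path, pseudo_file in as_directory(root).items():
    print(path, pseudo_file["attributes"], pseudo_file["variables"])
```

No relationships between sibling groups are modelled; each pseudo-file depends only on its own ancestors, in keeping with the "keep it simple" position.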
>
> Stephen.
>
> --
> Stephen Pascoe from iPhone
>
> On 17 Sep 2013, at 10:10, "Bryan Lawrence" <bryan.lawrence at ncas.ac.uk> wrote:
>
> Hi Folks
>
> CMIP5 is illuminating in a number of ways ... not least because it is impossible to come up with a *natural* hierarchy for consumers of the data (as opposed to the producers). But even the producers have different ways of organising their material (running members of different ensembles all at once, or all members of one ensemble at once), then the data has to be published and versioned ... and all of a sudden there is no natural hierarchy for CMIP5 (although everyone will have their own idea of what it could be ... )
>
> The advantage of a flat system of objects, which can be linked into multiple hierarchies by a layer of metadata/indirection (call it what you like) becomes obvious in that context ... you can do faceted browse (and faceted assemblage of groups). So it's not so obvious to me that Charlie's examples are so compelling ... (indeed, even the NASA examples aren't so compelling when you consider some of the data use, which immediately requires us to extract and replicate the data into smaller granules in some cases ...)
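The "flat system of objects linked into multiple hierarchies" idea can be sketched by keeping records flat and building any hierarchy on demand from a chosen facet order. This is a minimal sketch: the record fields (`model`, `experiment`, `ensemble`) only loosely mimic CMIP5-style facets, and the records, ids, and the helpers `build_hierarchy` and `facet_search` are hypothetical illustrations, not part of any CMIP5 or ESGF tool.

```python
from __future__ import annotations

from collections import defaultdict


def build_hierarchy(records: list[dict], facets: list[str]):
    """Nest a flat list of records into a tree ordered by `facets`.

    The records themselves are never moved; each call is just one view
    (one hierarchy) laid over the same flat collection.
    """
    if not facets:
        return [r["id"] for r in records]
    head, *rest = facets
    buckets: dict[str, list[dict]] = defaultdict(list)
    for r in records:
        buckets[r[head]].append(r)
    return {key: build_hierarchy(members, rest) for key, members in sorted(buckets.items())}


def facet_search(records: list[dict], **criteria: str) -> list[str]:
    """Faceted browse: ids of records whose facets match every criterion."""
    return [r["id"] for r in records if all(r.get(k) == v for k, v in criteria.items())]


# Hypothetical flat catalogue of datasets.
records = [
    {"id": "a", "model": "M1", "experiment": "historical", "ensemble": "r1i1p1"},
    {"id": "b", "model": "M1", "experiment": "rcp45", "ensemble": "r1i1p1"},
    {"id": "c", "model": "M2", "experiment": "historical", "ensemble": "r2i1p1"},
]

print(build_hierarchy(records, ["model", "experiment"]))  # a producer-style view
print(build_hierarchy(records, ["experiment", "model"]))  # a consumer-style view
print(facet_search(records, experiment="historical"))
```

Neither ordering is privileged: both trees are computed from the same flat list, which is the contrast with a hierarchy fixed at write time.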
>
> Which leads me naturally onto CF. I think there *is* a case for thinking about how we use hierarchical attributes in CF (indeed, we've just been arguing about it in another context with the concept of file attributes and variable attributes). We could resolve this once and for all by establishing a convention for CF which says how we *will* do group attributes as they become necessary. (I still think we will eventually want vector concepts more naturally represented in files, even though I think files should not be our one view of the world.)
>
> However, the argument about file and field attributes applies here. What (I think) we're talking about (thus far) for groups is metadata aggregation and is simply a *file based convention* for simplifying storage, so that when the file gets unpacked, the data model says the attributes are owned by each individual group member. If it's just that on the table, then I'm OK with this.
>
> The scope issue, on the other hand, opens a can of worms, and I hope I've demonstrated with the CMIP5 preamble that it won't be that obvious to resolve.
>
> Bryan
>
>
>
>
> On 17 September 2013 06:26, <zender at uci.edu> wrote:
> Hi Russ,
>
> Thanks for your input and link to an earlier presentation of yours.
>
> Agree that the proposal only applies to group hierarchies, i.e., to
> groups representable by the Common Data Model 2/extended/enhanced
> which for practical purposes means groups exposed by the netCDF4 API.
> Your way of putting it is better because it's more generic: we only
> seek to define metadata inheritance for hierarchical groups, no matter
> the external representation of the group.
>
> Cheers,
> cz
>
> On 16/09/2013 12:06, Russ Rew wrote:
>> Dear all,
>>
>> I'm also glad to see this discussion surface. Since I first presented
>> "Developing Conventions for netCDF-4" at the 2007 GO-ESSP meeting:
>>
>> http://www.unidata.ucar.edu/presentations/Rew/nc4-conventions.pdf
>>
>> I've been hoping that netCDF-4 feature adoption would begin to gain
>> traction in the community (see slides 19 and 20 of this 2010
>> presentation for my "chicken-and-egg logjam" illustration):
>>
>> http://www.unidata.ucar.edu/presentations/Rew/agu_2010_nc4_Rew.pdf
>>
>> I like the Zender-Habermann-Leonard (ZHL?) proposal for Group
>> Attributes, but would like to point out a potential problem for its use
>> with HDF Groups: they aren't actually hierarchical. In HDF5, Group A
>> can be a parent of Group B, which in turn can be a parent of Group A,
>> forming a cycle instead of a hierarchy. The graph of the Group-subGroup
>> relation in HDF5 can be an arbitrary directed graph, cycles included, though
>> this is not permitted in netCDF-4, in which only Group *hierarchies* can
>> be created through the netCDF-4 API.
>>
>> Without a restriction to hierarchies, attribute inheritance is not
>> useful, which is why we required group hierarchies for dimension
>> inheritance in netCDF-4. So I think the proposal should include a
>> restriction to only hierarchical Group structures, which also has the
>> desirable property that each Group, except for the root, has a unique
>> parent Group.
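The restriction proposed above can be checked mechanically: the root has no parent, every other group has exactly one parent, and every group is reachable from the root. Together these rule out cycles. A minimal sketch, assuming group links are given as a plain adjacency map from a parent group name to its child group names; this is a hypothetical representation, not the HDF5 or h5py API, and `is_hierarchy` is an illustrative name.

```python
from __future__ import annotations

from collections import Counter, deque


def is_hierarchy(root: str, links: dict[str, list[str]]) -> bool:
    """True if the Group-subGroup links form a tree rooted at `root`.

    Checks that the root has no parent, that every other group has exactly one
    parent (counting duplicate links), and that every group is reachable from
    the root. A graph passing all three checks cannot contain a cycle.
    """
    nodes = {root} | set(links) | {c for kids in links.values() for c in kids}
    in_degree = Counter(c for kids in links.values() for c in kids)
    if in_degree[root] != 0:
        return False
    if any(in_degree[n] != 1 for n in nodes - {root}):
        return False
    # Breadth-first walk from the root to confirm every group is reachable.
    seen = {root}
    queue = deque([root])
    while queue:
        for child in links.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen == nodes


print(is_hierarchy("/", {"/": ["A", "B"], "A": ["C"]}))          # netCDF-4-style tree
print(is_hierarchy("/", {"/": ["A"], "A": ["B"], "B": ["A"]}))   # cycle, as HDF5 allows
```

Requiring a unique parent is also what makes upward attribute lookup unambiguous: there is exactly one path from any group back to the root.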
>>
>> --Russ
>> _______________________________________________
>> CF-metadata mailing list
>> CF-metadata at cgd.ucar.edu
>> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>>
>
> --
> Charlie Zender, Earth System Sci. & Computer Sci.
> University of California, Irvine 949-891-2429 )'(
>
>
>
> --
> Bryan Lawrence
> University of Reading: Professor of Weather and Climate Computing.
> National Centre for Atmospheric Science: Director of Models and Data.
> STFC: Director of the Centre for Environmental Data Archival.
> Ph: +44 118 3786507 or 1235 445012; Web: http://home.badc.rl.ac.uk/lawrence

Received on Tue Sep 17 2013 - 07:51:51 BST

This archive was generated by hypermail 2.3.0 : Tue Sep 13 2022 - 23:02:41 BST
