
[CF-metadata] Towards recognizing and exploiting hierarchical groups

From: Schultz, Martin <m.schultz>
Date: Tue, 17 Sep 2013 17:12:48 +0000

Hi again,

    I fully support Jim's view! Let's not get hung up on whether groups/hierarchies are good or bad. Instead we should consider them a reality, and an option rather than a must. They are a bit like a suitcase which you can pack and unpack to carry things around, or store in your basement until you need it again. In your closet (i.e. on your hard drive) you may prefer to have flat access to your shirts and socks, while when travelling (distributing datasets to students, for example) you may prefer some sort of packaging. And don't tell me there is only one way to pack a suitcase!

> treat groups as files
There should probably be two ways for converting group files to flat files:
a) flatten everything into one file with "." separated name spaces
b) flatten groups into individual files (tools like NCO could then use the name space as part of the file names to be generated)
Conversely, one may want to define rules for aggregating flat files into a hierarchy - allowing for different "models" for how data should be structured
[this is probably beyond CF but related as far as attributes are concerned]
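To make (a) and (b) a bit more concrete, here is a minimal sketch in Python. It does not use the netCDF-4 API or NCO: a group is simply modelled as a nested dict with "variables" and "groups" entries, and the function names, the "_" used in the generated file names and the ".nc" suffix are assumptions made up for illustration only.

```python
# Hypothetical in-memory model: a group is a dict with optional
# "variables" (name -> data) and "groups" (name -> sub-group) entries.

def flatten(group, prefix=""):
    """Option (a): one flat name space, group path joined with '.'."""
    flat = {}
    for name, var in group.get("variables", {}).items():
        flat[prefix + name] = var
    for name, sub in group.get("groups", {}).items():
        flat.update(flatten(sub, prefix + name + "."))
    return flat


def split_into_files(group, path="root"):
    """Option (b): one flat 'file' per group, named after the group path."""
    files = {}
    if group.get("variables"):
        files[path + ".nc"] = dict(group["variables"])
    for name, sub in group.get("groups", {}).items():
        files.update(split_into_files(sub, path + "_" + name))
    return files


root = {
    "variables": {"time": [0, 1]},
    "groups": {
        "model_a": {"variables": {"tas": [280.0, 281.0]}},
        "model_b": {
            "variables": {"tas": [279.5, 280.5]},
            "groups": {"ens1": {"variables": {"pr": [1.0, 2.0]}}},
        },
    },
}

flat_names = sorted(flatten(root))
file_names = sorted(split_into_files(root))
```

With this toy input, (a) yields names such as "model_b.ens1.pr" in a single name space, while (b) yields one entry per group such as "root_model_b_ens1.nc". The reverse direction (aggregating flat files into a hierarchy) would need an agreed rule for splitting such names again.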

> allow for the concept of inheritance of dimensions (which is native to netCDF-4)
I may be naïve, but this seems relatively straightforward to me as long as we steer clear of multiple or circular hierarchies. We should be aware of the consequences, though: not all HDF5 files may then fit into the new CF framework! It would actually be good to get a view from the HDF5 experts as to how widely the multiple-ancestor or circular "features" of HDF5 are exploited in real datasets (what would we miss if we adhere to the netCDF-4 concept?).
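As a rough illustration of why the restriction matters, the following Python sketch checks that a group graph is a strict hierarchy (every group has exactly one parent, no cycles) and then resolves a dimension by walking up the parent chain. This is not the netCDF-4 or HDF5 API: the group graph is a plain dict of child lists, and the function names and the "nearest enclosing group wins" rule are assumptions for the purpose of the example.

```python
def parent_map(children, root):
    """Return {group: parent} for all groups reachable from root.

    `children` maps a group name to a list of child group names.
    Raises ValueError if a group is reached twice, which means it has
    more than one parent or is part of a cycle.
    """
    parents = {}
    seen = {root}
    stack = [root]
    while stack:
        group = stack.pop()
        for child in children.get(group, []):
            if child in seen:
                raise ValueError(f"not a hierarchy: {child!r} reached twice")
            seen.add(child)
            parents[child] = group
            stack.append(child)
    return parents


def find_dimension(name, group, parents, dims):
    """Look for a dimension in `group`, then in each ancestor in turn."""
    while group is not None:
        if name in dims.get(group, {}):
            return group, dims[group][name]
        group = parents.get(group)  # None once we step above the root
    raise KeyError(name)


tree = {"/": ["a", "b"], "a": ["a1"]}
cyclic = {"/": ["a"], "a": ["b"], "b": ["a"]}  # a -> b -> a
dims = {"/": {"time": 12}, "a": {"lat": 90}, "a1": {"time": 3}}

parents = parent_map(tree, "/")
inherited = find_dimension("time", "a", parents, dims)   # found in "/"
shadowed = find_dimension("time", "a1", parents, dims)   # found in "a1"
```

In the cyclic case `parent_map(cyclic, "/")` raises, because there is no unique parent chain to walk, and so "inherited from the parent" has no well-defined meaning.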

> ... and attributes (which would be a CF convention)



> -----Original Message-----
> From: CF-metadata [mailto:cf-metadata-bounces at cgd.ucar.edu] On Behalf
> Of cf-metadata-request at cgd.ucar.edu
> Sent: Tuesday, 17 September 2013 07:52
> To: cf-metadata at cgd.ucar.edu
> Subject: CF-metadata Digest, Vol 125, Issue 6
>
> Today's Topics:
>
> 1. Re: Towards recognizing and exploiting hierarchical groups
> (Jim Biard)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 17 Sep 2013 09:51:51 -0400
> From: Jim Biard <jim.biard at noaa.gov>
> To: "cf-metadata at cgd.ucar.edu List" <cf-metadata at cgd.ucar.edu>
> Subject: Re: [CF-metadata] Towards recognizing and exploiting
> hierarchical groups
> Message-ID: <0B9E777C-C317-4CE9-ACEA-CFDF93C2C4EF at noaa.gov>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi.
>
> I strongly support the idea of adding groups to CF. As a data producer and
> consumer, I vastly prefer to have collections of similar items grouped
> together rather than laying about in a single large bin. (I also make extensive
> use of folders on my computer!) I am currently building netCDF-4 files that
> use groups, which allows me to produce single files instead of groups of 35
> files. As attractive as the thought of "containerless data" (clouds of fully self-
> described, individual variables floating in cyberspace) is, I find that a
> significant database system is required to make that functional. When such
> a system is available, variables can be as easily presented from files
> containing groups as from files that don't. When such a system isn't
> available, groups help unaided human brains grasp the organization of the
> data. That's why the hierarchical file system has been such a success. (And I
> admit, this is colored by my own particular biases. I like to sort my email
> into folders, even sub-folders. I use search when I need to, but I find it much
> quicker to go to the folder where I am more likely to find what I'm looking
> for.)
>
> I also agree that we should take a gradualist approach. If we conceptually
> treat groups as files, allow for the concept of inheritance of dimensions
> (which is native to netCDF-4) and attributes (which would be a CF
> convention), and stop there for now, I think we can then wrestle with more
> complex topics as they come along.
>
> Grace and peace,
>
> Jim
>
> Jim Biard
> Research Scholar
> Cooperative Institute for Climate and Satellites NC
> North Carolina State University
> NOAA's National Climatic Data Center
> 151 Patton Ave, Asheville, NC 28801
> e: jim.biard at noaa.gov
> o: +1 828 271 4900
>
>
>
> On Sep 17, 2013, at 7:56 AM, <stephen.pascoe at stfc.ac.uk> wrote:
>
> > Bryan has beaten me to the points I would have made. I think hierarchies
> are overrated at the interface level. Examples abound of where they have
> been abandoned: hierarchical vs relational DBs, XML databases and tools (save
> us from XQuery for netCDF!).
> >
> > Under the hood hierarchies are often necessary for scalability and we all
> use them as a crutch when no better tools exist.
> >
> > I would advocate keeping support for groups very simple. CF could treat
> any netcdf file containing groups as if it was a directory of netcdf files with
> attached metadata. IMO complex rules about inter-group relationships
> should be avoided. I guess attribute inheritance must be an exception here
> but I would urge caution. One of the CF data model tickets has got a detailed
> debate on interpretation of the current standard regarding variable
> attributes overriding global attributes. Lessons from that should be learned.
> >
> > Stephen.
> >
> > --
> > Stephen Pascoe from iPhone
> >
> > On 17 Sep 2013, at 10:10, "Bryan Lawrence"
> <bryan.lawrence at ncas.ac.uk> wrote:
> >
> > Hi Folks
> >
> > CMIP5 is illuminating in a number of ways ... not least because it is
> impossible to come up with a *natural* hierarchy for consumers of the data
> (as opposed to the producers). But even the producers have different ways
> of organising their material (running members of different ensembles all at
> once, or all members of one ensemble at once), then the data has to be
> published and versioned ... and all of a sudden there is no natural hierarchy
> for CMIP5 (although everyone will have their own idea of what it could be ...
> )
> >
> > The advantage of a flat system of objects, which can be linked into multiple
> hierarchies by a layer of metadata/indirection (call it what you like) becomes
> obvious in that context ... you can do faceted browse (and faceted
> assemblage of groups). So it's not so obvious to me that Charlie's examples
> are so compelling ... (indeed, even the NASA examples aren't so compelling
> when you consider some of the data use, which immediately requires us to
> extract and replicate the data into smaller granules in some cases ...)
> >
> > Which leads me naturally onto CF. I think there *is* a case for thinking
> about how we use hierarchical attributes in CF (indeed, we've just been
> arguing about it in another context with the concept of file attributes and
> variable attributes). We could resolve this once and for all by establishing a
> convention for CF which says how we *will* do group attributes as they
> become necessary. (I still think we will eventually want vector concepts more
> naturally represented in files, even though I think files should not be our one
> view of the world.)
> >
> > However, the argument about file and field attributes applies here. What (I
> think) we're talking about (thus far) for groups is metadata aggregation and is
> simply a *file based convention* for simplifying storage, so that when the file
> gets unpacked, the data model says the attributes are owned by each
> individual group member. If it's just that on the table, then I'm OK with this.
> >
> > The scope issue, on the other hand, opens a can of worms, and I hope I've
> demonstrated with the CMIP5 preamble that it won't be that obvious to
> resolve.
> >
> > Bryan
> >
> >
> >
> >
> > On 17 September 2013 06:26, <zender at uci.edu> wrote:
> > Hi Russ,
> >
> > Thanks for your input and link to an earlier presentation of yours.
> >
> > Agree that the proposal only applies to group hierarchies, i.e., to
> > groups representable by the Common Data Model 2/extended/enhanced
> > which for practical purposes means groups exposed by the netCDF4 API.
> > Your way of putting it is better because it's more generic: we only
> > seek to define metadata inheritance for hierarchical groups, no matter
> > the external representation of the group.
> >
> > Cheers,
> > cz
> >
> > On 16/09/2013 12:06, Russ Rew wrote:
> >>> Dear all,
> >>
> >> I'm also glad to see this discussion surface. Since I first presented
> >> "Developing Conventions for netCDF-4" at the 2007 GO-ESSP meeting:
> >>
> >> http://www.unidata.ucar.edu/presentations/Rew/nc4-conventions.pdf
> >>
> >> I've been hoping that netCDF-4 feature adoption would begin to gain
> >> traction in the community (see slides 19 and 20 of this 2010
> >> presentation for my "chicken-and-egg logjam" illustration):
> >>
> >>
> http://www.unidata.ucar.edu/presentations/Rew/agu_2010_nc4_Rew.pdf
> >>
> >> I like the Zender-Habermann-Leonard (ZHL?) proposal for Group
> >> Attributes, but would like to point out a potential problem for its use
> >> with HDF Groups: they aren't actually hierarchical. In HDF5, Group A
> >> can be a parent of Group B, which in turn can be a parent of Group A,
> >> forming a cycle instead of a hierarchy. The graph of the Group-subGroup
> >> relation in HDF5 can form an arbitrary directed cyclic graph, though
> >> this is not permitted in netCDF-4, in which only Group *hierarchies* can
> >> be created through the netCDF-4 API.
> >>
> >> Without a restriction to hierarchies, attribute inheritance is not
> >> useful, which is why we required group hierarchies for dimension
> >> inheritance in netCDF-4. So I think the proposal should include a
> >> restriction to only hierarchical Group structures, which also has the
> >> desirable property that each Group, except for the root, has a unique
> >> parent Group.
> >>
> >> --Russ
> >> _______________________________________________
> >> CF-metadata mailing list
> >> CF-metadata at cgd.ucar.edu
> >> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
> >>
> >
> > --
> > Charlie Zender, Earth System Sci. & Computer Sci.
> > University of California, Irvine 949-891-2429 )'(
> >
> >
> >
> > --
> > --
> > Bryan Lawrence
> > University of Reading: Professor of Weather and Climate Computing.
> > National Centre for Atmospheric Science: Director of Models and Data.
> > STFC: Director of the Centre for Environmental Data Archival.
> > Ph: +44 118 3786507 or 1235 445012;
> Web: http://home.badc.rl.ac.uk/lawrence
>
> End of CF-metadata Digest, Vol 125, Issue 6
> *******************************************


------------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Registered office: Juelich
Entered in the commercial register of the District Court of Dueren, No. HR B 3498
Chairman of the Supervisory Board: MinDir Dr. Karl Eugen Huthmacher
Board of Directors: Prof. Dr. Achim Bachem (Chairman),
Karsten Beneke (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt
------------------------------------------------------------------------------------------------

Forschungszentrum Juelich opens its doors on Sunday, 29 September, from 10:00 to 17:00: http://www.tagderneugier.de
Received on Tue Sep 17 2013 - 11:12:48 BST

This archive was generated by hypermail 2.3.0 : Tue Sep 13 2022 - 23:02:41 BST
