
[CF-metadata] Towards recognizing and exploiting hierarchical groups (Charlie Zender - Steve Hankin - Richard Signell)

From: Bryan Lawrence <bryan.lawrence>
Date: Wed, 18 Sep 2013 13:57:44 +0100

Hi Charlie

(Before I disagree with you: like most on the list, I'm glad we're having
the conversation on this topic. It needs to be had, so thanks!)

I find this particular example completely unhelpful, not least because I
don't see the utility of packing the data that way. However, I can see that
others might see the utility of doing it ... (your suitcase analogy). But at
some point, we all use netcdf as such a suitcase, so the question is why use
CF in particular? Why have a *CF* convention for this?

Well, one might use CF because software can handle it. That might not be as
true as we would like, and finding new ways of organising our material (in
suitcases) won't make it any easier to build conformant software. But hey,
we'd do that if we had another compelling reason ... so moving on.

We might do it because we have complicated "features" in the real world for
which groups will help. I imagine we will eventually get there ....

We might do it because it's the "natural" way to ship or store the data. I
think we can all agree that we are likely to disagree on the natural way to
do things, but we can all agree that we will all have a natural way :-).
So, it does make sense to have a convention for allowing folks to do this.
It's just not obvious (to me) that it needs to be a CF way of doing things.
That said, CF probably shouldn't *stop* folks using groups in netcdf4
files, so we need (IMHO) to come up with the minimum specification that
supports that, while weighing it against my earlier point: is this a
compelling reason to make software more complicated?

IMHO we could live with groups as a convenience encoding. Nothing more
until we get to features. I think that's consistent with your example. In
doing so, group attributes become member attributes when unpacked, but
group attributes that are not inherited would be a big problem ...
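
To make "group attributes become member attributes when unpacked" concrete,
here is a minimal Python sketch. It is only an illustration under stated
assumptions: the file is modelled with plain dicts rather than any netCDF
library, the function name unpack is hypothetical, and the rule that a
group's own attribute overrides an inherited one is an assumption, not
something CF specifies. The attribute values are taken from the cmip5.cdl
example quoted below.

```python
# Hypothetical in-memory model of the quoted cmip5.cdl file.
# Plain dicts only; this is not the netCDF4 API.
root = {
    "attrs": {
        "Conventions": "CF-2.x",
        "history": "yada yada yada",
        "Scenario": "RCP 8.5",
    },
    "groups": {
        "cesm": {"attrs": {"Model": "CESM"}, "groups": {}},
        "ecmwf": {"attrs": {"Model": "ECMWF"}, "groups": {}},
        "giss": {"attrs": {"Model": "GISS"}, "groups": {}},
    },
}


def unpack(group, inherited=None, path=""):
    """Return {group_path: effective_attrs}, copying ancestor attributes down.

    Assumption: a group's own attributes override inherited ones.
    """
    effective = dict(inherited or {})
    effective.update(group["attrs"])
    flat = {path or "/": effective}
    for name, child in group["groups"].items():
        flat.update(unpack(child, effective, f"{path}/{name}"))
    return flat


flat = unpack(root)
for group_path, attrs in flat.items():
    print(group_path, attrs)
```

After unpacking, each group carries its own copy of "Scenario", so the
packed and unpacked forms hold the same information. An attribute that must
not be copied down would need an extra rule here, which is the complication
I am worried about.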

Bryan




On 17 September 2013 21:49, Charlie Zender <zender at uci.edu> wrote:

> Hello Bryan,
>
> Thanks for your perspective/counterexample about whether CMIP5 is in
> fact a natural candidate for hierarchies. I agree with your points
> about the utility of flat systems of objects, while retaining my urge
> to hierarchically organize some objects sometimes. Everyone packs
> their suitcase a different way for different trips, to use Martin's
> metaphor. As long as aggregating and unpacking are reversible nothing
> is lost. Enhancing reversibility is one role of conventions.
>
> "If it's just that on the table, then I'm OK with this."
>
> Let me offer a concrete example of what's on the table for scope:
>
> A hierarchical file "cmip5.cdl" (included below) contains top-level
> groups "cesm", "ecmwf", and "giss" with RCP 8.5 simulations.
> Each group would naturally contain its model-specific metadata as
> group attributes. Yet all three groups are from the RCP 8.5 scenario.
> Should a future version of CF discuss/adopt conventions such that the
> RCP 8.5 metadata (e.g., "Scenario" below) could be placed once as
> global metadata at the root level of cmip5.cdl where it would
> therefore apply (through inheritance) to each top-level group?
>
> An equivalent example would be where the hierarchical file contains
> as groups an ensemble of realizations of a specific model. In that
> case, would it be OK to put the model-generic metadata at the root
> level such that each ensemble-realization-group would inherit it?
> If group attribute inheritance is not supported then many hierarchical
> files will have a lot of duplicated metadata (including "Conventions"
> and "history" attributes).
>
> CDL example discussed is appended below.
> Please excuse the lack of CF annotations/compliance :)
> Wanted to keep it short and oriented toward the issue at hand.
>
> Best,
> cz
>
> // ncgen -k netCDF-4 -b -o ~/nco/data/cmip5.nc ~/nco/data/cmip5.cdl
>
> netcdf cmip5 {
> :Conventions = "CF-2.x";
> :history = "yada yada yada";
> :Scenario = "RCP 8.5";
>
> group: cesm {
> dimensions:
> time=unlimited;
> variables:
> float tas(time);
> :Model = "CESM";
> data:
> tas=272,272,272,272;
> } // end cesm
>
> group: ecmwf {
> dimensions:
> time=unlimited;
> variables:
> float tas(time);
> :Model = "ECMWF";
> data:
> tas=273,273,273,273;
> } // end ecmwf
>
> group: giss {
> dimensions:
> time=unlimited;
> variables:
> float tas(time);
> :Model = "GISS";
> data:
> tas=274,274,274,274;
> } // end giss
>
> } // end root group
>
>
> On 17/09/2013 02:10, Bryan Lawrence wrote:
> > Hi Folks
> >
> > CMIP5 is illuminating in a number of ways ... not least because it is
> > impossible to come up with a *natural* hierarchy for consumers of the
> > data (as opposed to the producers). But even the producers have
> > different ways of organising their material (running members of
> > different ensembles all at once, or all members of one ensemble at
> > once), then the data has to be published and versioned ... and all of a
> > sudden there is no natural hierarchy for CMIP5 (although everyone will
> > have their own idea of what it could be ... )
> >
> > The advantage of a flat system of objects, which can be linked into
> > multiple hierarchies by a layer of metadata/indirection (call it what
> > you like) becomes obvious in that context ... you can do faceted browse
> > (and faceted assemblage of groups). So it's not so obvious to me that
> > Charlie's examples are so compelling ... (indeed, even the NASA examples
> > aren't so compelling when you consider some of the data use, which
> > immediately requires us to extract and replicate the data into smaller
> > granules in some cases ...)
> >
> > Which leads me naturally onto CF. I think there *is* a case for thinking
> > about how we use hierarchical attributes in CF (indeed, we've just been
> > arguing about it in another context with the concept of file attributes
> > and variable attributes). We could resolve this once and for all by
> > establishing a convention for CF which says how we *will* do group
> > attributes as they become necessary. (I still think we will eventually
> > want vector concepts more naturally represented in files, even though I
> > think files should not be our one view of the world.)
> >
> > However, the argument about file and field attributes applies here. What
> > (I think) we're talking about (thus far) for groups is metadata
> > aggregation and is simply a *file based convention* for simplifying
> > storage, so that when the file gets unpacked, the data model says the
> > attributes are owned by each individual group member. If it's just that
> > on the table, then I'm OK with this.
> >
> > The scope issue, on the other hand, opens a can of worms, and I hope I've
> > demonstrated with the CMIP5 preamble that it won't be that obvious to
> > resolve.
> >
> > Bryan
>
> --
> Charlie Zender, Earth System Sci. & Computer Sci.
> University of California, Irvine 949-891-2429 )'(
>



-- 
Bryan Lawrence
University of Reading: Professor of Weather and Climate Computing.
National Centre for Atmospheric Science: Director of Models and Data.
STFC: Director of the Centre for Environmental Data Archival.
Ph: +44 118 3786507 or 1235 445012; Web:home.badc.rl.ac.uk/lawrence
Received on Wed Sep 18 2013 - 06:57:44 BST

