[CF-metadata] Towards recognizing and exploiting hierarchical groups (Charlie Zender - Steve Hankin - Richard Signell)

From: Charlie Zender <zender>
Date: Tue, 17 Sep 2013 13:49:48 -0700

Hello Bryan,

Thanks for your perspective/counterexample about whether CMIP5 is in
fact a natural candidate for hierarchies. I agree with your points
about the utility of flat systems of objects, while retaining my urge
to hierarchically organize some objects sometimes. Everyone packs
their suitcase a different way for different trips, to use Martin's
metaphor. As long as aggregating and unpacking are reversible, nothing
is lost. Enhancing reversibility is one role of conventions.

"If it's just that on the table, then I'm OK with this."

Let me offer a concrete example of what's on the table for scope:

A hierarchical file "cmip5.cdl" (included below) contains top-level
groups "cesm", "ecmwf", and "giss" with RCP 8.5 simulations.
Each group would naturally contain its model-specific metadata as
group attributes. Yet all three groups are from the RCP 8.5 scenario.
Should a future version of CF discuss/adopt conventions such that the
RCP 8.5 metadata (e.g., "Scenario" below) could be placed once as
global metadata at the root level of cmip5.cdl where it would
therefore apply (through inheritance) to each top-level group?
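
One way to picture the inheritance being asked about is the sketch below. It is only an illustration under stated assumptions: the nested dictionaries are a hypothetical in-memory stand-in for cmip5.cdl (not a netCDF API), `effective_attrs` is a made-up helper name, and the rule that an attribute set nearer the group overrides an inherited one is an assumption, not something CF has decided.

```python
# Hypothetical stand-in for the group tree in cmip5.cdl (not a netCDF API).
root = {
    "attrs": {
        "Conventions": "CF-2.x",
        "history": "yada yada yada",
        "Scenario": "RCP 8.5",
    },
    "groups": {
        "cesm": {"attrs": {"Model": "CESM"}, "groups": {}},
        "ecmwf": {"attrs": {"Model": "ECMWF"}, "groups": {}},
        "giss": {"attrs": {"Model": "GISS"}, "groups": {}},
    },
}


def effective_attrs(tree, path):
    """Return the attributes visible in the group at `path` (e.g. "/cesm").

    Walks from the root down to the group, letting attributes set closer
    to the group override inherited ones (an assumed rule).
    """
    node = tree
    attrs = dict(node["attrs"])  # start from the root (global) attributes
    for name in [part for part in path.split("/") if part]:
        node = node["groups"][name]
        attrs.update(node["attrs"])  # nearer attributes win
    return attrs


cesm_attrs = effective_attrs(root, "/cesm")
```

Under these assumptions, `cesm_attrs` carries "Scenario" from the root alongside the group's own "Model", although "Scenario" is stored only once.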

An equivalent example would be where the hierarchical file contains
as groups an ensemble of realizations of a specific model. In that
case, would it be OK to put the model-generic metadata at the root
level such that each ensemble-realization-group would inherit it?
If group attribute inheritance is not supported then many hierarchical
files will have a lot of duplicated metadata (including "Conventions"
and "history" attributes).
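
The duplication, and the reversibility of removing it, can be sketched as follows. This is a hedged illustration only: `unpack` and `pack` are hypothetical helper names, plain dictionaries stand in for group attributes, and the round trip is exact only under the assumptions that there is at least one group, that no group overrides a root attribute, and that no group-level attribute has the same value in every group.

```python
def unpack(root_attrs, groups):
    """Flat view: every group owns a full copy of the root attributes."""
    return {name: {**root_attrs, **attrs} for name, attrs in groups.items()}


def pack(flat):
    """Hoist attributes that have the same value in every group to the root.

    Assumes `flat` contains at least one group.
    """
    names = list(flat)
    first = flat[names[0]]
    common = {
        key: value
        for key, value in first.items()
        if all(key in flat[n] and flat[n][key] == value for n in names)
    }
    groups = {
        n: {k: v for k, v in flat[n].items() if k not in common} for n in names
    }
    return common, groups


root_attrs = {
    "Conventions": "CF-2.x",
    "history": "yada yada yada",
    "Scenario": "RCP 8.5",
}
groups = {
    "cesm": {"Model": "CESM"},
    "ecmwf": {"Model": "ECMWF"},
    "giss": {"Model": "GISS"},
}

flat = unpack(root_attrs, groups)  # each group repeats all root attributes
repacked = pack(flat)              # shared attributes return to the root
```

Here the flat view stores "Conventions", "history", and "Scenario" three times each, while the packed view stores them once; packing the flat view recovers the original root and group attributes.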

The CDL example discussed is appended below.
Please excuse the lack of CF annotations/compliance :)
I wanted to keep it short and oriented toward the issue at hand.

Best,
cz

// ncgen -k netCDF-4 -b -o ~/nco/data/cmip5.nc ~/nco/data/cmip5.cdl

netcdf cmip5 {
  :Conventions = "CF-2.x";
  :history = "yada yada yada";
  :Scenario = "RCP 8.5";

  group: cesm {
  dimensions:
  time=unlimited;
  variables:
  float tas(time);
  :Model = "CESM";
  data:
  tas=272,272,272,272;
  } // end cesm

  group: ecmwf {
  dimensions:
  time=unlimited;
  variables:
  float tas(time);
  :Model = "ECMWF";
  data:
  tas=273,273,273,273;
  } // end ecmwf

  group: giss {
  dimensions:
  time=unlimited;
  variables:
  float tas(time);
  :Model = "GISS";
  data:
  tas=274,274,274,274;
  } // end giss

} // end root group


On 17/09/2013 02:10, Bryan Lawrence wrote:
> Hi Folks
>
> CMIP5 is illuminating in a number of ways ... not least because it is
> impossible to come up with a *natural* hierarchy for consumers of the
> data (as opposed to the producers). But even the producers have
> different ways of organising their material (running members of
> different ensembles all at once, or all members of one ensemble at
> once), then the data has to be published and versioned ... and all of a
> sudden there is no natural hierarchy for CMIP5 (although everyone will
> have their own idea of what it could be ... )
>
> The advantage of a flat system of objects, which can be linked into
> multiple hierarchies by a layer of metadata/indirection (call it what
> you like) becomes obvious in that context ... you can do faceted browse
> (and faceted assemblage of groups). So it's not so obvious to me that
> Charlie's examples are so compelling ... (indeed, even the NASA examples
> aren't so compelling when you consider some of the data use, which
> immediately requires us to extract and replicate the data into smaller
> granules in some cases ...)
>
> Which leads me naturally onto CF. I think there *is* a case for thinking
> about how we use hierarchical attributes in CF (indeed, we've just been
> arguing about it in another context with the concept of file attributes
> and variable attributes). We could resolve this once and for all by
> establishing a convention for CF which says how we *will* do group
> attributes as they become necessary. (I still think we will eventually
> want vector concepts more naturally represented in files, even though I
> think files should not be our one view of the world.)
>
> However, the argument about file and field attributes applies here. What
> (I think) we're talking about (thus far) for groups is metadata
> aggregation and is simply a *file based convention* for simplifying
> storage, so that when the file gets unpacked, the data model says the
> attributes are owned by each individual group member. If it's just that
> on the table, then I'm OK with this.
>
> The scope issue on the other hand, opens a can of worms, and I hope I've
> demonstrated with the CMIP5 preamble, that it won't be that obvious to
> resolve.
>
> Bryan

-- 
Charlie Zender, Earth System Sci. & Computer Sci.
University of California, Irvine 949-891-2429 )'(
Received on Tue Sep 17 2013 - 14:49:48 BST