[CF-metadata] Towards recognizing and exploiting hierarchical groups (Charlie Zender - Steve Hankin - Richard Signell)

From: Jonathan Gregory <j.m.gregory>
Date: Tue, 17 Sep 2013 09:06:49 +0100

Dear Charlie

Thank you for your interesting post and the discussion.

As a data analyst, I have a different view from NASA. I dislike hierarchies
and directories. I prefer things to be as flat as possible, with each item
thoroughly described by its own independent metadata, using tools to identify
what I need efficiently by use of that metadata. The reason I prefer that is
because hierarchies are restrictive. They impose a particular organisation and
compartmentalisation on the data, which may be an obstacle for some purposes.
I think directory structures should only be used when it becomes intolerably
inefficient to keep everything in one directory. Until you reach that point,
clever searching algorithms can make a single flat structure seem to have
whatever organisation you want for the purpose you have in hand.

> But IMHO the best reason to make "group-aware" CF is to facilitate
> more commonsense storage of intrinsically hierarchical datasets, e.g.,
> CMIP5 scenario->model->ensemble = RCP850->CESM->Historical.
> Why flatten that?

Because its hierarchy is a nuisance. It makes it easier to compare different
ensemble members of the same experiment and model, but harder to compare the
first ensemble member of the same experiment with different models, or the
first ensemble member of different experiments with the same model, which are
equally or more frequent needs with CMIP5 analysis. This is an example where
the organisation of the data is an obstacle to its use. When I download the
CMIP5 files, I put them all in one enormous directory, and grep it to find
the ones I need. That works fast enough.
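
A minimal sketch of that flat-directory approach, assuming the files keep the
usual CMIP5 naming pattern variable_table_model_experiment_member[_period].nc.
The file names listed here and the helpers parse_name and select are
hypothetical, for illustration only. It shows that one flat list can be sliced
along any of the three axes, with none of them privileged by a directory tree.

```python
from typing import NamedTuple, Optional


class Cmip5Name(NamedTuple):
    """Fields carried by an assumed CMIP5-style file name."""
    variable: str
    table: str
    model: str
    experiment: str
    member: str


def parse_name(filename: str) -> Optional[Cmip5Name]:
    """Split a file name on underscores; return None if it has too few fields."""
    stem = filename[:-3] if filename.endswith(".nc") else filename
    parts = stem.split("_")
    if len(parts) < 5:
        return None
    # Any sixth field (the time period) is ignored in this sketch.
    return Cmip5Name(*parts[:5])


def select(filenames, **wanted):
    """Return the names whose fields equal every keyword given in wanted."""
    chosen = []
    for name in filenames:
        fields = parse_name(name)
        if fields is None:
            continue
        if all(getattr(fields, key) == value for key, value in wanted.items()):
            chosen.append(name)
    return chosen


# Hypothetical contents of the single flat directory.
FILES = [
    "tas_Amon_CESM1-CAM5_rcp85_r1i1p1_200601-210012.nc",
    "tas_Amon_CESM1-CAM5_rcp85_r2i1p1_200601-210012.nc",
    "tas_Amon_HadGEM2-ES_rcp85_r1i1p1_200512-210012.nc",
    "tas_Amon_CESM1-CAM5_historical_r1i1p1_185001-200512.nc",
]

# Same experiment and model, different ensemble members.
same_run = select(FILES, model="CESM1-CAM5", experiment="rcp85")
# First ensemble member of the same experiment, different models.
across_models = select(FILES, experiment="rcp85", member="r1i1p1")
# First ensemble member of the same model, different experiments.
across_experiments = select(FILES, model="CESM1-CAM5", member="r1i1p1")
```

In practice the list of names would come from os.listdir on the directory
rather than being written out by hand.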

However, hierarchies are a fact of life, and some people like them. :-) I am
probably just a bit weird. So CF has to be able to deal with groups. It does
seem to go against the general tendency in CF, however, which in my opinion
has been that fields should be self-describing, and their organisation into
files should be as unimportant as possible. Groups tend to make files more
important again.

Best wishes

Jonathan
Received on Tue Sep 17 2013 - 02:06:49 BST
