[CF-metadata] Towards recognizing and exploiting hierarchical groups (Charlie Zender - Steve Hankin - Richard Signell)

From: Charlie Zender <zender>
Date: Tue, 17 Sep 2013 13:47:35 -0700

Dear Jonathan,

Thanks for your input. I expect many here share your views.

Directories full of flat files are often a sensible way to organize data.
They are easy to search and ingest with ls or grep and their options.
What limits such searchability within hierarchical files?
Hierarchical files look just like UNIX directory hierarchies and 'ls'
does fine with those. This is why NCO supports extended regular
expressions on groups and variables (http://nco.sf.net/nco.html#rx).
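
As a rough illustration of that point (a sketch in Python, not NCO code):
if group paths are treated as plain strings, the same regular-expression
search used on flat file names selects across any level of the hierarchy.
The paths below are hypothetical and only mimic a
scenario/model/ensemble/variable layout; select() is an illustrative
helper, not part of any library.

```python
import re

# Hypothetical group paths, laid out as /scenario/model/ensemble/variable.
# These names are assumptions for illustration, not real CMIP5 holdings.
group_paths = [
    "/rcp85/CESM/r1i1p1/tas",
    "/rcp85/CESM/r2i1p1/tas",
    "/rcp85/HadGEM2/r1i1p1/tas",
    "/historical/CESM/r1i1p1/tas",
    "/historical/CESM/r1i1p1/pr",
]


def select(paths, pattern):
    """Return the paths matching a regular expression, in input order."""
    rx = re.compile(pattern)
    return [p for p in paths if rx.search(p)]


# First ensemble member of one scenario, across all models.
first_member_tas = select(group_paths, r"^/rcp85/[^/]+/r1i1p1/tas$")

# First ensemble member of one model, across all scenarios.
cesm_r1 = select(group_paths, r"^/[^/]+/CESM/r1i1p1/")

for path in first_member_tas:
    print(path)
```

Both selections cut across the hierarchy rather than following it, which
is the kind of query a flat directory plus grep handles today.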

I am glad people are so familiar with CMIP5 because it gives us a
good common vocabulary to discuss the advantages/disadvantages of
hierarchies. Yes, if one wishes to flatten, alter, or invert a
hierarchical file then that's an extra step and frustration compared
to starting with flat files. No argument from me that a toolkit for
aggregating and disaggregating hierarchical files is indispensable :)

In any case, this discussion is about whether/how to expand choice.
It is an opportunity for CF to engage with and help guide the
development of sensible "group-aware" conventions for those who choose
to provide/use hierarchical files, and who wish to share and receive
the benefits of CF. That way both CF and software tool technology
could advance in tandem to ameliorate the awkwardness and enhance the
benefits of handling hierarchical files.

Thanks for being open to the discussion. Let it continue...

cz

On 17/09/2013 01:06, Jonathan Gregory wrote:
> Dear Charlie
>
> Thank you for your interesting post and the discussion.
>
> As a data analyst, I have a different view from NASA. I dislike hierarchies
> and directories. I prefer things to be as flat as possible, with each item
> thoroughly described by its own independent metadata, using tools to identify
> what I need efficiently by use of that metadata. The reason I prefer that is
> because hierarchies are restrictive. They impose a particular organisation and
> compartmentalisation on the data, which may be an obstacle for some purposes.
> I think directory structures should only be used when it becomes intolerably
> inefficient to keep everything in one directory. Until you reach that point,
> clever searching algorithms can make a single flat structure seem to have
> whatever organisation you want for the purpose you have in hand.
>
>> But IMHO the best reason to make "group-aware" CF is to facilitate
>> more commonsense storage of intrinsically hierarchical datasets, e.g.,
>> CMIP5 scenario->model->ensemble = RCP850->CESM->Historical.
>> Why flatten that?
>
> Because its hierarchy is a nuisance. It makes it easier to compare different
> ensemble members of the same experiment and model, but harder to compare the
> first ensemble member of the same experiment with different models, or the
> first ensemble member of different experiments with the same model, which are
> equally or more frequent needs with CMIP5 analysis. This is an example where
> the organisation of the data is an obstacle to its use. When I download the
> CMIP5 files, I put them all in one enormous directory, and grep it to find
> the ones I need. That works fast enough.
>
> However, hierarchies are a fact of life, and some people like them. :-) I am
> probably just a bit weird. So CF has to be able to deal with groups. It does
> seem to go against the general tendency in CF, however, which in my opinion
> has been that fields should be self-describing, and their organisation into
> files should be as unimportant as possible. Groups tend to make files more
> important again.
>
> Best wishes
>
> Jonathan

-- 
Charlie Zender, Earth System Sci. & Computer Sci.
University of California, Irvine 949-891-2429 )'(
Received on Tue Sep 17 2013 - 14:47:35 BST
