[CF-metadata] Are ensembles a compelling use case for "group-aware" metadata? (CZ) from Bryan Lawrence on 2013-09-20 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Bryan Lawrence <bryan.lawrence>
Date: Fri, 20 Sep 2013 08:22:41 +0100

... I should have read this first. Many of the same points!
Bryan

On 19 September 2013 22:58, <m.schultz at fz-juelich.de> wrote:

> Hi Charlie,
>
> very good and extensive explanation of the potential use for groups
> and group-aware metadata. Yet, I have a few remarks (which may in part
> reveal that I should probably read the preamble of the CF convention again
> ;-):
>
> > Point 1: How does the user know she has all the realizations?
>
> Is this question best addressed with metadata in a (series of) file(s)? In
> a modern, interoperable architecture, I would think that this belongs in
> the realm of data discovery, which would be done via web catalogues using
> metadata facets. File-based metadata IMHO may be more prone to failure.
> Just imagine that ECMWF had first generated two ensemble members, and their
> metadata said so (your "page 2/2" analogy). Now they run another two:
> do you really expect the metadata from the old files to be updated? A web
> catalogue would provide a more robust solution to this question, I believe.
>
> This doesn't mean that it may not be useful to have such information in a
> file! However, to come back to the suitcases: this can only be a packing
> list for the current trip and not an inventory of all the socks you may
> possibly own. Of course your young aspiring researcher may wish to express
> her knowledge about other ensemble members she found on the web but didn't
> include in the file (the suitcase). But will her supervisor or colleague on
> the other side of the world understand what she is talking about? I think,
> if you intend to go beyond the packing list, you open too many cans with
> too many worms.
>
> > Point 2: Multiple and/or Non-numeric Ensemble axes
>
> Here, you have a valid point, although - again - I would not connect this
> to knowing "that she has all the models". Yet, within the packed file (the
> suitcase) you want to know which hierarchy model (packing order) was
> applied in order to be able to aggregate things (for example by computing
> ensemble averages). See also my use-case on aircraft data introduced below.
> Question: what happens to this kind of information when the files are
> flattened and re-packaged? It might well become meaningless, which would
> indicate that these are "temporary" metadata, and thus probably out of
> scope for CF. This actually reminds me a bit of my experiences with the
> history attribute when I use ncks -A. This command will preserve the
> history of one file, but discard the history of the other file, which is
> certainly not the behavior you would like to see in ungrouping/re-grouping
> software.
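
A minimal sketch of the history handling one might want from ungrouping/re-grouping software, with file attributes modelled as plain Python dictionaries. The function name `merge_history` and the sample history strings are hypothetical. This does not reproduce what ncks does; it only illustrates keeping both records rather than one.

```python
def merge_history(primary: dict, appended: dict) -> dict:
    """Return a copy of `primary` attributes whose history keeps both records."""
    merged = dict(primary)
    # Collect whichever history strings are present, primary file first.
    histories = [h for h in (primary.get("history"), appended.get("history")) if h]
    if histories:
        merged["history"] = "\n".join(histories)
    return merged


# Hypothetical global attributes of two files being combined.
file_a = {"history": "2013-09-01 model run A"}
file_b = {"history": "2013-09-02 model run B"}
combined = merge_history(file_a, file_b)
```
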
>
> > Point 3: Weights and intentional reproducibility of MME statistics
>
> In my view this is actually just another viewing angle on your point #2.
>
> --
>
> Your use-case does however highlight the "convenience" of grouping data
> which somehow belong to each other into one file. In a world of flat files,
> one must check coordinates each time one wants to perform some sort of
> (ensemble) averaging operation. A hierarchical file will tell you that it
> is OK to average by placing the common coordinates on the upper level.
> IMPORTANT: again, this doesn't mean that this is the only or best way to do
> the grouping - yet, it seems a compelling advantage to have this
> coordinate-consistency problem eliminated somewhere along your processing
> steps. As others said already: there are reasons why people use
> suitcases.
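
The flat-file check described above can be sketched in plain Python. The member layout (a dict with `coords` and `values` lists) and the function name `ensemble_mean` are assumptions made for illustration only; in a hierarchical file the shared coordinates would live once at the upper level and this check would become unnecessary.

```python
def ensemble_mean(members):
    """Average members point-wise after checking they share coordinates.

    Each member is a hypothetical dict with 'coords' and 'values' lists.
    """
    if not members:
        raise ValueError("no ensemble members given")
    reference = members[0]["coords"]
    for member in members[1:]:
        # With flat files this comparison has to be repeated for every member.
        if member["coords"] != reference:
            raise ValueError("members do not share the same coordinates")
    count = len(members)
    return [
        sum(member["values"][i] for member in members) / count
        for i in range(len(reference))
    ]


# Two hypothetical members on the same coordinate axis.
member_1 = {"coords": [0.0, 1.0, 2.0], "values": [1.0, 2.0, 3.0]}
member_2 = {"coords": [0.0, 1.0, 2.0], "values": [3.0, 4.0, 5.0]}
mean = ensemble_mean([member_1, member_2])
```
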
>
> --
>
> Now, here is another use case, which we haven't implemented yet - partly
> because we didn't see how it can be done in a CF consistent way:
> While there has been a definition of a standard file layout for data from
> multiple stations (a contribution from Ben Domenico and Stefano Nativi if I
> am not mistaken), this concept cannot be applied to multiple aircraft
> flight data. The station data can be packaged together with help of a
> non-geophysical "station" coordinate, because all stations share the same
> time axis. With aircraft flights, the time axes often don't overlap, and
> forcing all data onto the superset of time would be a tremendous waste of
> space. Groups would seem to be the natural solution to this problem! Why not
> flat files? Because you might wish to retrieve all the aircraft data which
> were sampled in a given region during a specific period (a natural use case
> for a catalogue query it seems) in one entity, and not in N entities, where
> you cannot even predict N.
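
A rough worked example of the space argument, assuming flights whose time axes do not overlap at all and one value per time step. The flight lengths are made up for illustration and the function names are hypothetical.

```python
def padded_cells(flight_lengths):
    """Cells needed when every flight is stored on the superset time axis.

    With no overlap, the shared axis has sum(lengths) entries and each flight
    occupies one full row of it, mostly filled with missing values.
    """
    return len(flight_lengths) * sum(flight_lengths)


def grouped_cells(flight_lengths):
    """Cells needed when each flight keeps its own time axis (one group each)."""
    return sum(flight_lengths)


# Hypothetical number of time steps in four flights.
lengths = [1200, 800, 1500, 950]
padded = padded_cells(lengths)
grouped = grouped_cells(lengths)
wasted_fraction = 1 - grouped / padded
```

Under these assumptions the padded layout needs N times the storage of the grouped one, so the wasted fraction is 1 - 1/N and grows with the number of flights.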
>
> I would think the same applies to "granules" of satellite data which share
> a common calibration, for example.
>
> --
>
> As Nan said, we should try to come back to define what is really at stake
> for CF and what exactly shall be proposed. Now this is where my failure to
> re-read the convention preamble may show ;-). The main question is: is CF
> about files or about interoperability? Unfortunately, my view on this is
> not entirely clear, because it seems to be a bit of both. The
> standard_names clearly have a bearing in the interoperable world, and this
> shows through various links to the CF standard_names in web catalogues or
> controlled vocabulary collections (e.g. SeaDataNet). The conventions
> themselves seem to be more file-oriented - even though the discussions
> about the data model always make a strong point to go beyond representation
> in a (single) file. [If someone disagrees and wishes to see the CF
> convention play a more important role in interoperability, then I would ask
> why it is not then cast into an XML schema extending ISO19115.] If CF is
> indeed "file-oriented", then I do think that it makes a lot of sense to
> support "modern" file structures, which include groups and hierarchies,
> whether we like them or not. Therefore, I would advocate that we focus the
> discussion on two major points with a couple of sub-issues:
>
> 1. which parts of CF might fail when we have a hierarchical file? (and
> let's stick to the simple inheritance model of netCDF-4 for now!)
> 1a. what would the current CF checker say if it is fed a hierarchical file?
> 1b. what happens to global attributes when flat files are grouped together?
> 1c. do we need to re-phrase some aspects of the convention to make them
> "group-aware"? (this does not include defining new rules - that is covered
> in point 2)
> 1d. anything else?
>
> 2. where do we need to extend the current CF concept?
> 2a. introduction of a new attribute "level" (equate "global" with "root"?
> What happens when hierarchical files are flattened? [please see the 3
> varieties of flattening operations mentioned in an earlier post])
> 2b. specification of "ensemble_..." attributes? "ensemble_axis" may not be
> needed if these axes are defined on the group level (?) Something like
> "ensemble_history" or "ensemble_structure" to inform the user about the
> grouping principle?
> 2c. what other "relations" need to be expressed within a hierarchical
> file? The guiding principle here should be that additional rules are only
> needed if they avoid ambiguity and misinterpretation of the data. And here
> we get onto interoperability territory again (see my use case about
> aircraft data above).
>
>
> Sorry for this long post -- this just somehow seems to be quite relevant!
>
> Best regards,
>
> Martin
>
>
>
> --------------------------------------------------------------------------------
> PD Dr. Martin G. Schultz
> IEK-8, Forschungszentrum Jülich
> D-52425 Jülich
> Ph: +49 2461 61 2831

-- 
Bryan Lawrence
University of Reading: Professor of Weather and Climate Computing.
National Centre for Atmospheric Science: Director of Models and Data.
STFC: Director of the Centre for Environmental Data Archival.
Ph: +44 118 3786507 or 1235 445012; Web:home.badc.rl.ac.uk/lawrence

Received on Fri Sep 20 2013 - 01:22:41 BST

This archive was generated by hypermail 2.3.0 : Tue Sep 13 2022 - 23:02:41 BST