-- Your use case does, however, highlight the "convenience" of grouping data that somehow belong together into one file. In a world of flat files, one must check the coordinates every time one wants to perform some sort of (ensemble) averaging operation. A hierarchical file tells you that it is OK to average by placing the common coordinates on the upper level. IMPORTANT: again, this doesn't mean that this is the only or best way to do the grouping. Yet it seems a compelling advantage to have this coordinate-consistency problem eliminated somewhere along your processing steps. As others have said already: there are reasons why people use suitcases.

-- Now, here is another use case, which we haven't implemented yet, partly because we didn't see how it can be done in a CF-consistent way. While there is a definition of a standard file layout for data from multiple stations (a contribution from Ben Domenico and Stefano Nativi, if I am not mistaken), this concept cannot be applied to data from multiple aircraft flights. The station data can be packaged together with the help of a non-geophysical "station" coordinate, because all stations share the same time axis. With aircraft flights, the time axes often don't overlap, and forcing all data onto the superset of time would be a tremendous waste of space. Groups would seem to be the natural solution to this problem! Why not flat files? Because you might wish to retrieve all the aircraft data that were sampled in a given region during a specific period (a natural use case for a catalogue query, it seems) in one entity, and not in N entities, where you cannot even predict N. I would think the same applies to "granules" of satellite data which share a common calibration, for example.

-- As Nan said, we should try to come back to defining what is really at stake for CF and what exactly shall be proposed. Now this is where my failure to re-read the convention preamble may show ;-).
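To make the storage argument in the aircraft use case above concrete, here is a minimal sketch in plain Python. The flight names and time ranges are invented for illustration, and nested dictionaries merely stand in for netCDF-4 groups; this is not a CF-endorsed layout, only a way to see the numbers.

```python
# Hypothetical flights: name -> list of sample times (seconds since some epoch).
# All values are made up for illustration only.
flights = {
    "flight_A": list(range(0, 100)),        # 100 samples
    "flight_B": list(range(5000, 5150)),    # 150 samples, no overlap with A
    "flight_C": list(range(9000, 9050)),    # 50 samples, no overlap with A or B
}


def padded_cells(flights):
    """Cells needed if every flight is forced onto the union of all time values."""
    superset = set()
    for times in flights.values():
        superset.update(times)
    return len(flights) * len(superset)


def grouped_cells(flights):
    """Cells needed if each flight keeps its own time axis (one group per flight)."""
    return sum(len(times) for times in flights.values())


def select(flights, t_start, t_end):
    """Return, as ONE nested structure, every flight with samples in [t_start, t_end]."""
    result = {}
    for name, times in flights.items():
        hits = [t for t in times if t_start <= t <= t_end]
        if hits:
            result[name] = {"time": hits}
    return result


print(padded_cells(flights))              # 3 flights x 300 distinct times = 900
print(grouped_cells(flights))             # 100 + 150 + 50 = 300
print(sorted(select(flights, 50, 5100)))  # ['flight_A', 'flight_B']
```

With non-overlapping time axes, the padded layout grows with the number of flights times the total number of samples, whereas the grouped layout grows only with the total number of samples. The query still returns a single entity regardless of how many flights match.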
The main question is: is CF about files or about interoperability? Unfortunately, my view on this is not entirely clear, because it seems to be a bit of both. The standard_names clearly have a bearing on interoperability, and this shows through various links to the CF standard_names in web catalogues or controlled vocabulary collections (e.g. SeaDataNet). The conventions themselves seem to be more file-oriented, even though the discussions about the data model always make a strong point of going beyond representation in a (single) file. [If someone disagrees and wishes to see the CF convention play a more important role in interoperability, then I would ask why it is not cast into an XML schema extending ISO 19115.]

If CF is indeed "file-oriented", then I do think that it makes a lot of sense to support "modern" file structures, which include groups and hierarchies, whether we like them or not. Therefore, I would advocate that we focus the discussion on two major points with a couple of sub-issues:

1. Which parts of CF might fail when we have a hierarchical file? (And let's stick to the simple inheritance model of netCDF-4 for now!)
   1a. What would the current CF checker say if it is fed a hierarchical file?
   1b. What happens to global attributes when flat files are grouped together?
   1c. Do we need to re-phrase some aspects of the convention to make them "group-aware"? (This does not include defining new rules; that is covered in point 2.)
   1d. Anything else?

2. Where do we need to extend the current CF concept?
   2a. Introduction of a new attribute "level" (equate "global" with "root"? What happens when hierarchical files are flattened? [Please see the 3 varieties of flattening operations mentioned in an earlier post.])
   2b. Specification of "ensemble_..." attributes? "ensemble_axis" may not be needed if these axes are defined on the group level (?). Something like "ensemble_history" or "ensemble_structure" to inform the user about the grouping principle?
   2c. What other "relations" need to be expressed within a hierarchical file?

The guiding principle here should be that additional rules are only needed if they avoid ambiguity and misinterpretation of the data. And here we get onto interoperability territory again (see my use case about aircraft data above).

Sorry for this long post; it just somehow seems to be quite relevant!

Best regards,

Martin

--------------------------------------------------------------------------------
PD Dr. Martin G. Schultz
IEK-8, Forschungszentrum Jülich
D-52425 Jülich
Ph: +49 2461 61 2831

Received on Thu Sep 19 2013 - 15:58:18 BST