+1 Martin. I am bugged (not the technical term) by the conclusions here, which seem to be: Because people design systems badly, I must constrain my own system to accommodate their failures.
The use cases for storing the summary information with the file are: (A) It's faster to access, which in some circumstances affects the user (or the cost of computer cycles), whether because of large files or lots of files. (B) In some circumstances (say, I don't have a netCDF file mangler app at hand), it's the only reasonable way to access the information.
If someone is writing a subsetting or aggregating utility, and that utility is blindly copying over every metadata item it sees, then a whole lot of metadata is going to be wrong (Publisher, Provenance, Last Updated, Time and/or Geospatial Range, Min/Max Values, Licensing Permission, to name a few). This metadata isn't fragile; it's a function of the content. The person who writes the transform utility must either create all new metadata, or understand the kind of metadata they are copying over and make any necessary changes.
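As a rough sketch of what I mean, here is how a subsetting step might treat content-derived attributes. This is plain Python rather than any real netCDF tool, and the function name `subset` and the attribute list in `CONTENT_DERIVED` are assumptions chosen for illustration, not a CF requirement or an existing API.

```python
from datetime import datetime, timezone

# Attributes assumed (for illustration only) to be a function of the content.
# A transform must recompute these; it must never copy them blindly.
CONTENT_DERIVED = {
    "data_min",
    "data_max",
    "time_coverage_start",
    "time_coverage_end",
    "date_modified",
}


def subset(times, values, attrs, start, stop, now=None):
    """Slice [start:stop] and rebuild the content-derived attributes."""
    sub_times = times[start:stop]
    sub_values = values[start:stop]
    if not sub_values:
        raise ValueError("empty subset: nothing to summarize")

    # Carry over only the attributes that do not depend on the content.
    new_attrs = {k: v for k, v in attrs.items() if k not in CONTENT_DERIVED}

    # Recompute everything that is a function of the (new) content.
    new_attrs["data_min"] = min(sub_values)
    new_attrs["data_max"] = max(sub_values)
    new_attrs["time_coverage_start"] = min(sub_times)
    new_attrs["time_coverage_end"] = max(sub_times)
    new_attrs["date_modified"] = (now or datetime.now(timezone.utc)).isoformat()
    return sub_times, sub_values, new_attrs


# Hypothetical data: ISO date strings sort correctly as plain strings.
times = ["2013-05-01", "2013-05-02", "2013-05-03", "2013-05-04"]
values = [3.0, 9.5, -1.0, 4.0]
attrs = {
    "title": "demo",
    "data_min": -1.0,
    "data_max": 9.5,
    "time_coverage_start": "2013-05-01",
    "time_coverage_end": "2013-05-04",
}
sub_t, sub_v, sub_attrs = subset(times, values, attrs, 0, 2)
```

The design choice is a deny-list of attributes known to depend on the content. A stricter utility could instead copy only an explicit allow-list and drop anything it does not understand.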
John
On May 23, 2013, at 08:10, "Schultz, Martin" <m.schultz at fz-juelich.de> wrote:
>>> ... but computing min & max on the fly can also be very expensive.
>>> We have aggregated model output datasets where each variable is more
>>> than 1TB!
>
>> Sure, I can see that that's useful metadata about the dataset, and that
>> there's value in caching it somewhere. I just don't think it belongs with
>> the metadata inside the netcdf file. What's the use case for storing it
>> there?
>
> Dear all,
>
> that may be an issue of "style", or more technically speaking the way you set up your system(s). I do think there is use for this as soon as you take a file out of an interoperable context. However, it's a very good and valid point to say that this information can (very) easily get corrupted. Thus it may be good to define some way of marking "fragile" metadata (i.e. metadata that can be corrupted by slicing or aggregating data from a file, maybe even from several files). In fact this is related to the issue of tracking metadata information in the data model, which was brought up in the trac ticket but was referred to the implementation...
>
> Cheers,
>
> Martin
> _______________________________________________
> CF-metadata mailing list
> CF-metadata at cgd.ucar.edu
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>
---------------
John Graybeal
Marine Metadata Interoperability Project:
http://marinemetadata.org
graybeal at marinemetadata.org
Received on Thu May 23 2013 - 10:00:59 BST