
[CF-metadata] use of _FillValue vs valid_range, and minimum and maximum

From: Steve Hankin <steven.c.hankin>
Date: Thu, 23 May 2013 09:45:51 -0700

On 5/23/2013 9:00 AM, John Graybeal wrote:
> +1 Martin. I am bugged (not the technical term) by the conclusions here, which seem to be: Because people design systems badly, I must constrain my own system to accommodate their failures.

Hi John,

The flip side of this argument is even more compelling: _building
features into your interoperability framework that you can see in
advance are going to be misused is obviously the wrong thing to do._
When incorrect metadata becomes commonplace it does worse damage than
gaps in the metadata; it casts a shadow of doubt over the entire framework.

The thrust of Martin's arguments seems right on to me. Let's look for
solutions that can provide the desired metadata and be robust, too. For
example, THREDDS and ncISO provide us with a level of abstraction above
the attributes found in the physical files. If the min and max values
for a given dataset are stable (a fact probably known to the creator of
the dataset), then by all means encode the values as global attributes.
If they are not stable, then omit them from the files; turn to TDS and
ncISO to create (and cache) these metadata values.

Caching is a key issue here, given the cost of re-computing metadata
such as actual min and max values. A Web Accessible Folder of ISO
metadata *can be* an adequate cache, as long as last-modified dates are
available and are carefully tracked. Making lastModified dates
universally available is arguably one of the key issues in finding a
robust solution to this dilemma. It is on the TDS to-do list, we
understand. (We need to bring HYRAX, PyDAP, etc. into this
conversation, too.)
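
To make the caching idea concrete, here is a minimal sketch of a min/max
cache that is invalidated by the file's last-modified time. It is only an
illustration under stated assumptions: the function names
(compute_min_max, cached_min_max), the JSON sidecar file, and the
one-number-per-line text file standing in for a real dataset are all
hypothetical, and this is not how TDS or ncISO actually implement it.

```python
import json
import os


def compute_min_max(path):
    """Expensive scan (stand-in): read one number per line from a text file."""
    with open(path) as f:
        values = [float(line) for line in f if line.strip()]
    return min(values), max(values)


def cached_min_max(data_path, cache_path):
    """Return (min, max), recomputing only when the data file has changed."""
    # Last-modified time of the data, in integer nanoseconds.
    mtime = os.stat(data_path).st_mtime_ns
    try:
        with open(cache_path) as f:
            entry = json.load(f)
        # The cached values are trusted only if the timestamps match.
        if entry["last_modified"] == mtime:
            return entry["min"], entry["max"]
    except (FileNotFoundError, KeyError, ValueError):
        pass  # no cache yet, or an unreadable one: fall through and recompute

    lo, hi = compute_min_max(data_path)
    with open(cache_path, "w") as f:
        json.dump({"last_modified": mtime, "min": lo, "max": hi}, f)
    return lo, hi
```

The cache is only as trustworthy as the timestamp it is keyed on, which
is why dependable lastModified dates matter so much here.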

     - Steve

>
> The use cases for storing the summary information with the file are: (A) It's faster to access, which in some circumstances affects a user (or the cost of computer cycles), whether due to large files or lots of files. (B) In some circumstances (I don't have a netCDF file mangler app sitting in hand), it's the only reasonable way to access.
>
> If someone is writing a subsetting or aggregating utility, and that utility is blindly copying over every metadata item it sees, then a whole lot of metadata is going to be wrong. (Publisher, Provenance, Last Updated, Time and/or Geospatial Range, Min/Max Values, Licensing Permission, to name a few.) This metadata isn't fragile, it's a function of the content. The person who writes the transform utility must either create all new metadata, or understand the kind of metadata they are copying over and make any necessary changes.
>
> John
>
> On May 23, 2013, at 08:10, "Schultz, Martin" <m.schultz at fz-juelich.de> wrote:
>
>>>> ... but computing min & max on the fly can also be very expensive.
>>>> We have aggregated model output datasets where each variable is more
>>>> than 1TB!
>>> Sure, I can see that that's useful metadata about the dataset, and that
>>> there's value in caching it somewhere. I just don't think it belongs with
>>> the metadata inside the netcdf file. What's the use case for storing it
>>> there?
>> Dear all,
>>
>> that may be an issue of "style", or more technically speaking the way you set up your system(s). I do think there is use for this as soon as you take a file out of an interoperable context. However, it's a very good and valid point to say that this information can (very) easily get corrupted. Thus it may be good to define some way of marking "fragile" metadata (i.e. metadata that can be corrupted by slicing or aggregating data from a file, maybe even from several files). In fact this is related to the issue of tracking metadata information in the data model -- that has been brought up in the track ticket but was referred to the implementation...
>>
>> Cheers,
>>
>> Martin
>> _______________________________________________
>> CF-metadata mailing list
>> CF-metadata at cgd.ucar.edu
>> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>>
>
> ---------------
> John Graybeal
> Marine Metadata Interoperability Project: http://marinemetadata.org
> graybeal at marinemetadata.org
>

Received on Thu May 23 2013 - 10:45:51 BST

