Hi Jim,
I think part of the problem is that mechanisms for aggregation are not standardized, or not well understood by tool users. This can lead to odd behaviour when some variables are aggregated. Maybe the tools are badly coded in some cases, or maybe there isn't an obvious correct behaviour for the tool to follow, but in either case that's not the fault of the data creator. Some guidance might help to flag unintended consequences of certain actions that data creators may not be aware of.
I agree it's not a flaw in the valid_min/max concept per se, but this shouldn't stop us from providing "best practice" guidance.
Best wishes,
Jon
From: Jim Biard [mailto:jim.biard at noaa.gov]
Sent: 09 July 2013 14:54
To: cf-metadata at cgd.ucar.edu List
Cc: Jon Blower
Subject: Re: [CF-metadata] valid_min and valid_max considered harmful?
Jon,
I appreciate the frustration of finding such problems, but isn't this more a problem of lazy processing than a flaw in the valid min/max concept?
Grace and peace,
Jim
Jim Biard
Research Scholar
Cooperative Institute for Climate and Satellites <http://www.cicsnc.org/>
Remote Sensing and Applications Division
National Climatic Data Center <http://www.ncdc.noaa.gov/>
151 Patton Ave, Asheville, NC 28801-5001
jim.biard at noaa.gov
828-271-4900
Follow us on Facebook <https://www.facebook.com/cicsnc>!
On Jul 9, 2013, at 8:42 AM, Jon Blower <j.d.blower at reading.ac.uk> wrote:
Hi all,
On numerous occasions, I have found problems with datasets where the valid_min and valid_max attributes are not set correctly, either because the original data files are wrong, or because some processing chain or aggregation machinery has resulted in incorrect values. This is a particular problem in time coordinate arrays.
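To make the aggregation case concrete, here is a minimal sketch in plain Python. It assumes a naive aggregator that concatenates two time coordinate arrays and simply copies the attributes of the first file. The values, the attribute dictionary and the helper apply_valid_range are all hypothetical, and real tools may well behave differently.

```python
# Hypothetical time coordinates (e.g. days since a reference date) from two files.
time_a = [0.0, 1.0, 2.0]
time_b = [3.0, 4.0, 5.0]

# Range attributes written for the first file only.
attrs_a = {"valid_min": 0.0, "valid_max": 2.0}


def apply_valid_range(values, attrs):
    """Replace values outside [valid_min, valid_max] with None,
    as a range-aware client might (hypothetical helper)."""
    lo, hi = attrs["valid_min"], attrs["valid_max"]
    return [v if lo <= v <= hi else None for v in values]


# Naive aggregation: concatenate the data, copy the first file's attributes unchanged.
aggregated = time_a + time_b
aggregated_attrs = dict(attrs_a)

masked = apply_valid_range(aggregated, aggregated_attrs)
print(masked)  # the three values contributed by the second file come back as None
```

In this sketch the stale valid_max silently turns every time step from the second file into a missing value, even though neither input file was wrong on its own.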
In my experience, these occasions have outnumbered the times when these attributes are actually useful. In most cases the user either has only one missing value, which should be recorded as a _FillValue as in section 2.5.1 of the CF documentation, or does not have a missing value at all.
I think this happens because data producers (with good intentions) feel obliged to populate their NetCDF files with as much metadata as possible and end up specifying some attributes that don't provide much value for their data. Is it worth adding some text to the CF docs to say something along the lines of:
"The attributes valid_min, valid_max and valid_range should only be used when necessary [or should be used with caution], as they can cause unexpected behaviour in situations such as aggregation. If only one missing value is needed for a variable then we recommend strongly that this value be specified using the _FillValue attribute."
The second sentence is already present in the standard. We may need to define what "when necessary" means...
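As a rough illustration of why a single _FillValue tends to be more robust here, the sketch below (plain Python, hypothetical values) masks on one sentinel only. It assumes both files use the same _FillValue; the helper mask_fill is invented for the example and is not part of any library.

```python
# Hypothetical shared fill value; assumed to be identical in both files.
FILL_VALUE = -9999.0

time_a = [0.0, 1.0, FILL_VALUE]
time_b = [3.0, 4.0, 5.0]


def mask_fill(values, fill_value):
    """Replace the fill value with None and leave every other value untouched
    (hypothetical helper)."""
    return [None if v == fill_value else v for v in values]


# The same naive aggregation: concatenate the data, keep the attribute as-is.
aggregated = time_a + time_b
cleaned = mask_fill(aggregated, FILL_VALUE)
print(cleaned)  # only the one genuinely missing entry is None
```

Because the sentinel does not depend on the extent of the data in any one file, copying it unchanged during aggregation does not discard valid values, provided the files agree on it.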
Cheers,
Jon
--
Dr Jon Blower
Technical Director, Reading e-Science Centre
School of Mathematical and Physical Sciences
University of Reading, UK
Tel: +44 (0)118 378 5213
Mob: +44 (0)7919 112687
http://www.resc.reading.ac.uk
_______________________________________________
CF-metadata mailing list
CF-metadata at cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
Received on Tue Jul 09 2013 - 10:04:39 BST