
[CF-metadata] code that does semantic checking of CF headers

From: Jim Biard <jim.biard>
Date: Fri, 20 Apr 2012 08:47:26 -0400

Hi.

As best I understand it, true coordinate variables should never have invalid values. (See the recent discussion about auxiliary coordinate variables.)

In the case of observation data, valid_min and valid_max are often directly tied to the finite possible range of values returned by a detector. Values outside of this range represent an error of some sort, which might be an error flag value, a problem in an algorithm, or a corruption in the data feed.

It is no more complicated to check against an actual range than to check against a valid range. The valid range attributes produce automatic behavior in a number of applications (the netCDF Tools Java app, for example), but the actual range attributes do not. Similarly, the _FillValue attribute produces automatic behavior. If you read the details in the netCDF documentation about the _FillValue and valid range attributes, you can get an idea of what happens "under the hood" when they are present.
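
As a rough illustration of the point that both checks need only the minimum and maximum of the non-fill data, here is a minimal Python sketch. The function names check_ranges and _bounds are hypothetical, the data is a plain list rather than a variable read from a netCDF file, and scale factors, offsets and NaN handling are ignored. The sample values are the salinity figures quoted below.

```python
import math


def _bounds(attrs, prefix):
    """Return (min, max) from '<prefix>_range' or '<prefix>_min'/'<prefix>_max'."""
    if prefix + "_range" in attrs:
        low, high = attrs[prefix + "_range"]
        return low, high
    return attrs.get(prefix + "_min"), attrs.get(prefix + "_max")


def check_ranges(values, attrs, rel_tol=1e-9):
    """Hypothetical checker: return a list of problems found in `values`."""
    fill = attrs.get("_FillValue")
    # Fill values mark absent data, so they take no part in either check.
    data = [v for v in values if fill is None or v != fill]
    problems = []
    if not data:
        return problems

    data_min, data_max = min(data), max(data)

    # Actual range: the attributes should equal the observed extremes.
    a_min, a_max = _bounds(attrs, "actual")
    if a_min is not None and not math.isclose(a_min, data_min, rel_tol=rel_tol):
        problems.append(f"actual minimum {a_min} != data minimum {data_min}")
    if a_max is not None and not math.isclose(a_max, data_max, rel_tol=rel_tol):
        problems.append(f"actual maximum {a_max} != data maximum {data_max}")

    # Valid range: the observed extremes should lie inside the limits.
    v_min, v_max = _bounds(attrs, "valid")
    if v_min is not None and data_min < v_min:
        problems.append(f"data minimum {data_min} is below valid minimum {v_min}")
    if v_max is not None and data_max > v_max:
        problems.append(f"data maximum {data_max} is above valid maximum {v_max}")

    return problems


salinity_attrs = {
    "actual_min": 25.732,
    "actual_max": 34.994,
    "valid_min": 0.0,
    "valid_max": 41.5,
    "_FillValue": -1.0,
}
print(check_ranges([25.732, 30.1, -1.0, 34.994], salinity_attrs))
```

Both checks reuse the same single pass over the data, so the valid range check adds two comparisons and no extra reads.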

Grace and peace,

Jim

Jim Biard
Research Scholar
Cooperative Institute for Climate and Satellites
Remote Sensing and Applications Division
National Climatic Data Center
151 Patton Ave, Asheville, NC 28801-5001

jim.biard at noaa.gov
828-271-4900

On Apr 20, 2012, at 6:54 AM, Gaffney, Sean P. wrote:

> Hi everyone, Thanks for all your feedback.
>
> It's made things clearer for me now. Just to summarise, then: a suite of attributes that would provide a precise numeric description of the data within a variable could be as follows (ignoring things such as scaling factors and offsets):
>
> actual_min, actual_max or actual_range: these describe the actual variable values held in the file, their minimum and maximum (e.g. for sea_water_salinity, I could have data with an actual_min of 25.732 and an actual_max of 34.994).
> valid_min, valid_max and valid_range: these detail the feasible range of the data, so for salinity they could range from 0 to 41.5 (if following the BODC parameter vocabulary).
> _FillValue: this presets the value that should be used for absent data; in the case of salinity, if keeping to the BODC parameter vocabulary, a value of -1.
>
> I would then need some sort of code to query the data and cross-check it against the values of the attributes. This would be straightforward to write for actual_min, actual_max and _FillValue (although possibly computationally expensive, as it would have to interrogate the actual data in each file being checked). The code would have to become more complex to handle checks for valid_min or valid_max, though, as these are hypothetical values.
>
> Have I misunderstood anything or do you all feel that is a fair summary of the information I've been given?
>
> Cheers
>
> Sean
>
> -----Original Message-----
> From: cf-metadata-bounces at cgd.ucar.edu [mailto:cf-metadata-bounces at cgd.ucar.edu] On Behalf Of John Caron
> Sent: 19 April 2012 21:25
> To: cf-metadata at cgd.ucar.edu
> Subject: Re: [CF-metadata] code that does semantic checking of CF headers
>
> On 4/19/2012 2:11 PM, Dave Allured wrote:
>> Sean,
>>
>> I run into this frequently, especially with files that do not come
>> from carefully crafted official archives. I regard all flavors of
>> range attributes as frequently unreliable. I think best practice is
>> to extract ranges directly from the coordinate values when plotting
>> data on the fly, and pay no attention to incorrect secondary metadata.
>> This should be simple to code and have no performance penalty.
>>
>> --Dave
>
> In this case, it's the data values that have to be read, in order to know
> what the data range is, which may be expensive.
> As long as they are hints that can be modified by the user, seems safe
> to use.
>
> OTOH, valid_min, valid_max are quite dangerous if they are not correct,
> as Seth found out. Generic software really has no choice but to respect
> them. I recommend leaving them off if you don't actually need them.
> _______________________________________________
> CF-metadata mailing list
> CF-metadata at cgd.ucar.edu
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata

Received on Fri Apr 20 2012 - 06:47:26 BST

This archive was generated by hypermail 2.3.0 : Tue Sep 13 2022 - 23:02:41 BST
