
[CF-metadata] code that does semantic checking of CF headers

From: Gaffney, Sean P. <sgaf>
Date: Fri, 20 Apr 2012 11:54:03 +0100

Hi everyone, Thanks for all your feedback.

It's made things clearer for me now. Just to summarise, a suite of attributes that would provide a precise numeric description of the data within a variable could be (ignoring things such as scaling factors and offsets):

actual_min, actual_max or actual_range: these describe the minimum and maximum of the variable values actually held in the file (e.g. for sea_water_salinity, I could have data with an actual_min of 25.732 and an actual_max of 34.994)
valid_min, valid_max or valid_range: these detail the feasible range of the data, so for salinity they could range from 0 to 41.5 (if following the BODC parameter vocabulary)
_FillValue: this sets the value that should be used for absent data (in the case of salinity, if keeping to the BODC parameter vocabulary, a value of -1)

I would then need some sort of code to query the data and cross-check it against the values of the attributes. This would be straightforward to write for actual_min, actual_max and _FillValue (although possibly computationally expensive, as it would have to interrogate the actual data in each file being checked). The code would have to become more complex to handle checks for valid_min or valid_max, though, as these are hypothetical values.
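To make the idea concrete, here is a rough sketch of such a cross-check in Python. It is only an illustration under stated assumptions: the function name check_range_attributes is hypothetical, the data values are assumed to have already been read from the file into a plain list, and the attributes are assumed to be available as a dictionary. Reading them from a real file with a netCDF library is left out, as is any handling of scaling factors and offsets.

```python
import math


def check_range_attributes(values, attrs, rel_tol=1e-6):
    """Cross-check data values against range attributes.

    values: iterable of numbers already read from the variable.
    attrs:  dict of attribute name -> value (hypothetical layout).
    Returns a list of human-readable problems (empty if none found).
    """
    fill = attrs.get("_FillValue")
    # Ignore absent data (fill value or NaN) when computing the extremes.
    data = [v for v in values if v != fill and not math.isnan(v)]
    if not data:
        return ["no non-fill values to check"]
    lo, hi = min(data), max(data)
    problems = []

    # actual_range, if present, is a (min, max) pair; otherwise fall back
    # to the separate actual_min / actual_max attributes.
    declared_lo, declared_hi = attrs.get(
        "actual_range", (attrs.get("actual_min"), attrs.get("actual_max"))
    )
    if declared_lo is not None and not math.isclose(declared_lo, lo, rel_tol=rel_tol):
        problems.append(f"actual minimum declared as {declared_lo}, data has {lo}")
    if declared_hi is not None and not math.isclose(declared_hi, hi, rel_tol=rel_tol):
        problems.append(f"actual maximum declared as {declared_hi}, data has {hi}")

    # valid_* attributes only bound the data; they need not be reached.
    valid_lo, valid_hi = attrs.get(
        "valid_range", (attrs.get("valid_min"), attrs.get("valid_max"))
    )
    if valid_lo is not None and lo < valid_lo:
        problems.append(f"data minimum {lo} is below valid minimum {valid_lo}")
    if valid_hi is not None and hi > valid_hi:
        problems.append(f"data maximum {hi} is above valid maximum {valid_hi}")

    # A fill value inside the valid range could not be told apart from data.
    if None not in (fill, valid_lo, valid_hi) and valid_lo <= fill <= valid_hi:
        problems.append(f"_FillValue {fill} lies inside the valid range")
    return problems


# Salinity example using the numbers quoted above.
salinity = [25.732, 30.1, -1.0, 34.994]
attrs = {
    "actual_min": 25.732,
    "actual_max": 34.994,
    "valid_min": 0.0,
    "valid_max": 41.5,
    "_FillValue": -1.0,
}
print(check_range_attributes(salinity, attrs))
```

With consistent attributes the list of problems is empty. The expensive part in practice is reading every data value from every file, not the comparisons themselves.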

Have I misunderstood anything or do you all feel that is a fair summary of the information I've been given?

Cheers

Sean

-----Original Message-----
From: cf-metadata-bounces at cgd.ucar.edu [mailto:cf-metadata-bounces at cgd.ucar.edu] On Behalf Of John Caron
Sent: 19 April 2012 21:25
To: cf-metadata at cgd.ucar.edu
Subject: Re: [CF-metadata] code that does semantic checking of CF headers

On 4/19/2012 2:11 PM, Dave Allured wrote:
> Sean,
>
> I run into this frequently, especially with files that do not come
> from carefully crafted official archives. I regard all flavors of
> range attributes as frequently unreliable. I think best practice is
> to extract ranges directly from the coordinate values when plotting
> data on the fly, and pay no attention to incorrect secondary metadata.
> This should be simple to code and have no performance penalty.
>
> --Dave

In this case, it's the data values that have to be read in order to know
what the data range is, which may be expensive.
As long as they are hints that can be modified by the user, they seem
safe to use.

OTOH, valid_min and valid_max are quite dangerous if they are not correct,
as Seth found out. Generic software really has no choice but to respect
them. I recommend leaving them off if you don't actually need them.
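
A small illustration of why incorrect valid_min / valid_max values are dangerous, assuming (as the netCDF conventions describe) that a generic reader treats anything outside the valid range as missing. The function name apply_valid_range is hypothetical and the values are plain Python numbers rather than data read from a file.

```python
def apply_valid_range(values, valid_min=None, valid_max=None):
    """Replace values outside [valid_min, valid_max] with None,
    roughly as a generic reader might treat them as missing."""
    masked = []
    for v in values:
        too_low = valid_min is not None and v < valid_min
        too_high = valid_max is not None and v > valid_max
        masked.append(None if (too_low or too_high) else v)
    return masked


salinity = [25.732, 30.1, 34.994]

# Correct attributes: nothing is hidden.
print(apply_valid_range(salinity, valid_min=0.0, valid_max=41.5))

# Wrong valid_max: good data silently disappears.
print(apply_valid_range(salinity, valid_min=0.0, valid_max=30.0))
```

In the second call two of the three perfectly good values are dropped without any warning, which is the risk described above.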
Received on Fri Apr 20 2012 - 04:54:03 BST
