[CF-metadata] Choice of fill value for unpacked data from Jim Biard on 2012-10-10 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Jim Biard <jim.biard>
Date: Wed, 10 Oct 2012 08:36:48 -0400

John,

I agree that there is no ambiguity in the original dataset. I was just pointing out a scenario in which the difficulty Phil described in his original post shows up, and which the methodology Russ referenced does not handle. If you (for reasons beyond your control) have a packed fill value that lies inside your valid range, CF gives automated client software no guidance on how to represent the unpacked fill value. The user is left to dig back through the packed values, find the elements marked as fill, and "manually" mark the corresponding unpacked values.
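
As a rough illustration of that "manual" step (a minimal sketch only, not something CF or the netCDF library defines): the fill test has to be made on the packed values before scale_factor and add_offset are applied, because afterwards the result cannot be told apart from real data. The function name unpack_with_fill and the use of None as the unpacked marker are assumptions made purely for illustration.

```python
def unpack_with_fill(packed, scale_factor, add_offset, packed_fill):
    """Unpack scale-and-offset data, flagging fill by the PACKED value.

    Hypothetical helper: None stands in for whatever unpacked fill
    marker the client software chooses (e.g. NaN or a mask).
    """
    unpacked = []
    for p in packed:
        if p == packed_fill:
            # Compare against the exact packed fill value, even though
            # it lies inside the range of otherwise valid packed values.
            unpacked.append(None)
        else:
            unpacked.append(p * scale_factor + add_offset)
    return unpacked


# Packed fill value 100 sits inside the valid packed range 0..200.
packed = [0, 100, 200, 100]
unpacked = unpack_with_fill(packed, scale_factor=0.5, add_offset=10.0,
                            packed_fill=100)
print(unpacked)
```

Once the values are unpacked without such a marker, 100 * 0.5 + 10.0 = 60.0 would look like any other measurement, which is the difficulty described above.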

Grace and peace,

Jim

Jim Biard
Research Scholar
Cooperative Institute for Climate and Satellites
Remote Sensing and Applications Division
National Climatic Data Center
151 Patton Ave, Asheville, NC 28801-5001

jim.biard at noaa.gov
828-271-4900

On Oct 9, 2012, at 4:23 PM, John Caron <caron at unidata.ucar.edu> wrote:

> Hi Jim:
>
> _FillValue/missing_value refers to the packed value, so there's no ambiguity in the original dataset. It is best to make sure it's outside the range of real values, but even if not, one just has to search for that exact bit pattern.
>
> If someone rewrites the data, it's their responsibility to choose a _FillValue/missing_value that is unambiguous.
>
> If your use case is important, you could add _UnpackedFillValue, so that your software did the right thing. Dunno if it would be generally useful.
>
> John
>
> On 9/27/2012 7:00 AM, Jim Biard wrote:
>> Hi.
>>
>> Assuming you have the luxury of specifying your _FillValue and/or
>> missing_value, I agree that this isn't a big deal. However, I am
>> working with data where the project has defined fill/missing values that
>> are wholly within the range of possible values (NPP satellite data).
>> The approach defined below fails in such cases.
>>
>> Grace and peace,
>>
>> Jim
>>
>> On Sep 25, 2012, at 11:08 PM, Russ Rew <russ at unidata.ucar.edu> wrote:
>>
>>> Hi Phil,
>>>
>>>> The final para of section 2.5.1 of the CF conventions document describes
>>>> the use of the _FillValue (or missing_value) attribute in the case of
>>>> data packed using the scale-and-offset method. What is not clear - at
>>>> least to me - is what the preferred application behaviour should be in
>>>> the case where the data is unpacked and then written out to a new netCDF
>>>> file. In particular, what fill value should be used for the unpacked
>>>> data variable?
>>>>
>>>> I presume that one wouldn't normally want to use the original fill value
>>>> since that value (typically an 8- or 16-bit integer) is quite likely to
>>>> fall within the normal range of the unpacked data (typically a 32- or
>>>> 64-bit float).
>>>>
>>>> In the absence of explicitly setting a fill value attribute on the
>>>> unpacked data variable I assume that the netCDF default fill value will
>>>> be used for the data type in question. Which may not always be desirable
>>>> (certainly not for 32-bit floats, where the default fill value can give
>>>> rise to subtle precision-related problems).
>>>>
>>>> With this in mind, I was wondering if there is any merit in defining a
>>>> new attribute called, say, _UnpackedFillValue (or
>>>> unpacked_missing_value)? If client software detected this attribute then
>>>> the associated value (same data type as the scale_factor and add_offset
>>>> attributes) would be used as the fill value for the unpacked data
>>>> variable.
>>>>
>>>> Alternatively, the names _FillValueUnpacked (missing_value_unpacked)
>>>> might be preferable since they would then appear together pair-wise in
>>>> CDL-type listings, e.g.
>>>>
>>>> short pkd_var(z, y, x) :
>>>> ...
>>>> pkd_var:_FillValue = -32768 ;
>>>> pkd_var:_FillValueUnpacked = -1.0e30 ;
>>>> pkd_var:add_offset = 42.0 ;
>>>> pkd_var:scale_factor = 1234.0 ;
>>>> ...
>>>>
>>>>
>>>> Any merit/mileage in this idea?
>>>
>>> A more detailed recommendation for treating special values such as
>>> _FillValue or missing_value for packed data is described, with
>>> associated formulas, in the "Packed Data Values" section of a
>>> best-practices document that we wrote a few years ago:
>>>
>>> http://www.unidata.ucar.edu/netcdf/docs/BestPractices.html#Packed%20Data%20Values
>>>
>>> It provides a recommendation for ensuring the unpacked special value is
>>> not in the range of other unpacked data values. If that recommendation
>>> is followed, I think there is no need for an additional
>>> _FillValueUnpacked (or missing_value_unpacked) attribute.
>>>
>>> If you agree that is an acceptable approach, perhaps we should add it to
>>> CF ...
>>>
>>> --Russ
>>> _______________________________________________
>>> CF-metadata mailing list
>>> CF-metadata at cgd.ucar.edu
>>> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata

Received on Wed Oct 10 2012 - 06:36:48 BST