[CF-metadata] Choice of fill value for unpacked data

From: Russ Rew <russ>
Date: Wed, 26 Sep 2012 15:27:14 -0600

Phil,

> Thanks for providing this reference, which I had previously come across
> as part of some earlier investigative work around data packing. I may be
> misunderstanding some of what's written there, but I wasn't able to see
> anything in that particular section which described how to specify (or
> derive) a fill value to use with the *unpacked* data array. But it's
> quite likely I'm just missing some key point :-)

You're right, that question is not directly addressed. However, I think
just applying the standard unpacking formula provides a natural fill
value for use with the unpacked data array, but see below.

> The CF spec currently states that "values that are identified as missing
> should not be transformed". So if we have a packed data variable of type
> short which uses a fill value of, say, -32768, all we can do is detect
> those values, avoid applying the unpacking algorithm, and flag them as
> missing in the in-memory data array.

I missed the CF prohibition on *unpacking* values identified as missing,
and don't understand or agree with the reason given for that
prohibition. So if my understanding is correct, I would propose to
eliminate that prohibition. However, I think a prohibition on *packing*
values identified as missing by just applying the packing formula to
default fill values is justified for the reasons given.

Specifically, CF currently treats these the same by using the term
"transform" to mean either "pack" or "unpack", saying:

  ... Applications that process variables that have attributes to
  indicate both a transformation (via a scale and/or offset) and missing
  values should first check that a data value is valid, and then apply
  the transformation. Note that values that are identified as missing
  should not be transformed. Since the missing value is outside the
  valid range it is possible that applying a transformation to it could
  result in an invalid operation. For example, the default _FillValue is
  very close to the maximum representable value of IEEE single precision
  floats, and multiplying it by 100 produces an "Infinity" (using single
  precision arithmetic).
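
To make the overflow in that quoted example concrete, here is a minimal
Python sketch. NC_FILL_FLOAT is the default float fill value from
netcdf.h; FLT_MAX and the helper within_float32_range are names chosen
here for illustration only, and the check is a plain range comparison
rather than real single-precision arithmetic.

```python
# Default netCDF fill value for NC_FLOAT (NC_FILL_FLOAT in netcdf.h).
NC_FILL_FLOAT = 9.9692099683868690e+36

# Largest finite IEEE single-precision value: (2 - 2**-23) * 2**127.
FLT_MAX = (2.0 - 2.0 ** -23) * 2.0 ** 127


def within_float32_range(x: float) -> bool:
    """Hypothetical helper: True if |x| does not exceed the float32 maximum."""
    return abs(x) <= FLT_MAX


# The fill value itself is representable, but scaling it by 100 is not.
scaled = NC_FILL_FLOAT * 100.0
print(within_float32_range(NC_FILL_FLOAT))  # True
print(within_float32_range(scaled))         # False
```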

But unpacking is accomplished with a simple linear transformation on a
small integer domain (either 8-bit or 16-bit), and using the best-
practices recommendation to employ one of the end-points of this domain
for the special value results in a corresponding unpacked value just
outside of the range of the valid unpacked values. The only way this
could be close to the edges of IEEE floats is if the original unpacked
data also has values close to the corresponding edge of IEEE floats.
That case is sufficiently unlikely for physically meaningful data that
it shouldn't drive conventions.

As an example, consider packing 32-bit floating-point data into 16-bit
shorts, reserving (-32768) at the bottom of the packed range for
_FillValue. Then the recommended formulas give

  scale_factor = (dataMax - dataMin) / 65534
  add_offset = (dataMax + dataMin) / 2

and the unpacked value corresponding to the packed _FillValue, which I'm
asserting would be a natural unpacked _FillValue in this case, is given
by

  unpacked_data_value = packed_data_value * scale_factor + add_offset

which works out to

  (-32768)*(dataMax - dataMin) / 65534 + (dataMax + dataMin) / 2
  = 1.000015259254738 * dataMin - 1.5259254737998162e-05 * dataMax

For this value to be close to the IEEE float max or min, either
dataMax or dataMin would itself have to be relatively close to those
extremes.
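
The arithmetic above can be sketched in a few lines of Python. The
function names and the sample range (250.0 to 320.0) are assumptions
made up for illustration; only the formulas come from the
best-practices recommendation.

```python
PACKED_FILL = -32768  # bottom of the 16-bit range, reserved for _FillValue


def packing_params(data_min, data_max, n_bits=16):
    """Recommended scale/offset when one packed value is reserved."""
    scale_factor = (data_max - data_min) / (2 ** n_bits - 2)
    add_offset = (data_max + data_min) / 2
    return scale_factor, add_offset


def unpack(packed, scale_factor, add_offset):
    """Standard linear unpacking formula."""
    return packed * scale_factor + add_offset


# Illustrative data range (an assumption, not from any real dataset).
data_min, data_max = 250.0, 320.0
scale_factor, add_offset = packing_params(data_min, data_max)

# "Natural" unpacked fill value: just unpack the packed _FillValue.
natural_fill = unpack(PACKED_FILL, scale_factor, add_offset)

# The valid packed range [-32767, 32767] maps onto [data_min, data_max].
lowest_valid = unpack(-32767, scale_factor, add_offset)
highest_valid = unpack(32767, scale_factor, add_offset)

print(natural_fill < lowest_valid)  # True: one scale step below data_min
```

With these parameters the natural fill value sits one scale_factor
step below dataMin, so it stays out of the valid range without
approaching the limits of single precision.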

> Now, when we come to write out the unpacked data array to a new netCDF
> file we need to choose a fill value appropriate to the type and range of
> the unpacked data. We could use some semi-arbitrary value, such as
> -1.0e20, or the netCDF default fill value. But as I hinted at in my
> original email, both of those choices might be inappropriate, hence my
> suggestion to explicitly define what the fill value should be via a
> suitable new attribute.

I agree that picking a semi-arbitrary value like -1.0e20 would be
inappropriate, but I think the "natural fill value" computed by the same
unpacking formula used for other packed values would work.

I believe this is the policy used by the NCO program "ncpdq" when
unpacking packed data and copying the result. But as Charlie Zender
points out in the NCO User's Guide:

   ... the interaction of packing and data equal to the _FillValue is
   complex. Test the _FillValue behavior by performing a pack/unpack
   cycle to ensure data that are missing stay missing and data that are
   not missing do not join the Air National Guard and go missing. This
   may lead you to elect a new _FillValue.
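
A pack/unpack cycle of the kind suggested there might look like the
following Python sketch. It is not how ncpdq works internally; the
helper names, the sample range, and the sample packed values are all
assumptions chosen for illustration, and it uses the natural fill value
rather than a default fill value.

```python
PACKED_FILL = -32768  # packed _FillValue, bottom of the 16-bit range


def unpack(packed, scale_factor, add_offset):
    """Standard linear unpacking formula."""
    return packed * scale_factor + add_offset


def pack(value, scale_factor, add_offset):
    """Inverse of unpack, rounded to the nearest integer."""
    return round((value - add_offset) / scale_factor)


# Illustrative range and the recommended 16-bit parameters.
data_min, data_max = 250.0, 320.0
scale_factor = (data_max - data_min) / 65534
add_offset = (data_max + data_min) / 2

# Packed sample: one missing value followed by four valid values.
packed_in = [PACKED_FILL, -32767, 0, 12345, 32767]

unpacked = [unpack(p, scale_factor, add_offset) for p in packed_in]
natural_fill = unpack(PACKED_FILL, scale_factor, add_offset)
packed_out = [pack(v, scale_factor, add_offset) for v in unpacked]

# Missing data should stay missing, and valid data should stay valid.
missing_in = [p == PACKED_FILL for p in packed_in]
missing_unpacked = [v == natural_fill for v in unpacked]
missing_out = [p == PACKED_FILL for p in packed_out]
print(missing_in == missing_unpacked == missing_out)  # True
```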

--Russ

> Refining my earlier suggestion, that attribute could also be global, in
> which case it would be used for all unpacked variables which didn't
> define the new fill value via a variable-scope attribute of the same
> name. All this would be optional of course.
>
> Regards,
> Phil
>
> > -----Original Message-----
> > From: Russ Rew [mailto:russ at unidata.ucar.edu]
> > Sent: 26 September 2012 04:08
> > To: Bentley, Philip
> > Cc: cf-metadata at cgd.ucar.edu
> > Subject: Re: [CF-metadata] Choice of fill value for unpacked data
> >
> > Hi Phil,
> >
> > > The final para of section 2.5.1 of the CF conventions document
> > > describes the use of the _FillValue (or missing_value) attribute in
> > > the case of data packed using the scale-and-offset method. What is
> > > not clear - at least to me - is what the preferred application
> > > behaviour should be in the case where the data is unpacked and then
> > > written out to a new netCDF file. In particular, what fill value
> > > should be used for the unpacked data variable?
> > >
> > > [snip]
> >
> > A more detailed recommendation for treating special values
> > such as _FillValue or missing_value for packed data is
> > described, with associated formulas, in the "Packed Data
> > Values" section of a best-practices document that we wrote a
> > few years ago:
> >
> >
> > http://www.unidata.ucar.edu/netcdf/docs/BestPractices.html#Packed%20Data%20Values
> >
> > It provides a recommendation for ensuring the unpacked
> > special value is not in the range of other unpacked data
> > values. If that recommendation is followed, I think there is
> > no need for an additional _FillValueUnpacked (or
> > missing_value_unpacked) attribute.
> >
> > If you agree that is an acceptable approach, perhaps we
> > should add it to CF ...
> >
> > --Russ
> >
Received on Wed Sep 26 2012 - 15:27:14 BST
