[CF-metadata] How to pack data using scale_factor and add_offset? from John Stark on 2007-12-11 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: John Stark <john.stark>
Date: Tue, 11 Dec 2007 15:26:02 +0000

Hello,

When scale_factor and add_offset are used, how should the data be
'rounded' to integers?

I had a look at the CF conventions and NUG but couldn't find a
definitive answer.

I had implicitly assumed that 'nearest integer' [NINT] is intended (i.e.
to best represent the unpacked data), but this is not what the NetCDF
library would do if you passed it floating point data. I think it would
apply the machine default (truncation, or INT). Other options include
FLOOR / CEIL.

For example, I have temperature, sst, which I want to pack into value iv
using scale_factor=0.01, add_offset=273.15 .

Method 1) [ What the netcdf library would do? ]
iv = INT( ( sst - add_offset ) / scale_factor )

Method 2) [ NINT instead of truncation ]
iv = NINT( (sst - add_offset) / scale_factor )

Method 3) [ To reduce the number of divisions]
iv = NINT( sst * (1./scale_factor) - (add_offset/scale_factor) )

Method 4) [Using FLOOR with offset 0.5 instead of NINT]
iv = FLOOR( sst * (1./scale_factor) - (add_offset/scale_factor) +0.5)

I prefer the use of 'FLOOR(X + 0.5)' since the rounding of the (integer
+ half) values is always in the same direction, and it avoids
'double-counting' issues near zero. This matters because
(zero + add_offset) may have no special meaning in the unpacked data.
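
As a rough illustration, the four methods might be written in Python
with NumPy as below. This is only a sketch: the function names are made
up for this example, it says nothing about what the NetCDF library
actually does, and NINT is assumed to round halves away from zero (as
Fortran's NINT does), which differs from np.rint (round half to even).

```python
import numpy as np

ADD_OFFSET = 273.15
SCALE_FACTOR = 0.01

def nint(x):
    """Round to nearest, halves away from zero (assumed NINT behaviour)."""
    x = np.asarray(x, dtype=np.float64)
    return np.sign(x) * np.floor(np.abs(x) + 0.5)

def pack_method1(sst):
    # Truncation toward zero
    return np.trunc((sst - ADD_OFFSET) / SCALE_FACTOR).astype(np.int32)

def pack_method2(sst):
    # Nearest integer
    return nint((sst - ADD_OFFSET) / SCALE_FACTOR).astype(np.int32)

def pack_method3(sst):
    # Nearest integer, with the division replaced by a multiplication
    x = sst * (1.0 / SCALE_FACTOR) - (ADD_OFFSET / SCALE_FACTOR)
    return nint(x).astype(np.int32)

def pack_method4(sst):
    # FLOOR(x + 0.5): halves always round upward
    x = sst * (1.0 / SCALE_FACTOR) - (ADD_OFFSET / SCALE_FACTOR)
    return np.floor(x + 0.5).astype(np.int32)

# Values a quarter of an interval below each packed value
sst = np.arange(-3, 4) * 0.01 - 0.0025 + ADD_OFFSET
for pack in (pack_method1, pack_method2, pack_method3, pack_method4):
    print(pack.__name__, pack(sst))
```

The two rounding rules only disagree on negative half-integers: with the
assumption above, nint(-0.5) gives -1 while floor(-0.5 + 0.5) gives 0.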

If the sst is not close to the interval edges, then we get the same
result for all methods except truncation. (Each line represents a
packing method, see code at end of message.) e.g.:
sst=[-3,-2,-1,0,1,2,3] * 0.01d0 -0.0025d0 + add_offset
1) -3 -2 -1 0 0 1 2
2) -3 -2 -1 0 1 2 3
3) -3 -2 -1 0 1 2 3
4) -3 -2 -1 0 1 2 3

If sst is very close to the interval edges we get some numerical
artifacts: the mathematically equivalent methods 2) and 3) differ:
sst=[-3,-2,-1,0,1,2,3] * 0.01d0 -0.005d0 + add_offset
1) -3 -2 -1 0 0 1 2
2) -4 -2 -1 0 0 1 2
3) -4 -2 -1 0 1 2 3
4) -3 -2 -1 0 1 2 3
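
To see why 2) and 3) can disagree, one can print how far each expression
lands from the exact half-integer before any rounding is applied. The
sketch below uses Python with NumPy (variable names are chosen for this
example only); the deviations are tiny, but their sign decides which way
a value sitting on an interval edge is rounded, and the exact figures
may depend on the platform.

```python
import numpy as np

add_offset = 273.15
scale_factor = 0.01

k = np.arange(-3, 4)
sst = k * 0.01 - 0.005 + add_offset

# Argument passed to NINT in method 2 and in method 3
x2 = (sst - add_offset) / scale_factor
x3 = sst * (1.0 / scale_factor) - (add_offset / scale_factor)

# In exact arithmetic both would equal k - 0.5
exact = k - 0.5
for ki, d2, d3 in zip(k, x2 - exact, x3 - exact):
    print(f"{ki:3d}  method 2: {d2:+.3e}  method 3: {d3:+.3e}")
```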

The problem is that if the same data is repeatedly read, unpacked,
modified and then re-packed, these small differences can get amplified
(especially using truncation). Can we specify _exactly_ how the packing
should be done in CF?
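
The re-packing concern can be checked with a small round-trip
experiment. The sketch below (Python with NumPy; the helper names are
hypothetical and a 16-bit packed range is assumed) unpacks every packed
value and immediately re-packs it, once with truncation and once with
FLOOR(X + 0.5), then counts how many values changed.

```python
import numpy as np

ADD_OFFSET = 273.15
SCALE_FACTOR = 0.01

def unpack(iv):
    return iv * SCALE_FACTOR + ADD_OFFSET

def repack_trunc(sst):
    # Method 1: truncation toward zero
    return np.trunc((sst - ADD_OFFSET) / SCALE_FACTOR).astype(np.int64)

def repack_floor_half(sst):
    # FLOOR(x + 0.5), written with the division form for simplicity
    return np.floor((sst - ADD_OFFSET) / SCALE_FACTOR + 0.5).astype(np.int64)

# Every value a 16-bit packed variable could hold (assumed range)
iv = np.arange(-32768, 32768, dtype=np.int64)

trunc_once = repack_trunc(unpack(iv))
floor_once = repack_floor_half(unpack(iv))

print("changed by truncation:  ", np.count_nonzero(trunc_once != iv))
print("changed by floor(x+0.5):", np.count_nonzero(floor_once != iv))
```

With FLOOR(X + 0.5) the unpacked value lies well inside its interval, so
the round trip returns the original integer. With truncation any value
whose unpacked form lands fractionally on the near-zero side of the
integer moves one step toward zero, and can do so again on each further
cycle.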

John

[PV-Wave / IDL code]
add_offset = 273.15d0
scale_factor = 0.01d0
iv1 = FIX( ( sst - add_offset ) / scale_factor )
iv2 = NINT( (sst - add_offset) / scale_factor )
iv3 = NINT( sst * (1.d0/scale_factor) - (add_offset/scale_factor) )
iv4 = FLOOR( sst * (1.d0/scale_factor) - (add_offset/scale_factor) + 0.5d0 )
print,iv1,iv2,iv3,iv4,Format='(7(i8))'

-- 
John Stark                            SST and Sea Ice Scientist
Met Office, Fitzroy Road, Exeter, Devon, EX1 3PB United Kingdom
Telephone: +44 (0)1392 885059           Fax: +44 (0)1392 885681
E-mail: john.stark at metoffice.gov.uk http://www.metoffice.gov.uk

Received on Tue Dec 11 2007 - 08:26:02 GMT
