
[CF-metadata] How to pack data using scale_factor and add_offset?

From: Karl Taylor <taylor13>
Date: Tue, 11 Dec 2007 10:06:09 -0800

Dear John,

CF does not say anything about the precision (or accuracy) of the data
stored... only how to store it. Thus, at this time it is entirely up to
the user to decide what to do. If I were you, I would round to the
nearest integer rather than truncate, but I'm not sure it is necessary
to mandate this procedure.
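Karl's advice (round to nearest rather than truncate) comes out in Python as something like the following. This is a minimal sketch, not from the original thread; `pack` and `unpack` are illustrative names, and note that NumPy's `rint` rounds exact halves to even, unlike Fortran/IDL NINT which rounds halves away from zero:

```python
import numpy as np

def pack(values, scale_factor, add_offset):
    """Pack floating-point data into short integers, rounding to the
    nearest integer (Karl's recommendation) instead of truncating.
    Note: np.rint rounds exact halves to even, not away from zero."""
    packed = np.rint((np.asarray(values) - add_offset) / scale_factor)
    return packed.astype(np.int16)

def unpack(packed, scale_factor, add_offset):
    """Recover approximate physical values from the packed integers."""
    return packed * scale_factor + add_offset

# 273.156 K is 0.6 of a packing interval above the offset: rounding
# packs it as 1, whereas truncation would have produced 0.
iv = pack([273.156], 0.01, 273.15)
```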

cheers,
Karl

John Stark wrote:
> Hello,
>
> When scale_factor and add_offset are used, how should the data be
> 'rounded' to integers?
>
> I had a look at the CF conventions and the NUG but couldn't find a
> definitive answer.
>
> I had implicitly assumed 'nearest integer' [NINT] (i.e. to best
> represent the unpacked data), but this is not what the netCDF
> library would do if you passed it floating-point data; I think it
> would apply the machine default (truncation, i.e. INT). Other
> options include FLOOR / CEIL.
>
> For example, I have a temperature sst which I want to pack into an
> integer value iv using scale_factor=0.01, add_offset=273.15 .
>
> Method 1) [ What the netCDF library would do? ]
> iv = INT( ( sst - add_offset ) / scale_factor )
>
> Method 2) [ NINT instead of truncation ]
> iv = NINT( (sst - add_offset) / scale_factor )
>
> Method 3) [ To reduce the number of divisions]
> iv = NINT( sst * (1./scale_factor) - (add_offset/scale_factor) )
>
> Method 4) [Using FLOOR with offset 0.5 instead of NINT]
> iv = FLOOR( sst * (1./scale_factor) - (add_offset/scale_factor) +0.5)
>
> I prefer 'FLOOR(X + 0.5)' since the rounding of the (integer + half)
> values is always in the same direction, which avoids
> 'double-counting' issues near zero. This matters because
> (zero + add_offset) may have no special meaning in the unpacked data.
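John's point about rounding direction can be seen directly at the half-way values. A short Python sketch (not from the original thread; `nint` here emulates the round-half-away-from-zero behaviour of Fortran/IDL NINT):

```python
import math

def nint(x):
    # Round half away from zero, as Fortran/IDL NINT does.
    return int(math.floor(x + 0.5)) if x >= 0 else int(math.ceil(x - 0.5))

def floor_half(x):
    # FLOOR(x + 0.5): half-way values always round in the same
    # direction (towards +infinity), regardless of sign.
    return math.floor(x + 0.5)

halves = [-1.5, -0.5, 0.5, 1.5]
print([nint(h) for h in halves])        # [-2, -1, 1, 2]  symmetric about zero
print([floor_half(h) for h in halves])  # [-1, 0, 1, 2]   always upwards
```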
>
> If sst is not close to the interval edges, then all methods except
> truncation give the same result. (Each line below is one packing
> method; see the code at the end of this message.) e.g.:
> sst=[-3,-2,-1,0,1,2,3] * 0.01d0 -0.0025d0 + add_offset
> 1) -3 -2 -1 0 0 1 2
> 2) -3 -2 -1 0 1 2 3
> 3) -3 -2 -1 0 1 2 3
> 4) -3 -2 -1 0 1 2 3
>
> If sst is very close to the interval edges we get some numerical
> artifacts, and the mathematically equivalent methods 2) and 3) differ:
> sst=[-3,-2,-1,0,1,2,3] * 0.01d0 -0.005d0 + add_offset
> 1) -3 -2 -1 0 0 1 2
> 2) -4 -2 -1 0 0 1 2
> 3) -4 -2 -1 0 1 2 3
> 4) -3 -2 -1 0 1 2 3
>
> The problem is that if the same data are repeatedly read, unpacked,
> modified and re-packed, these small differences can be amplified
> (especially with truncation). Can we specify _exactly_ how the
> packing should be done in CF?
>
> John
>
> [PV-Wave / IDL code]
> add_offset = 273.15d0
> scale_factor = 0.01d0
> iv1 = FIX( ( sst - add_offset ) / scale_factor )
> iv2 = NINT( (sst - add_offset) / scale_factor )
> iv3 = NINT( sst * (1.d0/scale_factor) - (add_offset/scale_factor) )
> iv4 = FLOOR( sst * (1.d0/scale_factor) - (add_offset/scale_factor) + 0.5d0 )
> print,iv1,iv2,iv3,iv4,Format='(7(i8))'
>
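For readers without PV-Wave/IDL, John's four-method comparison can be reproduced roughly in Python with NumPy. This is a sketch, not part of the original thread; `np.rint` rounds exact halves to even rather than away from zero like NINT, so results at the interval midpoints may not match the IDL output above exactly:

```python
import numpy as np

add_offset = 273.15
scale_factor = 0.01

# Test values just below the midpoints of the packing intervals,
# as in the second example in the message.
sst = np.array([-3, -2, -1, 0, 1, 2, 3]) * 0.01 - 0.005 + add_offset

x = (sst - add_offset) / scale_factor
iv1 = np.trunc(x).astype(int)       # Method 1: truncation (INT / FIX)
iv2 = np.rint(x).astype(int)        # Method 2: nearest integer (NINT-like)
iv3 = np.rint(sst * (1.0 / scale_factor)
              - add_offset / scale_factor).astype(int)        # Method 3
iv4 = np.floor(sst * (1.0 / scale_factor)
               - add_offset / scale_factor + 0.5).astype(int)  # Method 4

for iv in (iv1, iv2, iv3, iv4):
    print(' '.join(f'{v:4d}' for v in iv))
```

Running this shows the same kind of disagreement between the mathematically equivalent formulations, driven by tiny floating-point differences in how the intermediate value is computed.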
Received on Tue Dec 11 2007 - 11:06:09 GMT

This archive was generated by hypermail 2.3.0 : Tue Sep 13 2022 - 23:02:40 BST
