
[CF-metadata] Interpretation of Compression by Gathering method

From: Steve Hankin <Steven.C.Hankin>
Date: Wed, 01 Jun 2011 15:17:57 -0000

Hi Philip,

Compression by Gathering as defined in the CF document is not widely
supported in software. You should anticipate serious interoperability
barriers if you choose to use it. In view of the option to use
"chunked" netCDF4 compression transparently with the netCDF3 classic
API, there's a strong case that the old Compression by Gathering should
be formally deprecated.

     - Steve

===========================================

On 6/1/2011 6:47 AM, Bentley, Philip wrote:
>
> Hi folks,
>
> In investigating possible ways of efficiently storing data variables
> containing large numbers of contiguous missing values within
> CF-netCDF3 files, I have been looking at chapter 8 in the CF
> conventions document, which describes the technique of compression by
> gathering. (Of course, effective compression could readily be achieved
> externally using, say, gzip. But for this discussion I'm interested
> in ways to do this solely using netCDF-3.)
>
> Whilst the compression by gathering method doesn't appear to be
> designed expressly for the purpose of circumventing the need to store
> long (time) series of missing data values, it does, on the face of it,
> seem like it could be subverted for just that purpose. However, for
> this to work it has to be assumed that, upon expansion of the
> dimensions specified in the 'compress' attribute of the mask array
> variable (e.g. the landpoint variable of example 8.1), all non-masked
> elements are assigned the missing/fill value within the in-memory data
> array.
>
> The final sentence of chapter 8, in referring to example 8.2, reads:
> "This information implies that the salinity field should be
> uncompressed to an array with dimensions (depth, lat, lon)".
>
> I wonder if that should actually read "(time, depth, lat, lon)"?
> Regardless, the implication seems to be that missing elements in the
> uncompressed array should take on the missing/fill value. Is that the
> correct interpretation? If so, can we further assume that either the
> _FillValue or the missing_value attribute becomes mandatory for
> compressed variables? (Or perhaps netCDF's default fill value would be
> used instead?)
>
> To add a final twist, of the four netCDF clients (ncview, ncbrowse,
> Panoply, and an in-house app) I have tried against a variable compressed in
> this fashion, none uncompresses the variable as expected according to
> the wording of the CF convention - though some of this may well be due
> to simple pilot error! In all cases the variable is treated, if at
> all, merely as a regular 2D variable - in my case (time, goodpoints).
> Is anyone aware of any netCDF clients out there which can handle
> compressed variables correctly?
>
> Regards,
> Phil
>
> PS: Clearly the general requirement described above can be met at
> netCDF-4 through use of the per-variable compression options available
> with that version.
>
>
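Editorial sketch of the expansion step Phil asks about. It assumes one reading of CF chapter 8: the list variable (landpoint in example 8.1) holds zero-based, row-major indices into the flattened compressed dimensions, and every grid cell not named in that list takes a fill value on expansion. The function name, the fill value of -9999.0, and the tiny 2 x 3 grid are hypothetical choices for illustration; none of them comes from the CF document or from any netCDF client.

```python
# Illustrative only: plain-Python expansion of one "gathered" 2D field.
# FILL is a hypothetical stand-in for _FillValue / missing_value.
FILL = -9999.0


def uncompress_by_gathering(packed, indices, shape, fill=FILL):
    """Scatter packed values back onto a (nlat, nlon) grid.

    Assumes each entry of `indices` is a zero-based, row-major offset
    into the flattened (lat, lon) grid, i.e. index = lat * nlon + lon.
    Cells that are not listed in `indices` keep the fill value.
    """
    nlat, nlon = shape
    if len(packed) != len(indices):
        raise ValueError("packed values and index list differ in length")
    # Start from a grid that is entirely fill, then overwrite listed cells.
    grid = [[fill] * nlon for _ in range(nlat)]
    for value, flat in zip(packed, indices):
        if not 0 <= flat < nlat * nlon:
            raise IndexError(f"index {flat} lies outside the {nlat}x{nlon} grid")
        row, col = divmod(flat, nlon)
        grid[row][col] = value
    return grid


# Two stored points on a 2 x 3 (lat, lon) grid.
landpoint = [1, 5]        # flattened positions of the stored points
packed = [10.0, 20.0]     # the values actually held in the file
grid = uncompress_by_gathering(packed, landpoint, (2, 3))
```

For a variable dimensioned (time, goodpoints), the same scatter would be repeated once per time step. Whether a client should fall back to netCDF's default fill value when neither attribute is present is exactly the open question in the message above; the sketch does not settle it.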
Received on Wed Jun 01 2011 - 09:17:57 BST
