
[CF-metadata] Interpretation of Compression by Gathering method

From: Bentley, Philip <philip.bentley>
Date: Wed, 01 Jun 2011 13:47:37 -0000

Hi folks,

In investigating possible ways of efficiently storing data variables
containing large numbers of contiguous missing values within CF-netCDF3
files, I have been looking at chapter 8 in the CF conventions document,
which describes the technique of compression by gathering. (Of course,
effective compression could readily be achieved externally using, say,
gzip. But for this discussion I'm interested in ways to do this solely
using netCDF-3.)

Whilst the compression by gathering method doesn't appear to be designed
expressly for the purpose of circumventing the need to store long (time)
series of missing data values, it does, on the face of it, seem like it
could be subverted for just that purpose. However, for this to work it
has to be assumed that, upon expansion of the dimensions specified in
the 'compress' attribute of the mask array variable (e.g. the landpoint
variable of example 8.1), all elements not listed in that variable are
assigned the missing/fill value within the in-memory data array.
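A minimal pure-Python sketch of the expansion step assumed above follows. It assumes that the index values are 0-based positions in the compressed dimensions flattened in C (row-major) order, and that every point not listed is set to a fill value. The function names, the 3 x 4 grid and the fill value of -999.0 are hypothetical toy choices, not taken from the CF document.

```python
def uncompress_gathered(values, indices, shape, fill_value):
    """Scatter gathered values back into a flat, row-major list of the given shape.

    Assumption: indices are 0-based offsets into the compressed dimensions
    flattened in C order; every point not listed is set to fill_value.
    """
    size = 1
    for n in shape:
        size *= n
    full = [fill_value] * size
    for value, idx in zip(values, indices):
        full[idx] = value
    return full


def unflatten_2d(flat, nlat, nlon):
    """Reshape a flat row-major list into nested [lat][lon] lists."""
    return [flat[j * nlon:(j + 1) * nlon] for j in range(nlat)]


# Toy analogue of example 8.1: a 3 x 4 (lat, lon) grid with 3 stored points.
landpoint = [1, 6, 11]              # offsets of the stored (land) points
landsoilt = [280.0, 281.5, 279.0]   # one value per stored point
FILL = -999.0                       # hypothetical fill value

grid = unflatten_2d(uncompress_gathered(landsoilt, landpoint, (3, 4), FILL), 3, 4)
for row in grid:
    print(row)
```

Offsets 1, 6 and 11 land at (lat, lon) positions (0, 1), (1, 2) and (2, 3); everything else in the 3 x 4 grid holds the fill value.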

The final sentence of chapter 8, in referring to example 8.2, reads:
"This information implies that the salinity field should be uncompressed
to an array with dimensions (depth, lat, lon)".

I wonder if that should actually read "(time, depth, lat, lon)"?
Regardless, the implication seems to be that missing elements in the
uncompressed array should take on the missing/fill value. Is that the
correct interpretation? If so, can we further assume that either the
_FillValue or the missing_value attribute becomes mandatory for
compressed variables? (Or perhaps netCDF's default fill value would be
used instead?)
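The shape question can be illustrated with a sketch modelled on example 8.2, where salinity(time, list) is gathered over list:compress = "depth lat lon". Because the time dimension is not compressed, it survives expansion, giving a (time, depth, lat, lon) result. The fill-value precedence in `choose_fill` (_FillValue, then missing_value, then the netCDF-3 default float fill value) is a hypothetical choice that mirrors the open question above; it is not something chapter 8 states. The sizes and data values are toy assumptions.

```python
# netCDF-3 default fill value for NC_FLOAT (NC_FILL_FLOAT in netcdf.h).
NC_FILL_FLOAT = 9.9692099683868690e36


def choose_fill(attrs):
    """Hypothetical precedence: _FillValue, then missing_value, then the netCDF default."""
    if "_FillValue" in attrs:
        return attrs["_FillValue"]
    if "missing_value" in attrs:
        return attrs["missing_value"]
    return NC_FILL_FLOAT


def expand_time_list(data, list_index, ndepth, nlat, nlon, fill):
    """Expand data[time][list] into nested [time][depth][lat][lon] lists.

    Assumption: list_index holds 0-based offsets into (depth, lat, lon)
    flattened in C order, i.e. offset = (k * nlat + j) * nlon + i.
    """
    out = []
    for row in data:
        # Fresh fill-valued cube per time step (no shared inner lists).
        cube = [[[fill] * nlon for _ in range(nlat)] for _ in range(ndepth)]
        for value, offset in zip(row, list_index):
            k, rem = divmod(offset, nlat * nlon)
            j, i = divmod(rem, nlon)
            cube[k][j][i] = value
        out.append(cube)
    return out


# Toy analogue of example 8.2 with a 2 x 2 x 4 (depth, lat, lon) grid.
salinity_attrs = {"missing_value": -1.0}   # no _FillValue, so missing_value is used
list_index = [0, 5, 13]
salinity = [[35.1, 34.9, 35.4],            # time step 0
            [35.0, 34.8, 35.3]]            # time step 1

full = expand_time_list(salinity, list_index, 2, 2, 4, choose_fill(salinity_attrs))
shape = (len(full), len(full[0]), len(full[0][0]), len(full[0][0][0]))
print(shape)  # (2, 2, 2, 4)
```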

To add a final twist, of the four netCDF clients (ncview, ncbrowse,
Panoply and an in-house app) I have tried against a variable compressed in
this fashion, none uncompresses the variable as expected according to
the wording of the CF convention - though some of this may well be due
to simple pilot error! In all cases the variable is treated, if at all,
merely as a regular 2D variable - in my case (time, goodpoints). Is
anyone aware of any netCDF clients out there which can handle compressed
variables correctly?
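For a client, at least the detection step appears mechanical. A list variable is a one-dimensional coordinate variable carrying a `compress` attribute, and any data variable that uses that dimension is gathered. The sketch below works over a hypothetical dict stand-in for a parsed file header rather than any real netCDF library API. The `tas` and `goodpoints` names, and the "lat lon" compress value, are illustrative assumptions loosely based on the (time, goodpoints) layout described above.

```python
def find_gathered(variables):
    """Return {data variable: (list variable, compressed dims)} for gathered data.

    `variables` maps each variable name to {"dims": [...], "attrs": {...}},
    a hypothetical stand-in for a parsed netCDF header.
    """
    # List variables: coordinate variables (name == sole dim) with a compress attribute.
    list_vars = {}
    for name, var in variables.items():
        if var["dims"] == [name] and "compress" in var["attrs"]:
            list_vars[name] = var["attrs"]["compress"].split()
    # Any other variable using a list dimension holds gathered data.
    found = {}
    for name, var in variables.items():
        if name in list_vars:
            continue
        for dim in var["dims"]:
            if dim in list_vars:
                found[name] = (dim, list_vars[dim])
    return found


header = {
    "time": {"dims": ["time"], "attrs": {"units": "days since 2000-01-01"}},
    "lat": {"dims": ["lat"], "attrs": {}},
    "lon": {"dims": ["lon"], "attrs": {}},
    "goodpoints": {"dims": ["goodpoints"], "attrs": {"compress": "lat lon"}},
    "tas": {"dims": ["time", "goodpoints"], "attrs": {"_FillValue": -999.0}},
}
print(find_gathered(header))  # {'tas': ('goodpoints', ['lat', 'lon'])}
```

A client that skipped this check would see `tas` only as a plain 2D (time, goodpoints) array, which matches the behaviour reported above.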

Regards,
Phil

PS: Clearly the general requirement described above can be met in
netCDF-4 through use of the per-variable compression options available
with that version.
Received on Wed Jun 01 2011 - 07:47:37 BST

