[CF-metadata] missing_value vs. _FillValue

From: Brian Eaton <eaton>
Date: Wed, 5 Nov 2003 20:03:36 -0700

Hi Karl,

> I read in our convention that the missing_value is deprecated and should
> no longer be used (use _FillValue instead). What motivated the
> deprecation?

_FillValue and missing_value are attributes that we inherit from the NUG
(the NetCDF User's Guide). What is in the CF document is my best attempt at
interpreting what's in the NUG. Unfortunately that's not easy, and what's
currently in the NUG is not implemented in any generic software that I'm
aware of (except maybe Harvey Davies' FAN operators, since Harvey was the
one responsible for the version 3 NUG conventions). Harvey wrote in a
thread on the netCDF mail list that the intention was to deprecate the
missing_value attribute, and that's why I included that statement in CF.

> It seems to me there is a fundamental difference between
> the two. My understanding is that the _FillValue is usually used by
> netCDF to prefill the disk space, so if the user fails to completely
> write all the arrays, but later tries to read in all elements, those
> elements unwritten will contain a predictable number (i.e., the
> _FillValue). It seems to me that incompletely writing data is in some
> sense a mistake, and this mistake can be easily identified by testing
> whether the _FillValue is found anywhere in the file.

The netCDF library does use _FillValue to prefill data. But this does not
make it easy to catch the mistake of incompletely written data for several
reasons:
1. The use of data prefill can be turned off. Basically data prefill
doubles the cost of outputting your data, so it's often turned off for
efficiency reasons (we do this in our atmosphere model).
2. The default _FillValue is not defined in a portable way. It's a literal
constant in the netCDF header file, so it may have different binary
representations on different machines.

> I think it is useful to distinguish between the different reasons for
> missing data. Because _FillValue is treated specially by netCDF, I
> think we should prefer the use of missing_value to identify data that
> have been written properly, but are missing by design.

I think that may have been the original intent of having both a _FillValue
and a missing_value attribute. But that distinction is not enforced by the
library, so data writers are free to use _FillValue to indicate data that
is intentionally missing (rather than accidentally missing). That is by far
the most common use of the attribute, and I would guess that's why the
decision was made to deprecate missing_value and just use _FillValue.
The current interpretation of _FillValue is that it defines the valid
range, and all data values outside the valid range are considered to be
missing.
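
A minimal sketch of that valid-range reading, in plain Python. It assumes a
simplified rule: a positive _FillValue acts as an upper bound on valid data
and any other _FillValue as a lower bound, with the fill value itself
counted as missing. The function name is_missing is hypothetical, and the
sketch ignores any explicit valid_range, valid_min or valid_max attributes.

```python
def is_missing(value, fill_value):
    """Return True if value falls outside the range implied by fill_value.

    Assumed simplified rule (not a library API): a positive fill value
    bounds valid data from above, any other fill value bounds it from below.
    """
    if fill_value > 0:
        return value >= fill_value
    return value <= fill_value


# Example data with a large positive fill value of 1.0e20.
data = [271.3, 1.0e20, 280.1, 2.0e20]
flags = [is_missing(v, 1.0e20) for v in data]
print(flags)
```

Under this reading a value beyond the fill value is also treated as
missing, not only a value exactly equal to it.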

> Tell me what danger there is in using a "deprecated" attribute in the
> writing of a CF-compliant dataset.

There is no danger. My experience is that the only "safe" way to indicate
a missing data value is to specify it using both the _FillValue and
missing_value attributes. Some generic applications recognize only
_FillValue, and some recognize only missing_value. But an application that
only recognizes one is more likely to recognize the _FillValue.
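
From the reading side, the same idea can be sketched as an application that
accepts either attribute. This is an illustration only: attrs is a plain
dict standing in for a variable's attributes, both attributes are assumed
to hold a single scalar, and the helper names missing_markers and
mask_missing are hypothetical, not part of any netCDF library.

```python
def missing_markers(attrs):
    """Collect the values that the attributes declare as missing."""
    markers = set()
    for name in ("_FillValue", "missing_value"):
        if name in attrs:
            markers.add(attrs[name])
    return markers


def mask_missing(values, attrs):
    """Replace values equal to any declared marker with None."""
    markers = missing_markers(attrs)
    return [None if v in markers else v for v in values]


# A writer that sets both attributes to the same value is understood by
# readers that look at either one.
attrs = {"_FillValue": -999.0, "missing_value": -999.0}
masked = mask_missing([1.5, -999.0, 3.0], attrs)
print(masked)
```

A reader written this way handles files that carry only one of the two
attributes as well as files that carry both.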

Brian
Received on Wed Nov 05 2003 - 20:03:36 GMT
