Opened 11 years ago

Closed 5 years ago

#52 closed defect (fixed)

Clarification of _FillValue attribute

Reported by: ros
Owned by: cf-conventions@…
Priority: medium
Milestone:
Component: cf-conventions
Version:
Keywords:
Cc:

Description

Section 2.5.1 of the convention defines the standard for the _FillValue attribute stating that it is "a scalar attribute and must have the same type as its variable." There are no specific exclusions, but I think that allowing the value NaN isn't a good idea.

The NetCDF standard doesn't prohibit the use of NaN as a _FillValue, but I did find several recommendations not to use NaN. The problem with allowing NaN is that any calculation with NaN returns NaN and any equality comparison with NaN returns false.
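For illustration, a minimal Python sketch of the comparison behaviour described above. The variable names are made up for this example and do not come from any CF or netCDF library; it only assumes IEEE 754 floating-point semantics.

```python
import math

fill_value = float("nan")          # hypothetical _FillValue chosen as NaN
data = [1.5, float("nan"), 3.0]    # made-up data with one missing element

# Equality against a NaN fill value never matches, even for the NaN element.
found_by_equality = [x == fill_value for x in data]

# An explicit NaN test is needed to find the missing element.
found_by_isnan = [math.isnan(x) for x in data]

print(found_by_equality)
print(found_by_isnan)
```

The equality test flags nothing as missing, while the explicit NaN test flags only the second element.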

I suggest a modification to the convention to recommend against the use of NaN as a _FillValue value.

Thoughts?

Change History (9)

comment:1 Changed 11 years ago by jonathan

I agree with Ros's suggestion. If you can't test for missing data by comparing the data with the missing-data value, it's not easy to use it.

Jonathan

comment:2 in reply to: ↑ description Changed 11 years ago by russ

Replying to ros:

... The NetCDF standard doesn't prohibit the use of NaN as a _FillValue, but I did find several recommendations not to use NaN. The problem with allowing NaN is that any calculation with NaN returns NaN and any equality comparison with NaN returns false.

I suggest a modification to the convention to recommend against the use of NaN as a _FillValue value.

Thoughts?

In Fortran, the elemental function ieee_is_nan(x) (defined in the ieee_arithmetic intrinsic module) can be used to determine if x is a NaN or, when x is an array, which of its elements are NaN.

The C99 standard supports the macro isnan, whose declaration is in <math.h>. For Java, you can use the Float.isNaN() or Double.isNaN() static methods or the corresponding instance methods for Float or Double objects. Most C++ compilers have isnan in <cmath> or <math.h>, but it has apparently not made it into the current C++ standard.

Given the support for detecting NaNs in up-to-date compilers, I don't think its use should be discouraged, because it has some advantages that aren't available with other techniques. For example, with propagating NaNs, you could confidently take averages without checking every single value, knowing that the result would be a NaN if any of the values that went into the average were a NaN. This could be a useful time saver, if NaNs were unexpected or rare.
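As a rough sketch of the averaging point, here is the same idea in Python, assuming IEEE 754 arithmetic. The helper `mean` and the sample values are hypothetical and only serve the illustration.

```python
import math

def mean(values):
    """Plain arithmetic mean with no missing-data checks."""
    return sum(values) / len(values)

clean = [2.0, 4.0, 6.0]
with_gap = [2.0, float("nan"), 6.0]   # one value is missing

clean_mean = mean(clean)
gap_mean = mean(with_gap)             # NaN propagates through the sum

print(clean_mean, math.isnan(gap_mean))
```

A single test on the result is enough to learn that at least one input was missing, without checking each value beforehand.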

--Russ

comment:3 Changed 11 years ago by caron

I think a NaN is a natural missing value, since you don't have to reserve another value for that purpose. In the CDM we assume a NaN means "missing value" even when not documented by an attribute.

It's true that one can't naively test "if (value == _FillValue)"; rather, one must do "if (Double.isNaN(value) or (value == _FillValue))". However, because missing values can be specified not only by _FillValue, but also by the missing_value, valid_min, valid_max and valid_range attributes, in practice testing for a missing value in a general way is a bit complicated and difficult to make fast. In the CDM, we typically replace missing values by NaN so that subsequent tests only have to do "if (Double.isNaN(value))", which improves speed.
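To make the "normalize once, then test for NaN" idea concrete, here is a simplified Python sketch. It is not the CDM implementation: `is_missing` and `normalize` are hypothetical helpers, the attribute values are invented, and the rules shown are a reduced version of what the attributes can express.

```python
import math

def is_missing(value, fill_value=None, missing_value=None,
               valid_min=None, valid_max=None):
    """Return True if value counts as missing under these simplified rules."""
    if math.isnan(value):                       # NaN is always treated as missing
        return True
    if fill_value is not None and value == fill_value:
        return True
    if missing_value is not None and value == missing_value:
        return True
    if valid_min is not None and value < valid_min:
        return True
    if valid_max is not None and value > valid_max:
        return True
    return False

def normalize(values, **attrs):
    """Replace every missing value by NaN so later code only needs math.isnan."""
    return [float("nan") if is_missing(v, **attrs) else v for v in values]

raw = [12.5, -999.0, 1.0e20, 300.0, float("nan")]   # made-up data
cleaned = normalize(raw, fill_value=-999.0, missing_value=1.0e20, valid_max=100.0)

# After normalization a single, cheap test is enough.
missing_mask = [math.isnan(v) for v in cleaned]
print(missing_mask)
```

The general test runs once when the data are read; everything downstream relies on the NaN check alone.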

Anyway, IMO, NaN is useful and should not be deprecated.

comment:4 follow-up: Changed 11 years ago by amanke

I agree that NaN is useful, and the reality is that people use it. But software needs to know it's being used.

Could perhaps the CF reference documentation state, "if you use NaN as a missing value indicator it is essential that you document its use in the file by including either the _FillValue or missing_value attribute."

comment:5 in reply to: ↑ 4 Changed 11 years ago by caron

Replying to amanke:

I agree that NaN is useful, and the reality is that people use it. But software needs to know it's being used.

Could perhaps the CF reference documentation state, "if you use NaN as a missing value indicator it is essential that you document its use in the file by including either the _FillValue or missing_value attribute."

agree

comment:6 Changed 11 years ago by cjw

OK, I finally figured out the ticket tracking (I need an account). Here are two replies I made that went into the ether:

On Jan 20:

What you see stated as a problem with a nan is what I see as the whole point of a nan. Any calculation with a nan should return a nan. If you use -99 (or some such value), then the resultant calculation returns some bogus value, or worse something that is in-bounds, and you don't realize there is an issue with the derived output.
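A small Python sketch of the sentinel problem, with invented numbers: the marker -99, the temperatures and the plausibility bounds are all assumptions made for this example.

```python
def mean(values):
    """Plain arithmetic mean with no missing-data checks."""
    return sum(values) / len(values)

SENTINEL = -99.0                        # hypothetical missing-data marker
temps = [20.0, 22.0, SENTINEL, 21.0]    # made-up temperatures in degrees C

# The sentinel is included by mistake and silently skews the result.
naive_mean = mean(temps)

# The intended result, with the sentinel filtered out first.
true_mean = mean([t for t in temps if t != SENTINEL])

# A simple plausibility check does not catch the mistake.
looks_plausible = -50.0 <= naive_mean <= 50.0

print(naive_mean, true_mean, looks_plausible)
```

The skewed mean is still a believable temperature, so nothing downstream signals that a missing value was averaged in.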

And then April 7th:

I would like to re-iterate my support for nan, almost to the point of excluding all other values [barring some reason I am unaware of; I will assume the VAX issue is moot at this point]. Our software actually converts any missing_value / _FillValue to nan directly upon reading it from the file.

Chris

comment:7 Changed 11 years ago by cjw

I would like to strike the "almost to the point of excluding all other values" for legacy file support (Thanks Steve).

comment:8 Changed 9 years ago by jonathan

Ros and I have decided to close this ticket. It was raised as a defect but received objections, so if it were to be put forward again it would have to be a proposal for change. Jonathan

comment:9 Changed 5 years ago by jonathan

  • Resolution set to fixed
  • Status changed from new to closed