
[CF-metadata] How to encode "not occurring" as distinct from "missing data"

From: Kehoe, Kenneth E. <kkehoe>
Date: Fri, 19 Jul 2019 19:16:08 +0000

Thanks for bringing this up, Lars. I had a similar question a while back
(http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2015/018489.html) and
the suggestion was to use flag_values, as you have proposed. As Jonathan
indicated, the use of flag_values is not prohibited but does come with
some possible issues, particularly with software that automatically
converts values outside the valid range to NaN or to a single missing
value indicator. Some people rely on the valid range in this way instead
of using the missing_value attribute. I don't know of any software that
does this automatically, but it could be an issue.
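
To make the concern concrete, here is a minimal sketch of what such a reader might do. It is not taken from any particular package; the valid range, the flag value of -1 and the sample numbers are all hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical setup: day-of-year data with a valid range of 1..366,
# a flag value of -1 meaning "no_occurrence", and missing_value = -9999.
VALID_MIN, VALID_MAX = 1.0, 366.0
NO_OCCURRENCE = -1.0
MISSING_VALUE = -9999.0

raw = np.array([120.0, NO_OCCURRENCE, 245.0, MISSING_VALUE, 17.0])


def mask_outside_valid_range(values, valid_min, valid_max):
    """Mimic a reader that turns everything outside the valid range into NaN."""
    out = values.astype(float)  # astype returns a copy, raw is left untouched
    out[(out < valid_min) | (out > valid_max)] = np.nan
    return out


masked = mask_outside_valid_range(raw, VALID_MIN, VALID_MAX)

# Before masking the two cases are distinct; afterwards both are just NaN.
no_occurrence_before = int(np.sum(raw == NO_OCCURRENCE))
nan_after = int(np.isnan(masked).sum())
```

Once the reader has done this, the "no_occurrence" value and the true missing value are indistinguishable, which is the risk of putting flag values outside the valid range.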

The proposal I gave was for some data that had history and could not be
updated, so I was looking for a solution that would work without changes.
Discussions I had with others at meetings suggested using an ancillary
state variable. I find that a better solution, since it separates the
data from the metadata, does not require looping over multiple missing
values (I don't know if that is supported), and avoids the issue of code
automatically changing data outside the valid range.

My program uses ancillary variables to contain corresponding quality
control information, and one of the pieces of information is whether the
data variable is set to the missing value. This allows for faster quality
control analysis, since you only need to look at the ancillary variable
to find missing data instead of looking at both the quality control and
data variables.

My suggestion is to use a single missing value indicator with the data
variable and then indicate the greater detail of why it is missing in
the ancillary variable using flag_values. Additional quality information
could be provided beyond the reason for being set to missing value. You
can also provide multiple pieces of information (inclusive states) on a
single value using flag_masks instead of flag_values.
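
As a rough sketch of the flag_masks alternative mentioned above: each state gets its own bit, so one QC value can carry several states at once. The masks, meanings and sample values here are hypothetical, not from any real dataset.

```python
import numpy as np

# Hypothetical flag_masks and flag_meanings, for illustration only.
FLAG_MASKS = np.array([1, 2, 4], dtype=np.int16)
FLAG_MEANINGS = [
    "input_value_missing",
    "computation_not_valid",
    "suspect_value",
]


def decode_flag_masks(qc_value, masks=FLAG_MASKS, meanings=FLAG_MEANINGS):
    """Return every meaning whose bit is set in qc_value."""
    return [name for mask, name in zip(masks, meanings) if qc_value & mask]


# 0 = no bits set, 1 = first state only, 6 = second and third states together.
qc = np.array([0, 1, 6], dtype=np.int16)
decoded = [decode_flag_masks(value) for value in qc]
```

With flag_values a QC value names exactly one state; with flag_masks a value such as 6 reports two states together.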

For example:

cloud_layer_base_height(time, layer): float
    long_name = "Cloud base height of hydrometeor layers" ;
    units = "m" ;
    missing_value = -9999.f ;
    ancillary_variables = "qc_cloud_layer_base_height" ;
qc_cloud_layer_base_height(time, layer): short
    long_name = "Quality information for Cloud base height of
hydrometeor layers" ;
    units = "1" ;
    flag_values = 0, 1, 2, 3, 4 ;
    flag_meanings = "data_available input_value_missing
input_data_exists_but_the_computation_did_not_result_in_a_valid_numeric_value
value_missing_because_of_birds value_was_computed_but_I_would_not_use_it" ;
    standard_name = "status_flag" ;
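
A minimal sketch of how a reader might use the two variables above, with the attribute values copied from the example. The sample arrays are made up, and treating flags 1, 2 and 3 as the ones that imply a missing data value is my assumption for this illustration.

```python
import numpy as np

# Attribute values copied from the example above.
MISSING_VALUE = np.float32(-9999.0)
FLAG_VALUES = [0, 1, 2, 3, 4]
FLAG_MEANINGS = (
    "data_available input_value_missing "
    "input_data_exists_but_the_computation_did_not_result_in_a_valid_numeric_value "
    "value_missing_because_of_birds value_was_computed_but_I_would_not_use_it"
).split()

# Hypothetical sample data: one time step with five layers.
height = np.array([850.0, -9999.0, -9999.0, -9999.0, 3200.0], dtype=np.float32)
qc = np.array([0, 1, 2, 3, 4], dtype=np.int16)

meaning_of = dict(zip(FLAG_VALUES, FLAG_MEANINGS))

# Assumption: flags 1, 2 and 3 mean the data value was set to missing.
# Missing data can then be located from the QC variable alone.
missing_from_qc = np.isin(qc, [1, 2, 3])

# "Not occurring" (flag 2) stays distinct from "input missing" (flag 1).
not_occurring = qc == 2

# Human-readable reason for each missing element.
reasons = [meaning_of[int(value)] for value in qc[missing_from_qc]]
```

The data variable carries only one missing value indicator, and the reason for each gap is looked up in the QC variable.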

I'm curious to hear your thoughts and those of others,

Ken


On 2019-7-19 06:57, Jonathan Gregory wrote:
> Dear Lars
>
> I think that using a flag_value would be a good CF way to do this. I am not
> sure whether it's a good idea to choose a value which is outside the valid
> range. That's not a problem for CF (that is, it's not prohibited), but maybe
> it might not suit some software, which could object if it wasn't aware of CF
> flag_values.
>
> Best wishes
>
> Jonathan
>
>
> ----- Forwarded message from Bärring Lars <Lars.Barring at smhi.se> -----
>
>> Date: Fri, 19 Jul 2019 10:20:35 +0000
>> From: Bärring Lars <Lars.Barring at smhi.se>
>> To: "cf-metadata at cgd.ucar.edu" <cf-metadata at cgd.ucar.edu>
>> Subject: [CF-metadata] How to encode "not occurring" as distinct from
>> "missing data"
>>
>> Dear all,
>>
>> We are considering how best to store data produced by some computation where there has to be a distinction between missing input data (i.e. no input data available) and "not occurring" (i.e. input data exists but the computation did not result in a valid numeric value).
>>
>> In practice, the situation is reasonably similar to what was discussed back in 2017 (in the thread "Recording 'day of year on which something happens'") where Jim Biard offered a solution (http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2017/019238.html).
>>
>> We have considered his solution of using flag values outside the valid_range of the data variable to indicate "no_occurrence". We have also considered using a separate quality variable with flag values as a mask (combined with _MissVal in the data variable).
>>
>> In this work the following questions surfaced,
>>
>> -- Is there any experience regarding how 'standard software' would handle either of these alternatives, is one more generally accepted?
>>
>> -- Is there any experience to guide us regarding which is better, or generally more "in line with the CF Conventions"?
>>
>> -- Is there another better approach that we have not thought of?
>>
>>
>> Many thanks,
>> Lars
>>
>> _______________________________________________
>> CF-metadata mailing list
>> CF-metadata at cgd.ucar.edu
>> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>
> ----- End forwarded message -----

-- 
Kenneth E. Kehoe
   Research Associate - University of Oklahoma
   Cooperative Institute for Mesoscale Meteorological Studies
   ARM Climate Research Facility - Data Quality Office
   e-mail: kkehoe at ou.edu | Office: 303-497-4754
Received on Fri Jul 19 2019 - 13:16:08 BST
