
[CF-metadata] New standard_name of quality_flag for corresponding quality control variables

From: Kehoe, Kenneth E. <kkehoe>
Date: Tue, 23 Jul 2019 17:08:37 +0000

Martin,

Thanks for your reply. I would prefer to keep the proposal simple. My example of a weighted mean was just one I created off the top of my head. I don't see it as something to actually look into implementing.

I need a way to indicate that a variable is a quality status field. The important distinction is that the field contains only quality information. A variable indicated with quality_flag will also need to use flag_meanings, the same as status_flag, hence my choice of quality_flag to follow a similar naming pattern.

Barna,

Without a way to mark the entire variable as a quality variable, the user is forced to parse the flag_meanings to see if the variable applies. The lack of a distinction would also encourage a data provider to mix quality with source, instrument state or something else in the same variable. That would be very difficult to understand.

As Martin points out, quality is more subjective than other status information. A user may need to choose which parts of the quality variable to apply. I would prefer we not conflate absolute information with subjective information. But we need a way to distinguish a variable that contains absolute information from a variable that contains more subjective information.

To expand on Martin's example, imagine a profiling instrument that has a shutter to protect the laser from rain. The laser will always send out pulses and the receiver will always be on, receiving the return from the laser pulse. To know when the shutter is open and the instrument is profiling, we would use a state variable with a simple flag_values method.

short shutter (time)
  shutter:long_name = "Shutter state"
  shutter:units = '1'
  shutter:flag_values = 0, 1
  shutter:flag_meanings = "closed open"
  shutter:standard_name = "status_flag"

This variable just indicates the position of the shutter. There is no ambiguity in its use. If a user wants to use the data for atmospheric purposes they should filter to use only data where the instrument is profiling. In fact we can implement this in our code by using only data where shutter is set to open.
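A minimal Python sketch of that filtering, assuming the data and the shutter variable have already been read into plain lists. The sample values and the helper name keep_where_state are hypothetical; only the flag_values and flag_meanings come from the example above.

```python
# Attribute values copied from the shutter example above.
FLAG_VALUES = [0, 1]
FLAG_MEANINGS = "closed open".split()


def keep_where_state(data, flags, wanted,
                     flag_values=FLAG_VALUES, flag_meanings=FLAG_MEANINGS):
    """Return the data samples whose flag has the meaning `wanted`."""
    # Look up the integer value paired with the requested meaning.
    wanted_value = flag_values[flag_meanings.index(wanted)]
    return [d for d, f in zip(data, flags) if f == wanted_value]


# Hypothetical samples: one data value and one shutter state per time step.
backscatter = [0.12, 0.15, 0.03, 0.02, 0.14]
shutter = [1, 1, 0, 0, 1]

profiling = keep_where_state(backscatter, shutter, "open")
```

The state variable needs no judgment from the user: every sample is either kept or dropped based on a single flag value.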

Here is an example of a more subjective quality variable.

short quality_variable (time)
  quality_variable:long_name = "Quality variable for linked data variable"
  quality_variable:units = '1'
  quality_variable:flag_masks = 1, 2, 4, 8, 16, 32
  quality_variable:flag_meanings = "Shutter_not_open
    Laser_below_80_percent_power
    Laser_below_60_percent_power
    Laser_below_40_percent_power
    Bird_poop_may_be_on_sensor
    Bird_poop_is_on_sensor"
  quality_variable:flag_assessment = "Bad Suspect Suspect Bad Suspect Bad"
  quality_variable:standard_name = "quality_flag"

In this example there are three indications of the laser running below full power. It is up to the user to decide which percentage is the limit below which they do not want to use the data. This is more subjective and depends on the research technique to determine whether the issue is a problem or not. It is also up to the user to determine if the chance of bird poop on the sensor is an issue or if they are OK with the risk of using the data. And to be nice to the user we have also pulled in information from the shutter variable so the user can decide to use only quality_variable instead of using both shutter and quality_variable. This is up to the data provider to decide. Some providers see the state of the shutter as quality information, some would not. There are no requirements put on the quality variable as to how it is used. It is just a quality information variable following the same rules as a CF state variable.
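As a rough illustration of how a user might apply only the tests they care about, here is a Python sketch that works on plain lists. The masks and meanings are taken from the example above; the function names, the sample values and the user's selection are hypothetical.

```python
# Masks and meanings copied from the quality_variable example above.
FLAG_MASKS = [1, 2, 4, 8, 16, 32]
FLAG_MEANINGS = (
    "Shutter_not_open Laser_below_80_percent_power "
    "Laser_below_60_percent_power Laser_below_40_percent_power "
    "Bird_poop_may_be_on_sensor Bird_poop_is_on_sensor"
).split()


def tripped_tests(value):
    """List the meanings of every bit that is set in one quality value."""
    return [m for mask, m in zip(FLAG_MASKS, FLAG_MEANINGS) if value & mask]


def reject_mask(selected_meanings):
    """Combine the bits of the tests the user has chosen to act on."""
    mask = 0
    for meaning in selected_meanings:
        mask |= FLAG_MASKS[FLAG_MEANINGS.index(meaning)]
    return mask


def apply_quality(data, quality, selected_meanings):
    """Keep only samples where none of the selected tests tripped."""
    mask = reject_mask(selected_meanings)
    return [d for d, q in zip(data, quality) if not (q & mask)]


# Hypothetical samples and a hypothetical user decision: this user
# tolerates 80% laser power and possible bird poop, but nothing worse.
data = [1.0, 2.0, 3.0, 4.0]
quality = [0, 2, 6, 32]
user_choice = [
    "Shutter_not_open",
    "Laser_below_60_percent_power",
    "Bird_poop_is_on_sensor",
]
kept = apply_quality(data, quality, user_choice)
```

A different user could pass a different list of meanings without any change to the file, which is the sense in which the quality variable is subjective.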

I have also included an attribute that I am not currently proposing, called flag_assessment. This is a subjective statement from the data provider giving their opinion of the quality of the data. A user can search for the word "Bad" and then exclude from analysis only the data where the corresponding mask is set. This would take all the guesswork about quality away from the user if they decided to accept the opinion of the data provider. I'm not currently proposing the addition of flag_assessment; this is just an example of how quality can be expanded to be simpler for a user without taking away the user's ability to make their own decision. Everyone has strong opinions on quality of data.
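A sketch of how a tool could use flag_assessment, keeping in mind that the attribute is only an illustration and not part of CF. The mask and assessment values come from the example above; the helper name and the sample values are hypothetical.

```python
# flag_masks and the (not proposed) flag_assessment from the example above.
FLAG_MASKS = [1, 2, 4, 8, 16, 32]
FLAG_ASSESSMENT = "Bad Suspect Suspect Bad Suspect Bad".split()


def mask_for(assessment):
    """OR together the bits whose provider assessment matches."""
    mask = 0
    for bit, label in zip(FLAG_MASKS, FLAG_ASSESSMENT):
        if label == assessment:
            mask |= bit
    return mask


bad_mask = mask_for("Bad")

# Hypothetical samples: drop a value only when a "Bad" test tripped.
data = [10, 11, 12, 13, 14, 15]
quality = [0, 2, 6, 8, 16, 33]
kept = [d for d, q in zip(data, quality) if not (q & bad_mask)]
```

Samples flagged only as "Suspect" are kept here, so the user who trusts the provider gets a default while still being free to build their own mask.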

Thanks,

Ken

On 2019-7-23 06:50, Martin Juckes - UKRI STFC wrote:

Dear Ken,


thanks for your response to me below.


Would it be fair to suggest that "status" should, as far as possible, reflect a generic objective classification, with terms such as "sensor_nonfunctional" which have a comparable meaning for all datasets, while "quality" is a subjective *measure* with a meaning that may vary from dataset to dataset? E.g. if dataset A has a maximum "quality" of 11 and dataset B only goes up to 10, it doesn't necessarily imply that dataset A is in any sense better than B.


If you want to use it in weighted means, perhaps it should be "quality_measure" rather than "quality_flag"? With "status_flag" the order of integer values does not have any meaning, but with quality perhaps it would make more sense to have some concept of a sequence of quality settings (so that, for example, "1" always indicates a quality between "0" and "2" within a dataset, but could have different meanings in different datasets). Could the quality also be expressed as a floating point number without any flag meanings?


Responding to a point Barna raised: it is certainly possible to have more than one "status_flag" variable, but I don't think it is ideal: if information needs to be split across multiple variables we generally like to describe the difference between the variables in the standard name or in other metadata. In this case, I think there is a good case for using a new standard name.


regards,

Martin




________________________________
From: CF-metadata <cf-metadata-bounces at cgd.ucar.edu> on behalf of Andrew Barna <abarna at ucsd.edu>
Sent: 23 July 2019 00:23
To: Kehoe, Kenneth E.
Cc: cf-metadata at cgd.ucar.edu
Subject: Re: [CF-metadata] New standard_name of quality_flag for corresponding quality control variables

Ken,

I guess I don't see this proposed change as necessary since the
distinction between the terms "quality" and "status" is really done in
the "flag_meanings" attribute and is basically free form/uncontrolled.
These attributes need to be used by this new name as well.

Let me rephrase my suggestion/question:
If this proposal is not adopted, but an example of how to use a
variable, with the standard name of "status_flag", to only indicate
data quality is included in the document, would that help?

-Barna

On Mon, Jul 22, 2019 at 1:22 PM Kehoe, Kenneth E. <kkehoe at ou.edu> wrote:



Barna,

Yes, an update to the CF document should follow after the new
standard_name is implemented. I think multiple examples are needed since
status_flag covers many different types of state variables.

Ken



On 2019-7-22 10:35, Andrew Barna wrote:


Hi Martin, Ken,

Is there anything wrong with including multiple "status_flag"
variables to capture all separate state you wish? The CF document
unfortunately only includes an example of how to encode the status of
a sensor, but the actual meanings of the flag values are entirely up
to you, and this will not change with this proposal. Perhaps the CF
document would benefit from additional examples (e.g. one that only
shows data quality flags).

-Barna


On Mon, Jul 22, 2019 at 9:04 AM Kehoe, Kenneth E. <kkehoe at ou.edu> wrote:


Hi Martin,

I see status encompassing multiple metadata pieces of information. For
example it could be a state of the instrument as it cycles through a
pre-programmed routine (Look at calibration target, look at sky, look at
ground, look at second calibration target, repeat...). Or the sources of
the inputs for a model where the availability or some other reason could
require making a decision on what source(s) to use. For provenance this
source information is important to report on a time step basis. Or the
status could be a data provider's method to provide uncertainty
information (I see this as incorrect but some people do see it this
way). Each of these is important metadata but the method of use is
different than a strictly quality variable. A quality variable provides
information indicating if the data should be used or possibly could be
used in a weighted mean method to favor high quality data over low
quality data. The way the metadata is used is different depending on the
metadata type. A state of the instrument would be used for sub-setting
calibration vs. data. There is no ambiguity in this as data from a
calibration target is not used in a weather research analysis. But
quality is more subjective and is decided by the data user. If the
quality variable has 20 different quality tests the user would need to
decide if all 20 test results should be used or only a subset. Also,
the code for applying the quality is different than the state of the
instrument view (in my example above).

It is possible to have a quality test result from the state of the
instrument, but not the other way around (typically). So I need a way to
distinguish the two for automated or semi-automated tools. Hence my
point of quality_flag essentially being a subset of status_flag.

Ken



On 2019-7-22 02:57, Martin Juckes - UKRI STFC wrote:


Dear Ken,


Can you expand on the distinction between "quality" and "status"? I understand that they are different in principle, but, in order to support this new standard name I think we need a clear objective statement of how we would want to distinguish between them in CF.

The conventions section on flags (3.5) mixes the two up (http://cfconventions.org/cf-conventions/cf-conventions.html#flags ), so some re-wording of the document would also be needed,

regards,
Martin

________________________________
From: CF-metadata <cf-metadata-bounces at cgd.ucar.edu> on behalf of Kehoe, Kenneth E. <kkehoe at ou.edu>
Sent: 19 July 2019 06:42
To: cf-metadata at cgd.ucar.edu
Subject: [CF-metadata] New standard_name of quality_flag for corresponding quality control variables

Dear CF,

I am proposing a new standard name of "quality_flag" to indicate that a variable is purely a quality control variable. A quality control variable would use flag_values or flag_masks along with flag_meanings to declare levels of quality or the results of quality tests on the data variable. This variable would be a subset of the more general "status_flag" standard name. Currently the definition of "status_flag" is:

- A variable with the standard name of status_flag contains an indication of quality or other status of another data variable. The linkage between the data variable and the variable with the standard_name of status_flag is achieved using the ancillary_variables attribute.

This definition includes a variable used to define the state or other status information of a variable, so a quality control variable cannot be distinguished by standard name alone from a state of the instrument, processing decision, source information, needed metadata about the data variable or other ancillary variable type. Since there is no other way to define a purely quality control variable, "status_flag" is too general for strictly quality control variables. With a method to define a variable as strictly quality control, the results of quality control tests can be applied to the data by a software tool based on requests by the user. This would not affect current datasets that use "status_flag", nor require a change to the definition beyond noting that the "quality_flag" standard name is available and is a better choice for pure quality control variables.

Proposed addition:

quality_flag = A variable with the standard name of quality_flag contains an indication of quality information of another data variable. The linkage between the data variable and the variable or variables with the standard_name of quality_flag is achieved using the ancillary_variables attribute.

Proposed change:

status_flag = A variable with the standard name of status_flag contains an indication of status of another data variable. The linkage between the data variable and the variable with the standard_name of status_flag is achieved using the ancillary_variables attribute. For data quality information use quality_flag.
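To illustrate how a software tool could use the proposed name, here is a small Python sketch that follows ancillary_variables and selects by standard_name. It assumes the file's attributes have been loaded into a plain dictionary; the variable names, the dictionary layout and the function name are hypothetical, and "quality_flag" is only the proposed name, not an adopted one.

```python
# Hypothetical in-memory stand-in for the attributes of a file's variables.
variables = {
    "backscatter": {"ancillary_variables": "shutter quality_variable"},
    "shutter": {"standard_name": "status_flag"},
    "quality_variable": {"standard_name": "quality_flag"},
}


def ancillary_with_standard_name(variables, data_name, standard_name):
    """Return the ancillary variables of `data_name` with a given standard_name."""
    # ancillary_variables is a blank-separated list of variable names.
    names = variables[data_name].get("ancillary_variables", "").split()
    return [
        name for name in names
        if variables.get(name, {}).get("standard_name") == standard_name
    ]


quality_vars = ancillary_with_standard_name(variables, "backscatter", "quality_flag")
state_vars = ancillary_with_standard_name(variables, "backscatter", "status_flag")
```

With only "status_flag" available, both ancillary variables would be returned by the same query and the tool would have to parse flag_meanings to tell them apart.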

Thanks,

Ken



--
Kenneth E. Kehoe
    Research Associate - University of Oklahoma
    Cooperative Institute for Mesoscale Meteorological Studies
    ARM Climate Research Facility - Data Quality Office
    e-mail: kkehoe at ou.edu | Office: 303-497-4754
_______________________________________________
CF-metadata mailing list
CF-metadata at cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
Received on Tue Jul 23 2019 - 11:08:37 BST

This archive was generated by hypermail 2.3.0 : Tue Sep 13 2022 - 23:02:43 BST
