
[CF-metadata] New standard_name of quality_flag for corresponding quality control variables

From: Andrew Barna <abarna>
Date: Tue, 23 Jul 2019 14:28:50 -0700

Ken,

I think I'm confused by the text of the proposed change to the definition
of status_flag.

In your proposed change the "quality" wording of the status_flag definition
was dropped. Here is the first sentence of each:
Current: A variable with the standard name of status_flag contains an
indication of quality or other status of another data variable.
Proposed: A variable with the standard name of status_flag contains an
indication of status of another data variable.

Perhaps the following for "status_flag":
A variable with the standard name of status_flag contains an indication of
quality or other status of another data variable. The linkage between the
data variable and the variable with the standard_name of status_flag is
achieved using the ancillary_variables attribute. *A variable which
contains purely subjective quality information may use the standard_name of
quality_flag.*

That is, keep the current definition, but also point readers to a more
restrictive option. I don't see any way around reading the flag_meanings
with any of these options.
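
For illustration, a minimal Python sketch of the point about flag_meanings: a tool can pick out ancillary variables by standard_name, but it still has to read flag_values and flag_meanings to know what each value stands for. The dictionary layout, the variable names, and the meanings "good questionable bad" are assumptions made up for this sketch; it does not use any netCDF library.

```python
# Hypothetical in-memory stand-in for a file's variables: name -> attributes.
variables = {
    "temperature": {"ancillary_variables": "temperature_qc shutter"},
    "temperature_qc": {
        "standard_name": "quality_flag",  # the name proposed in this thread
        "flag_values": [0, 1, 2],
        "flag_meanings": "good questionable bad",  # made-up meanings
    },
    "shutter": {
        "standard_name": "status_flag",
        "flag_values": [0, 1],
        "flag_meanings": "closed open",
    },
}


def ancillary_flags(variables, data_name):
    """Return {ancillary name: (standard_name, {flag value: meaning})}."""
    names = variables[data_name].get("ancillary_variables", "").split()
    result = {}
    for name in names:
        attrs = variables[name]
        # flag_meanings is a blank-separated string aligned with flag_values.
        meanings = attrs["flag_meanings"].split()
        result[name] = (attrs.get("standard_name"),
                        dict(zip(attrs["flag_values"], meanings)))
    return result


flags = ancillary_flags(variables, "temperature")
# The standard name narrows the search to quality variables ...
quality_only = [n for n, (std, _) in flags.items() if std == "quality_flag"]
# ... but the value-to-meaning table still comes from flag_meanings.
quality_table = flags["temperature_qc"][1]
```

With either standard name, the value-to-meaning table is built the same way; only the selection step changes.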

-Barna


On Tue, Jul 23, 2019 at 1:03 PM Kehoe, Kenneth E. <kkehoe at ou.edu> wrote:

> Barna,
>
> I see this as an optional addition to narrow the standard. It does not
> prohibit someone from using status_flag (as a standard_name or a
> standard_name modifier) from a previous convention version
> implementation nor invalidate that use from a previous convention
> version. In your example the use of status_flag is a mixture of state
> and quality. I see this new name as a way to improve things going
> forward. Since the historical WOCE example uses state and quality with
> some additional rules not listed in the CF standard it would be up to
> the user to understand how to use the variable. Without seeing the WOCE
> data I can't make a specific suggestion.
>
> I don't know about any rules regarding a restriction. I think the
> general concept of CF is to set the minimum rules. Additional rules
> applied by another group on top of CF are allowed. For example my
> organization uses additional attributes not defined in CF. I see
> quality_flag as a narrowing of the rules of status_flag, not a
> replacement for it.
> status_flag can still have a mixture of state and quality if the data
> provider prefers to do it that way. quality_flag can only have quality
> information. The determination of what is quality information is
> actually up to the data provider to decide.
>
> Ken
>
>
>
> On 2019-7-23 13:33, Andrew Barna wrote:
> > Ken,
> >
> > Ok I see how this can be useful. Two more questions:
> > * How would you deal with "legacy" flag schemes which mix "status" and
> > "quality" already? I'm thinking of WOCE CTD as an example where "7"
> > means Despiked (a status) and "3" means Questionable measurement (a
> > quality). The way my seagoing group have dealt with both is by having
> > the "quality" override "status" if the quality is anything other than
> > "good", e.g. a questionable measurement which has been despiked gets
> > flag 3.
> >
> > * Are there rules in CF regarding restricting an existing definition?
> > I imagine there are many datasets already using the "status_flag" name
> > as either a stand alone standard name or a standard name modifier.
> > This change seems to be "breaking" in that previously compliant
> > datasets would now have quality information in a purely status field.
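
A minimal Python sketch of the override rule described in the first question above, as one possible reading of it: the quality flag wins unless it is "good", in which case a status flag such as "despiked" is reported. The values 7 and 3 come from the message; the value 2 for "good" and the function name are assumptions for this sketch, not something stated in the thread.

```python
GOOD = 2          # assumed value for a "good" measurement; not given in the thread
QUESTIONABLE = 3  # quality flag, from the message above
DESPIKED = 7      # status flag, from the message above


def combined_flag(quality, status=None, good=GOOD):
    """Merge one quality flag and one optional status flag into a single flag.

    Quality overrides status whenever the quality is anything other than good.
    """
    if quality != good:
        return quality
    return status if status is not None else quality


# A questionable measurement that has been despiked keeps flag 3.
questionable_despiked = combined_flag(QUESTIONABLE, DESPIKED)
# A good measurement that has been despiked reports the status flag.
good_despiked = combined_flag(GOOD, DESPIKED)
```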
> >
> > Thanks
> > -Barna
> >
> > On Tue, Jul 23, 2019 at 10:08 AM Kehoe, Kenneth E. <kkehoe at ou.edu>
> wrote:
> >> Martin,
> >>
> >> Thanks for your reply. I would prefer to keep the proposal simple. My
> example of a weighted mean was just one I created off the top of my head. I
> don't see it as something to actually look into implementing.
> >>
> >> I need a way to indicate a variable is a quality status field. The
> distinction that the status field only contains quality information is the
> important distinction. The variable indicated with quality_flag will need
> to also use flag_meanings, same as status_flag. Hence my reason for
> choosing quality_flag to follow a similar naming pattern.
> >>
> >> Barna,
> >>
> >> Without a distinction that the entire variable is a quality variable
> the user is forced to parse the flag_meanings to see if the variable
> applies. This would also encourage a data provider to mix quality with
> source or instrument state or something else in the same variable. That
> would be very difficult to understand.
> >>
> >> As Martin points out, quality is more subjective than other status
> >> information. A user may need to choose what parts of the quality variable
> >> to apply. I would prefer we not conflate absolute information with
> >> subjective information. But we need a way to distinguish a variable that
> >> contains absolute information from a variable that contains more subjective
> >> information.
> >>
> >> To expand on Martin's example, imagine a profiling instrument that has a
> >> shutter to protect the laser from rain. The laser will always send out
> >> pulses and the receiver will always be on, receiving the return from the
> >> laser pulse. To know when the shutter is in the open state, where the
> >> instrument is profiling, we would use a state variable with a simple
> >> flag_values method.
> >>
> >> short shutter (time)
> >> shutter:long_name = "Shutter state"
> >> shutter:units = '1'
> >> shutter:flag_values = 0, 1
> >> shutter:flag_meanings = "closed open"
> >> shutter:standard_name = "status_flag"
> >>
> >> This variable just indicates the position of the shutter. There is no
> >> ambiguity in its use. If a user wants to use the data for atmospheric
> >> reasons they should filter to use only data where the instrument is
> >> profiling. In fact we can implement this variable into our code by only
> >> using data where shutter is set to open.
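
A minimal Python sketch of the filtering step just described, assuming the data variable and the shutter variable share the time dimension. The sample values and the function name are made up for illustration; the value 1 for "open" follows from flag_values = 0, 1 and flag_meanings = "closed open".

```python
OPEN = 1  # second entry of flag_values, matching "open" in flag_meanings


def profiling_only(data, shutter, open_value=OPEN):
    """Keep only the samples recorded while the shutter was open."""
    return [d for d, s in zip(data, shutter) if s == open_value]


# Made-up sample values, one per time step.
backscatter = [0.12, 0.15, 0.02, 0.14]
shutter = [1, 1, 0, 1]

usable = profiling_only(backscatter, shutter)
```

No judgment is involved here: a sample is either taken with the shutter open or it is not.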
> >>
> >> Here is an example of a more subjective quality variable.
> >>
> >> short quality_variable (time)
> >> quality_variable:long_name = "Quality variable for linked data variable"
> >> quality_variable:units = '1'
> >> quality_variable:flag_masks = 1, 2, 4, 8, 16, 32
> >> quality_variable:flag_meanings = "Shutter_not_open
> >> Laser_below_80_percent_power
> >> Laser_below_60_percent_power
> >> Laser_below_40_percent_power
> >> Bird_poop_may_be_on_sensor
> >> Bird_poop_is_on_sensor"
> >> quality_variable:flag_assessment = "Bad Suspect Suspect Bad Suspect Bad"
> >> quality_variable:standard_name = "quality_flag"
> >>
> >> In this example there are three indications of when the laser is below
> >> 100% power. It would be up to the user to decide what percentage is the
> >> limit below which they do not want to use the data. This is more subjective
> >> and depends on the research technique to determine whether the issue is a
> >> problem or not. It is also up to the user to determine if the chance of bird
> >> poop on the sensor is an issue or if they are OK with the risk of using the
> >> data. And to be nice to the user we have also pulled in information from the
> >> shutter variable so the user can decide to use only the quality_variable
> >> instead of using both shutter and quality_variable. This is up to the data
> >> provider to decide. Some providers see the state of the shutter as quality
> >> information, some would not. There are no requirements put on the quality
> >> variable as to how it is used. It is just a quality information variable
> >> following the same rules as a CF state variable.
> >>
> >> I have also included an attribute that I am not currently proposing,
> >> called flag_assessment. This is a subjective statement from the data
> >> provider giving their opinion of the quality of the data. A user can search
> >> for the word "Bad" and then exclude from analysis only the data where the
> >> corresponding mask is set. This would take all the guesswork about quality
> >> away from the user if they decided to accept the opinion of the data
> >> provider. I'm not currently proposing the addition of flag_assessment; this
> >> is just an example of how quality can be expanded to be simpler for a user
> >> without taking away the user's ability to make their own decision. Everyone
> >> has strong opinions on quality of data.
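
A minimal Python sketch of how a user might apply the flag_masks from the example above together with the flag_assessment attribute. Note that flag_assessment is the illustrative attribute described in the message, not part of CF, and the function names here are hypothetical.

```python
flag_masks = [1, 2, 4, 8, 16, 32]
flag_meanings = (
    "Shutter_not_open Laser_below_80_percent_power "
    "Laser_below_60_percent_power Laser_below_40_percent_power "
    "Bird_poop_may_be_on_sensor Bird_poop_is_on_sensor"
).split()
# Illustrative, non-CF attribute: one assessment word per mask.
flag_assessment = "Bad Suspect Suspect Bad Suspect Bad".split()


def tripped_tests(value):
    """Return (meaning, assessment) for every mask bit set in value."""
    return [
        (meaning, assessment)
        for mask, meaning, assessment in zip(flag_masks, flag_meanings, flag_assessment)
        if value & mask
    ]


def is_bad(value):
    """True if any set bit is assessed as "Bad" by the data provider."""
    return any(assessment == "Bad" for _, assessment in tripped_tests(value))


# Made-up quality_variable samples, one per time step.
quality_variable = [0, 2, 8, 18, 33]
keep = [not is_bad(v) for v in quality_variable]
```

A user who disagrees with the provider's assessment can still work from tripped_tests and apply their own threshold, for example rejecting data only when the laser is below 40 percent power.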
> >>
> >> Thanks,
> >>
> >> Ken
> >>
> >> On 2019-7-23 06:50, Martin Juckes - UKRI STFC wrote:
> >>
> >> Dear Ken,
> >>
> >>
> >> thanks for your response to me below.
> >>
> >>
> >> Would it be fair to suggest that "status" should, as far as possible,
> >> reflect a generic objective classification, with terms such as
> >> "sensor_nonfunctional" which have a comparable meaning for all datasets,
> >> while "quality" is a subjective *measure* with a meaning that may vary from
> >> dataset to dataset? E.g. if dataset A has a maximum "quality" of 11 and
> >> dataset B only goes up to 10, it doesn't necessarily imply that dataset A
> >> is in any sense better than B.
> >>
> >>
> >> If you want to use it in weighted means, perhaps it should be
> >> "quality_measure" rather than "quality_flag"? With "status_flag" the order
> >> of integer values does not have any meaning, but with quality perhaps it
> >> would make more sense to have some concept of a sequence of quality settings
> >> (so that, for example, "1" always indicates a quality between "0" and "2"
> >> within a dataset, but could have different meanings in different datasets).
> >> Could the quality also be expressed as a floating point number without any
> >> flag meanings?
> >>
> >>
> >> Responding to a point Barna raised: it is certainly possible to have
> more than one "status_flag" variable, but I don't think it is ideal: if
> information needs to be split across multiple variables we generally like
> to describe the difference between the variables in the standard name or in
> other metadata. In this case, I think there is a good case for using a new
> standard name.
> >>
> >>
> >> regards,
> >>
> >> Martin
> >>
> >>
> >>
> >>
> >> ________________________________
> >> From: CF-metadata <cf-metadata-bounces at cgd.ucar.edu> on behalf of
> Andrew Barna <abarna at ucsd.edu>
> >> Sent: 23 July 2019 00:23
> >> To: Kehoe, Kenneth E.
> >> Cc: cf-metadata at cgd.ucar.edu
> >> Subject: Re: [CF-metadata] New standard_name of quality_flag for
> corresponding quality control variables
> >>
> >> Ken,
> >>
> >> I guess, I don't see this proposed change as necessary since the
> >> distinction between the terms "quality" and "status" is really done in
> >> the "flag_meanings" attribute and is basically free form/uncontrolled.
> >> These attributes need to be used by this new name as well.
> >>
> >> Let me rephrase my suggestion/question:
> >> If this proposal is not adopted, but an example of how to use a
> >> variable, with the standard name of "status_flag", to only indicate
> >> data quality is included in the document, would that help?
> >>
> >> -Barna
> >>
> >> On Mon, Jul 22, 2019 at 1:22 PM Kehoe, Kenneth E. <kkehoe at ou.edu>
> wrote:
> >>
> >> Barna,
> >>
> >> Yes an update to the CF document should follow after the new
> >> standard_name is implemented. I think multiple examples are needed since
> >> status_flag covers many different types of state variables.
> >>
> >> Ken
> >>
> >>
> >>
> >> On 2019-7-22 10:35, Andrew Barna wrote:
> >>
> >> Hi Martin, Ken,
> >>
> >> Is there anything wrong with including multiple "status_flag"
> >> variables to capture all the separate states you wish? The CF document
> >> unfortunately only includes an example of how to encode the status of
> >> a sensor, but the actual meanings of the flag values are entirely up
> >> to you, and this will not change with this proposal. Perhaps the CF
> >> document would benefit from additional examples (e.g. one that only
> >> shows data quality flags).
> >>
> >> -Barna
> >>
> >>
> >> On Mon, Jul 22, 2019 at 9:04 AM Kehoe, Kenneth E. <kkehoe at ou.edu>
> wrote:
> >>
> >> Hi Martin,
> >>
> >> I see status encompassing multiple metadata pieces of information. For
> >> example it could be a state of the instrument as it cycles through a
> >> pre-programmed routine (Look at calibration target, look at sky, look at
> >> ground, look at second calibration target, repeat...). Or the sources of
> >> the inputs for a model where the availability or some other reason could
> >> require making a decision on what source(s) to use. For provenance this
> >> source information is important to report on a time step basis. Or the
> >> status could be a data provider's method to provide uncertainty
> >> information (I see this as incorrect but some people do see it this
> >> way). Each of these are important metadata but the method of use is
> >> different than a strictly quality variable. A quality variable provides
> >> information indicating if the data should be used or possibly could be
> >> used in a weighted mean method to favor high quality data over low
> >> quality data. The way the metadata is used is different depending on the
> >> metadata type. A state of the instrument would be used for sub-setting
> >> calibration vs. data. There is no ambiguity in this as data from a
> >> calibration target is not used in a weather research analysis. But
> >> quality is more subjective and is decided by the data user. If the
> >> quality variable has 20 different quality tests the user would need to
> >> decide if all 20 test results should be used or only a subset. Also,
> >> the code for applying the quality is different than the state of the
> >> instrument view (in my example above).
> >>
> >> It is possible to have a quality test result from the state of the
> >> instrument, but not the other way around (typically). So I need a way to
> >> distinguish the two for automated or semi-automated tools. Hence my
> >> point of quality_flag essentially being a subset of status_flag.
> >>
> >> Ken
> >>
> >>
> >>
> >> On 2019-7-22 02:57, Martin Juckes - UKRI STFC wrote:
> >>
> >> Dear Ken,
> >>
> >>
> >> Can you expand on the distinction between "quality" and "status"? I
> understand that they are different in principle, but, in order to support
> this new standard name I think we need a clear objective statement of how
> we would want to distinguish between them in CF.
> >>
> >> The conventions section on flags (3.5) mixes the two up (
> http://cfconventions.org/cf-conventions/cf-conventions.html#flags ), so
> some re-wording of the document would also be needed,
> >>
> >> regards,
> >> Martin
> >>
> >> ________________________________
> >> From: CF-metadata <cf-metadata-bounces at cgd.ucar.edu> on behalf of
> Kehoe, Kenneth E. <kkehoe at ou.edu>
> >> Sent: 19 July 2019 06:42
> >> To: cf-metadata at cgd.ucar.edu
> >> Subject: [CF-metadata] New standard_name of quality_flag for
> corresponding quality control variables
> >>
> >> Dear CF,
> >>
> >> I am proposing a new standard name of "quality_flag" to indicate that a
> >> variable is purely a quality control variable. A quality control variable
> >> would use flag_values or flag_masks along with flag_meanings to allow
> >> declaring levels of quality or results from quality-indicating tests of the
> >> data variable. This variable would be a subset of the more general
> >> "status_flag" standard name. Currently the definition of "status_flag" is:
> >>
> >> - A variable with the standard name of status_flag contains an
> indication of quality or other status of another data variable. The linkage
> between the data variable and the variable with the standard_name of
> status_flag is achieved using the ancillary_variables attribute.
> >>
> >> This definition includes a variable used to define the state or other
> status information of a variable and can not be distinguished by standard
> name alone from a state of the instrument, processing decision, source
> information, needed metadata about the data variable or other ancillary
> variable type. Since there is no other way to define a purely quality
> control variable, the use of "status_flag" is too general for strictly
> quality control variables. By having a method to define a variable as
> strictly quality control the results of quality control tests can be
> applied to the data with a software tool based on requests by the user.
> This would not affect current datasets that do use "status_flag" nor
> require a change to the definition outside of the indication that
> "quality_flag" standard name is available and a better use for pure quality
> control variables.
> >>
> >> Proposed addition:
> >>
> >> quality_flag = A variable with the standard name of quality_flag
> contains an indication of quality information of another data variable. The
> linkage between the data variable and the variable or variables with the
> standard_name of quality_flag is achieved using the ancillary_variables
> attribute.
> >>
> >> Proposed change:
> >>
> >> status_flag = A variable with the standard name of status_flag contains
> an indication of status of another data variable. The linkage between the
> data variable and the variable with the standard_name of status_flag is
> achieved using the ancillary_variables attribute. For data quality
> information use quality_flag.
> >>
> >> Thanks,
> >>
> >> Ken
> >>
> >>
> >>
> >> --
> >> Kenneth E. Kehoe
> >> Research Associate - University of Oklahoma
> >> Cooperative Institute for Mesoscale Meteorological Studies
> >> ARM Climate Research Facility - Data Quality Office
> >> e-mail: kkehoe at ou.edu | Office: 303-497-4754
> >>
> >> _______________________________________________
> >> CF-metadata mailing list
> >> CF-metadata at cgd.ucar.edu
> >> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
Received on Tue Jul 23 2019 - 15:28:50 BST

This archive was generated by hypermail 2.3.0 : Tue Sep 13 2022 - 23:02:43 BST
