
[CF-metadata] New standard_name of quality_flag for corresponding quality control variables

From: Andrew Barna <abarna>
Date: Tue, 23 Jul 2019 14:56:19 -0700

Looks good to me.

I took the "subjective" part from how Martin was asking about quality vs
status.

-Barna

On Tue, Jul 23, 2019 at 2:40 PM Kehoe, Kenneth E. <kkehoe at ou.edu> wrote:

> Barna,
>
> OK your definition is fine. I suggest one small change, drop the word
> subjective.
>
> status_flag: A variable with the standard name of status_flag contains an
> indication of quality or other status of another data variable. The linkage
> between the data variable and the variable with the standard_name of
> status_flag is achieved using the ancillary_variables attribute. A variable
> which contains purely quality information may use the standard_name of
> quality_flag.
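As a rough illustration of the linkage this definition describes, here is a minimal Python sketch. It assumes a file's variables are represented as a plain dict of attribute dicts (a hypothetical stand-in for a real netCDF reader), and the variable names `backscatter`, `shutter` and `quality_variable` are made up for the example. It only shows how a tool might pick out the purely quality ancillary variables by standard name.

```python
# Hypothetical in-memory stand-in for a netCDF file: variable name -> attributes.
dataset = {
    "backscatter": {"ancillary_variables": "shutter quality_variable"},
    "shutter": {"standard_name": "status_flag"},
    "quality_variable": {"standard_name": "quality_flag"},
}


def ancillary_by_standard_name(dataset, data_var, standard_name):
    """Return the ancillary variables of data_var that carry standard_name."""
    # ancillary_variables is a blank-separated list of variable names.
    names = dataset[data_var].get("ancillary_variables", "").split()
    return [
        name
        for name in names
        if dataset.get(name, {}).get("standard_name") == standard_name
    ]


quality_vars = ancillary_by_standard_name(dataset, "backscatter", "quality_flag")
status_vars = ancillary_by_standard_name(dataset, "backscatter", "status_flag")
print(quality_vars)  # ['quality_variable']
print(status_vars)   # ['shutter']
```

With only status_flag available, the same selection would require reading each variable's flag_meanings.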
>
> Ken
>
>
> On 2019-7-23 15:28, Andrew Barna wrote:
>
> Ken,
>
> I think I'm confused by the text of the proposed change to the definition
> of status_flag.
>
> In your proposed change the "quality" wording of the status_flag
> definition was dropped. Here is the first sentence of each:
> Current: A variable with the standard name of status_flag contains an
> indication of quality or other status of another data variable.
> Proposed: A variable with the standard name of status_flag contains an
> indication of status of another data variable.
>
> Perhaps the following for "status_flag":
> A variable with the standard name of status_flag contains an indication of
> quality or other status of another data variable. The linkage between the
> data variable and the variable with the standard_name of status_flag is
> achieved using the ancillary_variables attribute. *A variable which
> contains purely subjective quality information may use the standard_name of
> quality_flag.*
>
> That is, keep the current definition, but also point to a more
> restrictive option. I don't see any way to avoid reading the
> flag_meanings with any of these options.
>
> -Barna
>
>
> On Tue, Jul 23, 2019 at 1:03 PM Kehoe, Kenneth E. <kkehoe at ou.edu> wrote:
>
>> Barna,
>>
>> I see this as an optional addition to narrow the standard. It does not
>> prohibit someone from using status_flag (as a standard_name or a
>> standard_name modifier) from a previous convention version
>> implementation nor invalidate that use from a previous convention
>> version. In your example the use of status_flag is a mixture of state
>> and quality. I see this new name as a way to improve things going
>> forward. Since the historical WOCE example uses state and quality with
>> some additional rules not listed in the CF standard it would be up to
>> the user to understand how to use the variable. Without seeing the WOCE
>> data I can't make a specific suggestion.
>>
>> I don't know about any rules regarding a restriction. I think the
>> general concept of CF is to set the minimum rules. Additional rules
>> applied by another group on top of CF are allowed. For example, my
>> organization uses additional attributes not defined in CF. I see
>> quality_flag as a narrowing of the rules of status_flag, not a replacement for it.
>> status_flag can still have a mixture of state and quality if the data
>> provider prefers to do it that way. quality_flag can only have quality
>> information. The determination of what is quality information is
>> actually up to the data provider to decide.
>>
>> Ken
>>
>>
>>
>> On 2019-7-23 13:33, Andrew Barna wrote:
>> > Ken,
>> >
>> > Ok I see how this can be useful. Two more questions:
>> > * How would you deal with "legacy" flag schemes which mix "status" and
>> > "quality" already? I'm thinking of WOCE CTD as an example where "7"
>> > means Despiked (a status) and "3" means Questionable measurement (a
>> > quality). The way my seagoing group have dealt with both is by having
>> > the "quality" override "status" if the quality is anything other than
>> > "good", e.g. a questionable measurement which has been despiked gets
>> > flag 3.
>> >
>> > * Are there rules in CF regarding restricting an existing definition?
>> > I imagine there are many datasets already using the "status_flag" name
>> > as either a stand alone standard name or a standard name modifier.
>> > This change seems to be "breaking" in that previously compliant
>> > datasets would now have quality information in a purely status field.
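The override rule in the first question above (quality wins over status unless the quality is "good") could be sketched in Python roughly as follows. The codes 3 and 7 come from the message; the code used for "good" (2) and the function name are assumptions made for this example only.

```python
GOOD = 2          # assumption: code for a good/acceptable measurement
QUESTIONABLE = 3  # from the message: questionable measurement (a quality)
DESPIKED = 7      # from the message: despiked (a status)


def combined_flag(quality, status=None):
    """Merge a quality code and an optional status code into one legacy flag.

    Quality overrides status whenever the quality is anything other than good.
    """
    if quality != GOOD:
        return quality
    return status if status is not None else quality


print(combined_flag(QUESTIONABLE, DESPIKED))  # 3: questionable wins over despiked
print(combined_flag(GOOD, DESPIKED))          # 7: good data keeps its status
print(combined_flag(GOOD))                    # 2: nothing to report but quality
```

The merge is lossy: once a questionable, despiked value is flagged 3, the despiked status cannot be recovered from the flag alone.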
>> >
>> > Thanks
>> > -Barna
>> >
>> > On Tue, Jul 23, 2019 at 10:08 AM Kehoe, Kenneth E. <kkehoe at ou.edu>
>> wrote:
>> >> Martin,
>> >>
>> >> Thanks for your reply. I would prefer to keep the proposal simple. My
>> example of a weighted mean was just one I created off the top of my head. I
>> don't see it as something to actually look into implementing.
>> >>
>> >> I need a way to indicate a variable is a quality status field. The
>> distinction that the status field only contains quality information is the
>> important distinction. The variable indicated with quality_flag will need
>> to also use flag_meanings, same as status_flag. Hence my reason for
>> choosing quality_flag to follow a similar naming pattern.
>> >>
>> >> Barna,
>> >>
>> >> Without a distinction that the entire variable is a quality variable
>> the user is forced to parse the flag_meanings to see if the variable
>> applies. This would also encourage a data provider to mix quality with
>> source or instrument state or something else in the same variable. That
>> would be very difficult to understand.
>> >>
>> >> As Martin points out quality is more subjective than other status
>> information. A user may need to choose what parts of the quality variable
>> to apply. I would prefer we not conflate absolute information with
>> subjective information. But we need a way to distinguish the variable
>> contains absolute information vs a variable that contains more subjective
>> information.
>> >>
>> >> To expand on Martin's example, imagine a profiling instrument that has
>> >> a shutter to protect the laser from rain. The laser will always send out
>> >> pulses and the receiver will always be on, receiving the return from the
>> >> laser pulse. To know when the shutter is in the open state, where the
>> >> instrument is profiling, we would use a state variable with a simple
>> >> flag_values method.
>> >>
>> >> short shutter (time)
>> >> shutter:long_name = "Shutter state"
>> >> shutter:units = '1'
>> >> shutter:flag_values = 0, 1
>> >> shutter:flag_meanings = "closed open"
>> >> shutter:standard_name = "status_flag"
>> >>
>> >> This variable just indicates the position of the shutter. There is
>> >> no ambiguity with its use. If a user wants to use the data for atmospheric
>> >> reasons they should filter to only use data where the instrument is
>> >> profiling. In fact we can implement this variable into our code by only
>> >> using data where shutter is set to open.
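A minimal Python sketch of that filtering step, assuming the data and the shutter flags have already been read into plain lists (the `backscatter` values are made up, and the helper name is hypothetical):

```python
# Attributes copied from the shutter variable above.
FLAG_VALUES = [0, 1]
FLAG_MEANINGS = "closed open".split()


def select_where(data, flags, meaning):
    """Keep only the samples whose flag has the given meaning."""
    # Look up the integer flag value that corresponds to the meaning.
    value = FLAG_VALUES[FLAG_MEANINGS.index(meaning)]
    return [d for d, f in zip(data, flags) if f == value]


backscatter = [0.12, 0.15, 0.11, 0.14]  # made-up data values
shutter = [1, 0, 1, 1]                  # 1 = open, 0 = closed

print(select_where(backscatter, shutter, "open"))  # [0.12, 0.11, 0.14]
```

Because the state is unambiguous, no user decision is involved beyond choosing "open".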
>> >>
>> >> Here is an example of a more subjective quality variable.
>> >>
>> >> short quality_variable (time)
>> >> quality_variable:long_name = "Quality variable for linked data variable"
>> >> quality_variable:units = '1'
>> >> quality_variable:flag_masks = 1, 2, 4, 8, 16, 32
>> >> quality_variable:flag_meanings = "Shutter_not_open
>> >> Laser_below_80_percent_power
>> >> Laser_below_60_percent_power
>> >> Laser_below_40_percent_power
>> >> Bird_poop_may_be_on_sensor
>> >> Bird_poop_is_on_sensor"
>> >> quality_variable:flag_assessment = "Bad Suspect Suspect Bad Suspect Bad"
>> >> quality_variable:standard_name = "quality_flag"
>> >>
>> >> In this example there are three indications of when the laser is at less
>> >> than 100%. It would be up to the user to decide what percentage is the
>> >> limit below which they do not want to use the data. This is more subjective
>> >> and depends on the research technique to determine whether the issue is a
>> >> problem or not. It is also up to the user to determine if the chance of
>> >> bird poop on the sensor is an issue or if they are OK with the risk of
>> >> using the data. And to be nice to the user we have also pulled in
>> >> information from the shutter variable so the user can decide to only use
>> >> the quality_variable instead of using both shutter and quality_variable.
>> >> This is up to the data provider to decide. Some providers see the state of
>> >> the shutter as quality information, some would not. There are no
>> >> requirements put on the quality variable as to how it is used. It is just
>> >> a quality information variable following the same rules as a CF state
>> >> variable.
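To make the bit-mask reading concrete, here is a small Python sketch that decodes one stored value of the quality variable into the tests that tripped. It assumes each mask is an independent bit, as with flag_masks used without flag_values; the function name is hypothetical.

```python
# Attributes copied from the quality_variable example above.
FLAG_MASKS = [1, 2, 4, 8, 16, 32]
FLAG_MEANINGS = (
    "Shutter_not_open "
    "Laser_below_80_percent_power "
    "Laser_below_60_percent_power "
    "Laser_below_40_percent_power "
    "Bird_poop_may_be_on_sensor "
    "Bird_poop_is_on_sensor"
).split()


def decode(value):
    """Return the meanings of every mask bit that is set in value."""
    return [
        meaning
        for mask, meaning in zip(FLAG_MASKS, FLAG_MEANINGS)
        if value & mask
    ]


print(decode(0))   # []
print(decode(6))   # ['Laser_below_80_percent_power', 'Laser_below_60_percent_power']
print(decode(33))  # ['Shutter_not_open', 'Bird_poop_is_on_sensor']
```

A user who only cares about laser power below 60 percent would test against mask 4 and ignore the rest, which is the kind of choice left to the user here.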
>> >>
>> >> I have also included an attribute that I am not currently proposing
>> >> called flag_assessment. This is a subjective statement from the data
>> >> provider on their opinion of the quality of the data. A user can search for
>> >> the word "Bad" and then exclude from analysis only the data where a
>> >> corresponding mask is set. This would take all the guesswork of quality
>> >> away from the user if they decided to take the opinion of the data
>> >> provider. I'm not currently proposing the addition of flag_assessment; this
>> >> is just an example of how quality can be expanded to be simpler for a user
>> >> without taking away the user's ability to make their own decision. Everyone
>> >> has strong opinions on quality of data.
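A sketch of how a tool might act on such a flag_assessment attribute, in Python. Note that flag_assessment is not part of CF (it is explicitly not being proposed here); the data values and helper names are made up for illustration.

```python
# Masks from the example above; the assessment string is the provider's opinion,
# one word per mask, in the same order.
FLAG_MASKS = [1, 2, 4, 8, 16, 32]
FLAG_ASSESSMENT = "Bad Suspect Suspect Bad Suspect Bad".split()


def assessment_mask(assessment):
    """Combine all masks whose assessment matches into a single bit mask."""
    combined = 0
    for mask, word in zip(FLAG_MASKS, FLAG_ASSESSMENT):
        if word == assessment:
            combined |= mask
    return combined


BAD = assessment_mask("Bad")  # 1 | 8 | 32 == 41


def keep_not_bad(data, quality):
    """Drop samples where any test assessed as "Bad" has tripped."""
    return [d for d, q in zip(data, quality) if not (q & BAD)]


data = [1.0, 2.0, 3.0, 4.0]  # made-up data values
quality = [0, 2, 8, 33]      # made-up quality_variable values

print(BAD)                          # 41
print(keep_not_bad(data, quality))  # [1.0, 2.0]
```

Samples flagged only as "Suspect" are kept, so the user can still apply their own stricter rule on top.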
>> >>
>> >> Thanks,
>> >>
>> >> Ken
>> >>
>> >> On 2019-7-23 06:50, Martin Juckes - UKRI STFC wrote:
>> >>
>> >> Dear Ken,
>> >>
>> >>
>> >> thanks for your response to me below.
>> >>
>> >>
>> >> Would it be fair to suggest that "status" should, as far as possible,
>> >> reflect a generic objective classification, with terms such as
>> >> "sensor_nonfunctional" which have a comparable meaning for all datasets,
>> >> while "quality" is a subjective *measure* with a meaning that may vary from
>> >> dataset to dataset? E.g. if dataset A has a maximum "quality" of 11 and
>> >> dataset B only goes up to 10, it doesn't necessarily imply that dataset A
>> >> is in any sense better than B.
>> >>
>> >>
>> >> If you want to use it in weighted means, perhaps it should be
>> >> "quality_measure" rather than "quality_flag"? With "status_flag" the order
>> >> of integer values does not have any meaning, but with quality perhaps it
>> >> would make more sense to have some concept of a sequence of quality settings
>> >> (so that, for example, "1" always indicates a quality between "0" and "2"
>> >> within a dataset, but could have different meanings in different datasets).
>> >> Could the quality also be expressed as a floating point number without any
>> >> flag meanings?
>> >>
>> >>
>> >> Responding to a point Barna raised: it is certainly possible to have
>> more than one "status_flag" variable, but I don't think it is ideal: if
>> information needs to be split across multiple variables we generally like
>> to describe the difference between the variables in the standard name or in
>> other metadata. In this case, I think there is a good case for using a new
>> standard name.
>> >>
>> >>
>> >> regards,
>> >>
>> >> Martin
>> >>
>> >>
>> >>
>> >>
>> >> ________________________________
>> >> From: CF-metadata <cf-metadata-bounces at cgd.ucar.edu> on behalf of
>> Andrew Barna <abarna at ucsd.edu>
>> >> Sent: 23 July 2019 00:23
>> >> To: Kehoe, Kenneth E.
>> >> Cc: cf-metadata at cgd.ucar.edu
>> >> Subject: Re: [CF-metadata] New standard_name of quality_flag for
>> corresponding quality control variables
>> >>
>> >> Ken,
>> >>
>> >> I guess I don't see this proposed change as necessary, since the
>> >> distinction between the terms "quality" and "status" is really made in
>> >> the "flag_meanings" attribute and is basically free form/uncontrolled.
>> >> These attributes need to be used by this new name as well.
>> >>
>> >> Let me rephrase my suggestion/question:
>> >> If this proposal is not adopted, but an example of how to use a
>> >> variable, with the standard name of "status_flag", to only indicate
>> >> data quality is included in the document, would that help?
>> >>
>> >> -Barna
>> >>
>> >> On Mon, Jul 22, 2019 at 1:22 PM Kehoe, Kenneth E. <kkehoe at ou.edu>
>> wrote:
>> >>
>> >> Barna,
>> >>
>> >> Yes an update to the CF document should follow after the new
>> >> standard_name is implemented. I think multiple examples are needed
>> since
>> >> status_flag covers many different types of state variables.
>> >>
>> >> Ken
>> >>
>> >>
>> >>
>> >> On 2019-7-22 10:35, Andrew Barna wrote:
>> >>
>> >> Hi Martin, Ken,
>> >>
>> >> Is there anything wrong with including multiple "status_flag"
>> >> variables to capture all the separate states you wish? The CF document
>> >> unfortunately only includes an example of how to encode the status of
>> >> a sensor, but the actual meanings of the flag values are entirely up
>> >> to you, and this will not change with this proposal. Perhaps the CF
>> >> document would benefit from additional examples (e.g. one that only
>> >> shows data quality flags).
>> >>
>> >> -Barna
>> >>
>> >>
>> >> On Mon, Jul 22, 2019 at 9:04 AM Kehoe, Kenneth E. <kkehoe at ou.edu>
>> wrote:
>> >>
>> >> Hi Martin,
>> >>
>> >> I see status encompassing multiple metadata pieces of information. For
>> >> example it could be a state of the instrument as it cycles through a
>> >> pre-programmed routine (Look at calibration target, look at sky, look at
>> >> ground, look at second calibration target, repeat...). Or the sources of
>> >> the inputs for a model where the availability or some other reason could
>> >> require making a decision on what source(s) to use. For provenance this
>> >> source information is important to report on a time step basis. Or the
>> >> status could be a data provider's method to provide uncertainty
>> >> information (I see this as incorrect but some people do see it this
>> >> way). Each of these is important metadata but the method of use is
>> >> different than for a strictly quality variable. A quality variable provides
>> >> information indicating if the data should be used or possibly could be
>> >> used in a weighted mean method to favor high quality data over low
>> >> quality data. The way the metadata is used is different depending on the
>> >> metadata type. A state of the instrument would be used for sub-setting
>> >> calibration vs. data. There is no ambiguity in this as data from a
>> >> calibration target is not used in a weather research analysis. But
>> >> quality is more subjective and is decided by the data user. If the
>> >> quality variable has 20 different quality tests the user would need to
>> >> decide if all 20 test results should be used or only a subset. Also,
>> >> the code for applying the quality is different than for the state of the
>> >> instrument view (in my example above).
>> >>
>> >> It is possible to have a quality test result from the state of the
>> >> instrument, but not the other way around (typically). So I need a way to
>> >> distinguish the two for automated or semi-automated tools. Hence my
>> >> point of quality_flag essentially being a subset of status_flag.
>> >> Ken
>> >>
>> >>
>> >>
>> >> On 2019-7-22 02:57, Martin Juckes - UKRI STFC wrote:
>> >>
>> >> Dear Ken,
>> >>
>> >>
>> >> Can you expand on the distinction between "quality" and "status"? I
>> understand that they are different in principle, but, in order to support
>> this new standard name I think we need a clear objective statement of how
>> we would want to distinguish between them in CF.
>> >>
>> >> The conventions section on flags (3.5) mixes the two up
>> >> (http://cfconventions.org/cf-conventions/cf-conventions.html#flags),
>> >> so some re-wording of the document would also be needed,
>> >>
>> >> regards,
>> >> Martin
>> >>
>> >> ________________________________
>> >> From: CF-metadata <cf-metadata-bounces at cgd.ucar.edu> on behalf of
>> Kehoe, Kenneth E. <kkehoe at ou.edu>
>> >> Sent: 19 July 2019 06:42
>> >> To: cf-metadata at cgd.ucar.edu
>> >> Subject: [CF-metadata] New standard_name of quality_flag for
>> corresponding quality control variables
>> >>
>> >> Dear CF,
>> >>
>> >> I am proposing a new standard name of "quality_flag" to indicate a
>> >> variable is purely a quality control variable. A quality control variable
>> >> would use flag_values or flag_masks along with flag_meanings to allow
>> >> declaring levels of quality or results from quality indicating tests of the
>> >> data variable. This variable would be a subset of the more general
>> >> "status_flag" standard name. Currently the definition of "status_flag" is:
>> >>
>> >> - A variable with the standard name of status_flag contains an
>> indication of quality or other status of another data variable. The linkage
>> between the data variable and the variable with the standard_name of
>> status_flag is achieved using the ancillary_variables attribute.
>> >>
>> >> This definition includes a variable used to define the state or other
>> status information of a variable and can not be distinguished by standard
>> name alone from a state of the instrument, processing decision, source
>> information, needed metadata about the data variable or other ancillary
>> variable type. Since there is no other way to define a purely quality
>> control variable, the use of "status_flag" is too general for strictly
>> quality control variables. By having a method to define a variable as
>> strictly quality control the results of quality control tests can be
>> applied to the data with a software tool based on requests by the user.
>> This would not affect current datasets that do use "status_flag" nor
>> require a change to the definition outside of the indication that
>> "quality_flag" standard name is available and a better use for pure quality
>> control variables.
>> >>
>> >> Proposed addition:
>> >>
>> >> quality_flag = A variable with the standard name of quality_flag
>> contains an indication of quality information of another data variable. The
>> linkage between the data variable and the variable or variables with the
>> standard_name of quality_flag is achieved using the ancillary_variables
>> attribute.
>> >>
>> >> Proposed change:
>> >>
>> >> status_flag = A variable with the standard name of status_flag
>> contains an indication of status of another data variable. The linkage
>> between the data variable and the variable with the standard_name of
>> status_flag is achieved using the ancillary_variables attribute. For data
>> quality information use quality_flag.
>> >>
>> >> Thanks,
>> >>
>> >> Ken
>> >>
>> >>
>> >>
>> >> --
>> >> Kenneth E. Kehoe
>> >> Research Associate - University of Oklahoma
>> >> Cooperative Institute for Mesoscale Meteorological Studies
>> >> ARM Climate Research Facility - Data Quality Office
>> >> e-mail: kkehoe at ou.edu | Office: 303-497-4754
>> >>
>> >> _______________________________________________
>> >> CF-metadata mailing list
>> >> CF-metadata at cgd.ucar.edu
>> >> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>> >>
>>
>
Received on Tue Jul 23 2019 - 15:56:19 BST

