
[CF-metadata] New standard_name of quality_flag for corresponding quality control variables

From: Lowry, Roy K. <rkl>
Date: Wed, 24 Jul 2019 08:01:53 +0000

Dear Barna,

I don't think legacy schemes that notoriously mix quality statements with other information are a problem. They would simply be labelled 'status_flag', and 'quality_flag' would be reserved for schemes with cleaner semantics. My understanding is that the proposal does not change the meaning of 'status_flag' so as to exclude flag schemes that contain some quality flags.

Cheers, Roy.


I have now retired but will continue to be active through an Emeritus Fellowship using this e-mail address.

________________________________
From: CF-metadata <cf-metadata-bounces at cgd.ucar.edu> on behalf of Andrew Barna <abarna at ucsd.edu>
Sent: 23 July 2019 20:33
To: Kehoe, Kenneth E. <kkehoe at ou.edu>
Cc: cf-metadata at cgd.ucar.edu <cf-metadata at cgd.ucar.edu>
Subject: Re: [CF-metadata] New standard_name of quality_flag for corresponding quality control variables

Ken,

OK, I see how this can be useful. Two more questions:
* How would you deal with "legacy" flag schemes which mix "status" and
"quality" already? I'm thinking of WOCE CTD as an example where "7"
means Despiked (a status) and "3" means Questionable measurement (a
quality). The way my seagoing group have dealt with both is by having
the "quality" override "status" if the quality is anything other than
"good", e.g. a questionable measurement which has been despiked gets
flag 3.

* Are there rules in CF regarding restricting an existing definition?
I imagine there are many datasets already using the "status_flag" name
as either a stand alone standard name or a standard name modifier.
This change seems to be "breaking" in that previously compliant
datasets would now have quality information in a purely status field.
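The override rule described in the first question can be sketched as a small helper. This is a hypothetical illustration, not the group's actual code: only codes 3 (questionable) and 7 (despiked) come from the message, and treating 2 as "good" is an assumption (WOCE's "acceptable measurement" code).

```python
from typing import Optional

# Hypothetical helper illustrating the "quality overrides status" rule.
# Codes 3 and 7 come from the message; GOOD = 2 is an assumption.
GOOD = 2
QUESTIONABLE = 3
DESPIKED = 7


def combined_flag(quality: int, status: Optional[int] = None) -> int:
    """Return the single flag stored for one sample.

    Any quality code other than GOOD overrides the status code; when the
    quality is good, the status code (if any) is reported instead.
    """
    if quality != GOOD:
        return quality
    return status if status is not None else GOOD


# A questionable measurement that has also been despiked gets flag 3.
print(combined_flag(QUESTIONABLE, DESPIKED))  # 3
# A good measurement that has been despiked keeps its status flag, 7.
print(combined_flag(GOOD, DESPIKED))          # 7
```

Once stored, flag 3 no longer records that the sample was also despiked; that loss of information is the kind of mixing that separate status and quality variables would avoid.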

Thanks
-Barna

On Tue, Jul 23, 2019 at 10:08 AM Kehoe, Kenneth E. <kkehoe at ou.edu> wrote:
>
> Martin,
>
> Thanks for your reply. I would prefer to keep the proposal simple. My example of a weighted mean was just one I created off the top of my head. I don't see it as something to actually look into implementing.
>
> I need a way to indicate that a variable is a quality status field. The important distinction is that the field contains only quality information. A variable with the standard name quality_flag will also need to use flag_meanings, just as status_flag does; hence my reason for choosing the name quality_flag, which follows a similar naming pattern.
>
> Barna,
>
> Without a way to indicate that the entire variable is a quality variable, the user is forced to parse flag_meanings to see whether the variable applies. It would also encourage data providers to mix quality with source information, instrument state or something else in the same variable, which would be very difficult to understand.
>
> As Martin points out, quality is more subjective than other status information. A user may need to choose which parts of the quality variable to apply. I would prefer we not conflate absolute information with subjective information, but we need a way to distinguish a variable that contains absolute information from one that contains more subjective information.
>
> To expand on Martin's example, imagine a profiling instrument that has a shutter to protect the laser from rain. The laser always sends out pulses, and the receiver is always on, receiving the return from each laser pulse. To know when the shutter is open, i.e. when the instrument is profiling, we would use a state variable with a simple flag_values method.
>
> short shutter(time) ;
>     shutter:long_name = "Shutter state" ;
>     shutter:units = "1" ;
>     shutter:flag_values = 0s, 1s ;
>     shutter:flag_meanings = "closed open" ;
>     shutter:standard_name = "status_flag" ;
>
> This variable just indicates the position of the shutter. There is no ambiguity in its use. If a user wants to use the data for atmospheric analysis, they should filter to use only data collected while the instrument is profiling. In fact, we can implement this in our code simply by using only data where the shutter is open.
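A minimal sketch of the filtering described above, assuming the data and the shutter variable have already been read into plain Python lists of equal length. The helper names and sample values are hypothetical; the flag_values/flag_meanings pair mirrors the CDL.

```python
# flag_values / flag_meanings mirror the shutter CDL above.
flag_values = [0, 1]
flag_meanings = "closed open".split()


def value_for(meaning: str) -> int:
    """Look up the flag value that corresponds to a flag meaning."""
    return flag_values[flag_meanings.index(meaning)]


def profiling_only(data, shutter):
    """Return the data samples recorded while the shutter was open."""
    open_value = value_for("open")
    return [d for d, s in zip(data, shutter) if s == open_value]


backscatter = [0.8, 1.2, 0.9, 1.5]  # hypothetical measurements
shutter = [1, 0, 1, 1]
print(profiling_only(backscatter, shutter))  # [0.8, 0.9, 1.5]
```

Looking the value up by its meaning, rather than hard-coding 1, keeps the code working if a provider happens to encode "open" differently.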
>
> Here is an example of a more subjective quality variable.
>
> short quality_variable(time) ;
>     quality_variable:long_name = "Quality variable for linked data variable" ;
>     quality_variable:units = "1" ;
>     quality_variable:flag_masks = 1s, 2s, 4s, 8s, 16s, 32s ;
>     quality_variable:flag_meanings = "Shutter_not_open Laser_below_80_percent_power Laser_below_60_percent_power Laser_below_40_percent_power Bird_poop_may_be_on_sensor Bird_poop_is_on_sensor" ;
>     quality_variable:flag_assessment = "Bad Suspect Suspect Bad Suspect Bad" ;
>     quality_variable:standard_name = "quality_flag" ;
>
> In this example there are three indications that the laser is below 100% power. It would be up to the user to decide at what percentage they no longer want to use the data. This is more subjective, and whether the issue is a problem depends on the research technique. It is also up to the user to decide whether the chance of bird poop on the sensor is an issue or whether they are OK with the risk of using the data. And to be nice to the user, we have also pulled in information from the shutter variable, so the user can decide to use only quality_variable instead of both shutter and quality_variable. This is up to the data provider to decide: some providers see the state of the shutter as quality information, some do not. No requirements are placed on how the quality variable is used. It is just a quality information variable following the same rules as a CF state variable.
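To make the "user decides" step concrete, here is a hedged sketch that decodes the flag_masks bit field from the CDL above and keeps only samples passing the tests a particular user chooses to apply. The helper names, the chosen tests and the QC values are hypothetical.

```python
# Masks and meanings mirror the quality_variable CDL above.
flag_masks = [1, 2, 4, 8, 16, 32]
flag_meanings = (
    "Shutter_not_open Laser_below_80_percent_power "
    "Laser_below_60_percent_power Laser_below_40_percent_power "
    "Bird_poop_may_be_on_sensor Bird_poop_is_on_sensor"
).split()


def set_tests(qc_value):
    """Return the meanings whose mask bits are set in one QC value."""
    return [m for mask, m in zip(flag_masks, flag_meanings) if (qc_value & mask) != 0]


def usable(qc_values, tests_to_apply):
    """True for each sample where none of the user-selected tests is set."""
    selected = set(tests_to_apply)
    return [selected.isdisjoint(set_tests(v)) for v in qc_values]


# This user ignores the 80 % laser test and the "may be" bird-poop test.
my_tests = [
    "Shutter_not_open",
    "Laser_below_60_percent_power",
    "Laser_below_40_percent_power",
    "Bird_poop_is_on_sensor",
]
qc = [0, 2, 6, 16, 33]
print(usable(qc, my_tests))  # [True, True, False, True, False]
```

A different user could pass a stricter or looser list of tests against the same file, which is the flexibility the paragraph argues for.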
>
> I have also included an attribute, flag_assessment, that I am not currently proposing. It is a subjective statement from the data provider of their opinion of the quality of the data. A user can search for the word "Bad" and then exclude from analysis only the data where the corresponding mask is set. This would take all the guesswork about quality away from the user if they decide to accept the data provider's opinion. I'm not currently proposing the addition of flag_assessment; this is just an example of how quality information could be made simpler for a user without taking away the user's ability to make their own decision. Everyone has strong opinions on the quality of data.
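The "search for Bad" workflow can be sketched as follows. flag_assessment is the attribute Ken explicitly says he is not proposing; this sketch assumes it is a blank-separated list aligned one-to-one with flag_masks, and the function names and sample values are hypothetical.

```python
# Assumed layout: one assessment word per entry in flag_masks.
flag_masks = [1, 2, 4, 8, 16, 32]
flag_assessment = "Bad Suspect Suspect Bad Suspect Bad".split()


def masks_with_assessment(assessment="Bad"):
    """OR together every mask whose provider assessment matches."""
    bits = 0
    for mask, label in zip(flag_masks, flag_assessment):
        if label == assessment:
            bits |= mask
    return bits


def exclude_bad(data, qc_values):
    """Replace a sample with None wherever any 'Bad' test bit is set."""
    bad = masks_with_assessment("Bad")
    return [None if (q & bad) else d for d, q in zip(data, qc_values)]


print(masks_with_assessment("Bad"))  # 41, i.e. 1 + 8 + 32
print(exclude_bad([10.0, 11.0, 12.0, 13.0], [0, 2, 8, 16]))  # [10.0, 11.0, None, 13.0]
```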
>
> Thanks,
>
> Ken
>
> On 2019-7-23 06:50, Martin Juckes - UKRI STFC wrote:
>
> Dear Ken,
>
>
> thanks for your response to me below.
>
>
> Would it be fair to suggest that "status" should, as far as possible, reflect a generic objective classification, with terms such as "sensor_nonfunctional" which have a comparable meaning for all datasets, while "quality" is a subjective *measure* with a meaning that may vary from dataset to dataset? E.g. if dataset A has a maximum "quality" of 11 and dataset B only goes up to 10, it doesn't necessarily imply that dataset A is in any sense better than B.
>
>
> If you want to use it in weighted means, perhaps it should be "quality_measure" rather than "quality_flag"? With "status_flag" the order of integer values does not have any meaning, but with quality perhaps it would make more sense to have some concept of a sequence of quality settings (so that, for example, "1" always indicates a quality between "0" and "2" within a dataset, but could have different meanings in different datasets). Could the quality also be expressed as a floating-point number without any flag meanings?
>
>
> Responding to a point Barna raised: it is certainly possible to have more than one "status_flag" variable, but I don't think it is ideal: if information needs to be split across multiple variables we generally like to describe the difference between the variables in the standard name or in other metadata. In this case, I think there is a good case for using a new standard name.
>
>
> regards,
>
> Martin
>
>
>
>
> ________________________________
> From: CF-metadata <cf-metadata-bounces at cgd.ucar.edu> on behalf of Andrew Barna <abarna at ucsd.edu>
> Sent: 23 July 2019 00:23
> To: Kehoe, Kenneth E.
> Cc: cf-metadata at cgd.ucar.edu
> Subject: Re: [CF-metadata] New standard_name of quality_flag for corresponding quality control variables
>
> Ken,
>
> I guess I don't see this proposed change as necessary, since the
> distinction between the terms "quality" and "status" is really made in
> the "flag_meanings" attribute, which is basically free-form/uncontrolled.
> These attributes would need to be used with this new name as well.
>
> Let me rephrase my suggestion/question:
> If this proposal is not adopted, but an example of how to use a
> variable with the standard name "status_flag" to indicate only
> data quality is included in the document, would that help?
>
> -Barna
>
> On Mon, Jul 22, 2019 at 1:22 PM Kehoe, Kenneth E. <kkehoe at ou.edu> wrote:
>
> Barna,
>
> Yes, an update to the CF document should follow after the new
> standard_name is implemented. I think multiple examples are needed since
> status_flag covers many different types of state variables.
>
> Ken
>
>
>
> On 2019-7-22 10:35, Andrew Barna wrote:
>
> Hi Martin, Ken,
>
> Is there anything wrong with including multiple "status_flag"
> variables to capture each separate state you wish to record? The CF
> document unfortunately includes only an example of how to encode the
> status of a sensor, but the actual meanings of the flag values are
> entirely up to you, and this will not change with this proposal. Perhaps
> the CF document would benefit from additional examples (e.g. one that
> shows only data quality flags).
>
> -Barna
>
>
> On Mon, Jul 22, 2019 at 9:04 AM Kehoe, Kenneth E. <kkehoe at ou.edu> wrote:
>
> Hi Martin,
>
> I see status as encompassing several kinds of metadata. For example, it
> could be the state of the instrument as it cycles through a
> pre-programmed routine (look at calibration target, look at sky, look at
> ground, look at second calibration target, repeat...). Or it could be the
> sources of the inputs for a model, where availability or some other
> reason could require deciding which source(s) to use. For provenance,
> this source information is important to report on a per-time-step basis.
> Or the status could be a data provider's method of providing uncertainty
> information (I see this as incorrect, but some people do see it this
> way). Each of these is important metadata, but the method of use is
> different from that of a strictly quality variable. A quality variable
> provides information indicating whether the data should be used, or
> possibly could be used in a weighted-mean method to favor high-quality
> data over low-quality data. The way the metadata is used differs
> depending on the metadata type. A state of the instrument would be used
> for separating calibration data from measurement data. There is no
> ambiguity in this, as data from a calibration target is not used in a
> weather research analysis. But quality is more subjective and is decided
> by the data user. If the quality variable has 20 different quality
> tests, the user would need to decide whether all 20 test results should
> be used or only a subset. Also, the code for applying the quality is
> different from the code for the instrument view state (in my example above).
>
> It is possible to derive a quality test result from the state of the
> instrument, but not (typically) the other way around. So I need a way to
> distinguish the two for automated or semi-automated tools. Hence my
> point about quality_flag essentially being a subset of status_flag.
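To illustrate the kind of automated tool Ken describes, here is a minimal sketch that follows a data variable's ancillary_variables attribute and picks out the linked variables carrying a given standard name. The in-memory dictionary stands in for a file's variables, and the variable names and helper name are hypothetical; "quality_flag" is the name proposed in this thread, not an accepted standard name. The match is exact, so it does not handle the standard-name-modifier form of status_flag that Barna mentions.

```python
# Hypothetical stand-in for a file's variables: name -> attributes.
variables = {
    "backscatter": {"ancillary_variables": "shutter quality_variable"},
    "shutter": {"standard_name": "status_flag"},
    "quality_variable": {"standard_name": "quality_flag"},
}


def ancillary_by_standard_name(variables, data_var, standard_name):
    """Names in data_var's ancillary_variables whose standard_name matches exactly."""
    names = variables[data_var].get("ancillary_variables", "").split()
    return [
        name for name in names
        if variables.get(name, {}).get("standard_name") == standard_name
    ]


print(ancillary_by_standard_name(variables, "backscatter", "quality_flag"))  # ['quality_variable']
print(ancillary_by_standard_name(variables, "backscatter", "status_flag"))   # ['shutter']
```

With only status_flag available, both linked variables would share one standard name, and the tool would have to fall back on parsing flag_meanings to tell them apart.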
>
> Ken
>
>
>
> On 2019-7-22 02:57, Martin Juckes - UKRI STFC wrote:
>
> Dear Ken,
>
>
> Can you expand on the distinction between "quality" and "status"? I understand that they are different in principle, but, in order to support this new standard name I think we need a clear objective statement of how we would want to distinguish between them in CF.
>
> The conventions section on flags (3.5) mixes the two up (http://cfconventions.org/cf-conventions/cf-conventions.html#flags ), so some rewording of the document would also be needed.
>
> regards,
> Martin
>
> ________________________________
> From: CF-metadata <cf-metadata-bounces at cgd.ucar.edu> on behalf of Kehoe, Kenneth E. <kkehoe at ou.edu>
> Sent: 19 July 2019 06:42
> To: cf-metadata at cgd.ucar.edu
> Subject: [CF-metadata] New standard_name of quality_flag for corresponding quality control variables
>
> Dear CF,
>
> I am proposing a new standard name of "quality_flag" to indicate that a variable is purely a quality control variable. A quality control variable would use flag_values or flag_masks along with flag_meanings to declare levels of quality or the results of quality tests on the data variable. This variable would be a subset of the more general "status_flag" standard name. Currently the definition of "status_flag" is:
>
> - A variable with the standard name of status_flag contains an indication of quality or other status of another data variable. The linkage between the data variable and the variable with the standard_name of status_flag is achieved using the ancillary_variables attribute.
>
> Under this definition, a variable describing data quality cannot be distinguished by standard name alone from one describing the state of the instrument, a processing decision, source information, needed metadata about the data variable, or another type of ancillary variable. Since there is no other way to identify a purely quality control variable, "status_flag" is too general for strictly quality control variables. With a way to identify a variable as strictly quality control, the results of quality control tests can be applied to the data by a software tool based on the user's requests. This would not affect current datasets that use "status_flag", nor require a change to its definition beyond noting that the "quality_flag" standard name is available and better suited to pure quality control variables.
>
> Proposed addition:
>
> quality_flag = A variable with the standard name of quality_flag contains an indication of quality information of another data variable. The linkage between the data variable and the variable or variables with the standard_name of quality_flag is achieved using the ancillary_variables attribute.
>
> Proposed change:
>
> status_flag = A variable with the standard name of status_flag contains an indication of status of another data variable. The linkage between the data variable and the variable with the standard_name of status_flag is achieved using the ancillary_variables attribute. For data quality information use quality_flag.
>
> Thanks,
>
> Ken
>
>
>
> --
> Kenneth E. Kehoe
> Research Associate - University of Oklahoma
> Cooperative Institute for Mesoscale Meteorological Studies
> ARM Climate Research Facility - Data Quality Office
> e-mail: kkehoe at ou.edu | Office: 303-497-4754
>
> _______________________________________________
> CF-metadata mailing list
> CF-metadata at cgd.ucar.edu
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>


Received on Wed Jul 24 2019 - 02:01:53 BST
