
[CF-metadata] New standard_name of quality_flag for corresponding quality control variables

From: Nan Galbraith <ngalbraith>
Date: Thu, 5 Sep 2019 13:14:05 -0400

Thanks, Alison and Ken -

This is a good addition - well done. Your proposed new wording looks
just right, at least to me.

Cheers - Nan

On 9/4/19 4:35 AM, Alison Pamment - UKRI STFC wrote:
> Dear Ken,
>
> Apologies for the delay in looking at this conversation - I have recently returned from leave. Thank you for the proposal and to all those who commented in the discussion.
>
> To summarize the discussion, I think we can draw out the following points.
>
> 1) A variable with the existing standard name of status_flag can be used to provide metadata about the instrument that produced the data (certainly that is consistent with all three examples in section 3.5 of the CF document).
>
> 2) The current definition of status_flag has caused some confusion, perhaps because it is too vague. It is important that any changes to the definition should not affect the interpretation of existing data. This would mean that status_flag could still be used to label variables that contain both quality control and instrument state information as in the WOCE example given by Barna.
>
> 3) It is not unusual to have standard names for both generic and more specific definitions of a particular quantity. For example, we have around 20 X_area_fraction names for specific surface types such as ice, snow, land, etc., but also a generic name of area_fraction which requires a coordinate variable of area_type. Another example would be salinity names - we have the generic name sea_water_salinity and six further names for salinity calculated or measured according to specific definitions. The definitions of the generic names make reference to the more specific terms so that data providers can make an informed choice when deciding how to label a particular variable. Data users can search for the generic name, specific names, or both. Introducing a new term quality_flag as a more specific version of status_flag would follow the same principle.
>
> 4) There is strong support for introducing the new name of quality_flag.
>
> 5) There is a need to improve section 3.5 of the CF convention document - it should include at least one example of the use of quality_flag. Ken has offered to propose the necessary changes.
>
> 6) There is scope for further discussion on how to include detailed descriptions of quality control procedures in the netCDF file, or point to external sources of information.
>
> Regarding the names themselves, Ken proposed the following:
>
> quality_flag (canonical unit: 1)
> 'A variable with the standard name of quality_flag contains an indication of assessed quality information of another data variable. The linkage between the data variable and the variable or variables with the standard_name of quality_flag is achieved using the ancillary_variables attribute.'
>
> This proposal for the new name looks fine and its definition is consistent with the preceding discussion. This name is accepted for publication in the standard name table and will be added in the September update (planned for the 17th).
>
> For the existing name of status_flag it is proposed to add an extra sentence in the definition to reference the new name:
> status_flag (canonical unit: 1)
> 'A variable with the standard name of status_flag contains an indication of quality or other status of another data variable. The linkage between the data variable and the variable with the standard name of status_flag is achieved using the ancillary_variables attribute. A variable which contains purely quality information may use the standard name of quality_flag to provide an assessed quality of the corresponding data.'
>
> I suggest that we should also use this opportunity to clarify that the status may be that of the instrument producing the data. This would address points raised by John and Martin earlier in the discussion and would make a clearer connection between the standard name definition and the existing examples in section 3.5 of the conventions.
>
> 'A variable with the standard name of status_flag contains an indication of quality or other status of another data variable. This may include the status of the instrument producing the data as well as data quality information. The linkage between the data variable and the variable with the standard name of status_flag is achieved using the ancillary_variables attribute. A variable which contains purely quality information may use the standard name of quality_flag to provide an assessed quality of the corresponding data.'
>
> Does that sound okay?
>
> Best wishes,
> Alison
>
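Both proposed definitions rely on the ancillary_variables attribute to link a data variable to its flag variables. As a minimal sketch of how software might separate quality flags from status flags under these definitions, the Python below groups a data variable's ancillary variables by standard_name. It assumes the file's metadata has already been read into plain dictionaries; the variable names backscatter, qc_backscatter and shutter are hypothetical, and no particular netCDF library is implied.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for netCDF metadata: variable name -> attributes.
# A real reader would populate this from the file; the names below are illustrative only.
Metadata = Dict[str, Dict[str, str]]


def flag_kind(standard_name: str) -> str:
    """Classify a standard_name value as quality_flag, status_flag or other.

    The last blank-separated token is checked so that the modifier form
    (e.g. "air_temperature status_flag") is recognised as well as the bare name.
    """
    tokens = standard_name.split()
    last = tokens[-1] if tokens else ""
    return last if last in ("quality_flag", "status_flag") else "other"


def ancillary_flags(meta: Metadata, data_var: str) -> Dict[str, List[str]]:
    """Group the variables listed in data_var's ancillary_variables by flag kind."""
    groups: Dict[str, List[str]] = {"quality_flag": [], "status_flag": [], "other": []}
    # ancillary_variables is a blank-separated list of variable names.
    for name in meta[data_var].get("ancillary_variables", "").split():
        kind = flag_kind(meta.get(name, {}).get("standard_name", ""))
        groups[kind].append(name)
    return groups


meta: Metadata = {
    "backscatter": {"ancillary_variables": "qc_backscatter shutter"},
    "qc_backscatter": {"standard_name": "quality_flag"},
    "shutter": {"standard_name": "status_flag"},
}
print(ancillary_flags(meta, "backscatter"))
# {'quality_flag': ['qc_backscatter'], 'status_flag': ['shutter'], 'other': []}
```

Because the definitions allow "the variable or variables" with standard_name quality_flag, each group is a list rather than a single name.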
> From: CF-metadata <cf-metadata-bounces at cgd.ucar.edu> On Behalf Of Kehoe, Kenneth E.
> Sent: 30 August 2019 18:10
> To: cf-metadata at cgd.ucar.edu
> Subject: Re: [CF-metadata] New standard_name of quality_flag for corresponding quality control variables
>
> CF-Metadata list,
>
> I have not seen disagreement since Barna's reply on 2019-07-26. Does this mean the new standard name is accepted and I can start using it? I believe our method for taking a "vote" is to treat 30 days without objection as acceptance?
>
> Thanks,
>
> Ken
>
>
> On 2019-8-19 17:03, Kehoe, Kenneth E. wrote:
> Where do we stand with this request? Will it be accepted?
>
> Thanks,
>
> Ken
>
>
> On 2019-7-29 08:47, Jim Biard wrote:
> Hi.
>
> I agree that there is room for a means to more cleanly separate these two categories of information, even when there may be some redundancy/overlap between them. I have also wished for such an option.
>
> Grace and peace,
>
> Jim
>
> On 7/27/19 3:12 PM, Lowry, Roy K. wrote:
> Dear Ken,
>
> Having been involved in the quite painful process of weaning out data quality information from the host of status flag (often misnamed quality flag) schemes in oceanographic legacy data I would be very disappointed were 'quality_flag' not to be accepted as a Standard Name. If nothing else it will provide best-practice guidance to maintain semantic purity in quality flag schemes.
>
> Cheers, Roy.
>
> I have now retired but will continue to be active through an Emeritus Fellowship using this e-mail address.
>
> ________________________________________
> From: CF-metadata on behalf of Kehoe, Kenneth E.
> Sent: 26 July 2019 22:17
> To: CF Metadata List
> Subject: Re: [CF-metadata] New standard_name of quality_flag for corresponding quality control variables
> Barna,
>
> I disagree. It is not possible to distinguish a quality variable by the existence of the flag_meanings attribute alone, because flag_meanings is used by any state variable. We need some method to distinguish the state of an instrument, or some other general state variable, from quality. I think not defining a quality variable explicitly will create more work for the user, who will be required to parse every flag_meanings value to see whether it applies.
>
> I am proposing a standard name because it is more likely to be adopted than a new attribute. I have tried adding new attributes to the CF convention in the past and have received a lot of pushback; most often I was told to put that information into the standard name. For example, attributes for the positive direction of a variable (not a vertical coordinate), a digital object identifier, the orientation of platform variables, and an indication that a variable is an uncertainty were all denied. I was told the standard name describes the variable, which is what I am proposing.
>
> I don't see the use of multiple variables for describing quality as a problem. I would recommend having only one, but not forbidding multiple. I know the CF document proposes using flag_values with flag_masks to indicate which mask value to use. I find that logic quite confusing for the average user, since the descriptions are all mixed together in flag_meanings. If your concern is having multiple ancillary quality variables, I suggest adding additional standard names or having the user look for a keyword in the long_name. For example, we could propose a suite of new standard names: quality_flag, primary_quality_flag, secondary_quality_flag, instrument_quality_flag, model_quality_flag, ...
>
> Since CF does not have any specific examples or explanation of how to handle quality, I think we need to start somewhere. The standard name table has many examples of a general term being introduced to start solving a problem, with a better term added when refinement is needed. Looking at https://urldefense.proofpoint.com/v2/url?u=http-3A__mmisw.org_cfsn_-23_search_platform&d=DwMD-g&c=qKdtBuuu6dQK9MsRUVJ2DPXW6oayO8fu4TfEHS8sGNk&r=Vm7o2ZGxPkkqRuPs8nVMVQ&m=9AHYkOPejBCDabIXEBKSnZbLfOXkWjnZ9I42teSfkX0&s=ZxT1R4rrDMwl-7ZfQrjb38Z9MTbWeGuFaRCfYS5Wc_c&e= there are more general terms that did not indicate a positive direction when they were first entered into the table; when it became clear that the positive direction needed to be known, it was added later. I see the addition of quality_flag following the same logic. If a refined standard name, or a set of them, is needed in the future we can add it, but right now starting simple with quality_flag seems most appropriate.
>
> I have serious doubts I will get a new attribute name adopted by CF to indicate that a variable is a quality variable. I could propose a new attribute like "quality_variable", but what would I set the attribute to? There is no CF Boolean to say True/False, and if we set the attribute to some value we will need a new ontology to manage. I'm looking for a simple solution to declare a variable as a quality indication variable.
>
> Thanks and have a nice weekend,
>
> Ken
>
>
>
> On 2019-7-26 14:05, Andrew Barna wrote:
> Hi Everyone,
>
> I've been rereading all these emails and having some long conversations with colleagues about this proposal, and I still can't seem to convince myself that it is a good idea.
>
> The initial request seemed to be motivated by wanting to distinguish "quality" from "status" based on standard name alone. This distinction can currently be made using the "flag_meanings" attribute. This name is hardly unique in needing additional information; many of the radiation names need (often optional) wavelength coordinates. If you are using any custom calendars or grids, these all need additional attributes or information to properly interpret the data in the variable.
>
> Having multiple flag variables in a file shouldn't be a problem; WOCE did it for "originator" vs "expert" QC. If you really don't want more than one flag variable, the flag_masks attribute allows you to combine all these states together, and combining that further with flag_meanings even allows you to define which combinations are valid. My group is considering having multiple flag schemes in the file (WOCE and ARGO), so you can just use the one you like best.
>
> My colleagues expressed concern that this would significantly confuse new users adopting CF about which "flag" name to use for their data, and would add the complication of needing to look for more than one name when searching for flag information.
>
> I feel that this issue is best resolved with some clarifying updates to the CF document itself, especially some new examples to show how flags can be used, and not with a new name for this metadata variable.
>
> -Barna
>
>
>
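Barna's point about folding several states into one variable can be made concrete with the rule CF section 3.5 gives for a variable that carries both flag_masks and flag_values (the combination Ken also refers to): condition i is true when the bitwise AND of the data value with flag_masks[i] equals flag_values[i]. The sketch below applies that rule to a single value. The four-entry flag scheme is hypothetical, chosen only to mix one independent Boolean with an enumerated mode, and the function name is illustrative.

```python
from typing import List, Sequence


def true_conditions(value: int, masks: Sequence[int], values: Sequence[int],
                    meanings: str) -> List[str]:
    """Return the flag_meanings whose condition holds for one data value.

    Applies the CF section 3.5 rule for a variable carrying both flag_masks
    and flag_values: condition i is true when (value & masks[i]) == values[i].
    """
    names = meanings.split()
    if not (len(masks) == len(values) == len(names)):
        raise ValueError("flag_masks, flag_values and flag_meanings must have equal length")
    return [name for mask, flag, name in zip(masks, values, names)
            if (value & mask) == flag]


# Hypothetical scheme: bit 0 is an independent Boolean; bits 1-2 encode an enumerated mode.
MASKS = [1, 6, 6, 6]
VALUES = [1, 0, 2, 4]
MEANINGS = "low_battery mode_normal mode_calibration mode_maintenance"

print(true_conditions(3, MASKS, VALUES, MEANINGS))  # ['low_battery', 'mode_calibration']
```

Because the three mode entries share mask 6, at most one of them can be true for any value, while low_battery is tested independently. This is also why Ken finds the combined form hard to read: the Boolean and enumerated meanings sit side by side in a single flag_meanings string.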
> On Wed, Jul 24, 2019 at 2:20 PM Kehoe, Kenneth E. <mailto:kkehoe at ou.edu> wrote:
> Barna,
>
> I plan to propose some updates to the CF document once this name is in the standard name list. It will be a lot easier to have my proposed changes accepted if the standard name has already been accepted.
>
> Ken
>
>
> On 2019-7-24 13:21, Andrew Barna wrote:
> I've never personally liked the name "status_flag" and have always interpreted it to be the "CF way" of saying "these values are either an associative array or bit field or some combination of both". It is also a special case of standard names in that two variables with the standard name "status_flag" may not be comparable, a situation which will not change with an added "quality_flag", that is, two variables with the standard name of "quality_flag" also may not be comparable.
>
> Since the actual meaning of the values contained in a variable with the standard name "status_flag" must be derived from the various other flag_* attributes, I saw this proposal as an added complication. When looking at a variable with the standard name "quality_flag", I still would not know the meanings of the values until interpreting the various other flag_* attributes.
>
> I think adding this new name would also require some updates to the CF document itself (section 3.5 and probably Appendix C) to note that there would now be multiple names that trigger the interpretation of the values described in that section.
>
>
> On Wed, Jul 24, 2019 at 11:40 AM John Graybeal <mailto:jgraybeal at stanford.edu> wrote:
> I support the point about defining 'status' and 'quality'. Yes, there are cases when we define terms that are reused, but I don't think these terms are reused; they appear only in these flags. Just defining the standard name should do.
>
> Ken, I did like the qualifying text about status_flag but maybe that's because I always thought status_flag could be used that way, as a status of instruments. Looking at the definition (https://urldefense.proofpoint.com/v2/url?u=http-3A__mmisw.org_cfsn_-23_search_status&d=DwMFaQ&c=qKdtBuuu6dQK9MsRUVJ2DPXW6oayO8fu4TfEHS8sGNk&r=Vm7o2ZGxPkkqRuPs8nVMVQ&m=VTdz9EFvdYCHVZuWqyGFVznrg66340ZMoqNJYCjC5P8&s=t2KpUQnNrF4eJoiygFuC7iQ1NSsuwhjQXku4Nvu5XdU&e=) it doesn't say that, does it? It's all about the data. I even searched the archives, I was so sure people talked about it in another way, but I can't find any evidence of that.
>
> So I conclude that equipment status is not included in the model currently supported by status_flag, and we shouldn't try to fix that here. What do you think?
>
> John
>
>
> On Jul 24, 2019, at 10:34 AM, Kehoe, Kenneth E. <mailto:kkehoe at ou.edu> wrote:
>
> Daniel,
>
> Thanks for the information. At some point we should chat about how our two organizations think about and perform quality analysis.
>
> Martin,
>
> I'm confused about your suggestion to include definitions of status and quality. I guess we could define those terms better in the general standard name table, but that is not my intention. My concern is that the definition of those terms is larger than the scope of what I wanted to propose. I would prefer to just work on the definitions of the status_flag and quality_flag.
>
> Your suggestion to numerically order the values suggests that we have different notions of how to use quality_flag. A quality_flag is not intended to indicate the severity or ranking of tests; it is just a state field. My program discussed doing something like that in the past and it did not end well.
>
> If we want to add terminology along the lines of "The variable with standard name quality_flag refers to an assessed quality of the corresponding data.", that is OK with me. Your expanded definition of status does not help me better understand status. I think it's the word "may" that confuses me; a definition needs to be more definitive.
>
> I don't see the addition of quality_flag as changing status_flag. I see quality_flag as a narrower subclass of status_flag. I would prefer not to change much in status_flag, since it has such a long history in CF.
>
> I think we have these definitions:
>
> status_flag: A variable with the standard name of status_flag contains an indication of quality or other status of another data variable. The linkage between the data variable and the variable with the standard name of status_flag is achieved using the ancillary_variables attribute. A variable which contains purely quality information may use the standard name of quality_flag to provide an assessed quality of the corresponding data.
>
> quality_flag: A variable with the standard name of quality_flag contains an indication of assessed quality information of another data variable. The linkage between the data variable and the variable or variables with the standard_name of quality_flag is achieved using the ancillary_variables attribute.
>
> Thanks,
>
> Ken
>
>
>
>
> On 2019-7-24 03:40, Daniel Neumann wrote:
> Dear Ken, Martin, John, Roy and Barna,
>
> I/we thought about submitting a similar proposal to add some extended model quality information to netCDF files. The suggested description of "quality_flag" and the modified description of "status_flag" fit well into our project.
>
> I am just writing this to show that there are more people in the community who are interested in this.
>
> Cheers,
> Daniel
>
>
> Am 24.07.2019 um 10:49 schrieb Martin Juckes - UKRI STFC:
>
> Dear John, Roy,
>
>
> OK, I'm happy to drop the line about ordering of quality flags if it doesn't work. This is consistent with Roy's suggested definitions (posted 2 minutes before John's reply), which also drop this sentence, and add a broader description of valid usage of the status flag (I've copied them here to get the discussion back in a single thread):
>
>
> Status: The value of a variable with standard name status_flag may refer to the status of the instrument or process which generated the corresponding data, or it may refer to the data itself. This may include information about data quality, particularly in legacy data sets. 'quality_flag' should be used if data quality is the only type of information contained in the variable.
>
> Quality: The value of a variable with standard name quality_flag refers to an assessed quality of the corresponding data.
>
>
> regards,
>
> Martin
>
> ________________________________
> From: John Graybeal
> Sent: 24 July 2019 09:20
> To: Juckes, Martin (STFC,RAL,RALSP)
> Cc: Andrew Barna; Kehoe, Kenneth E.; CF Metadata List
> Subject: Re: [CF-metadata] New standard_name of quality_flag for corresponding quality control variables
>
> +1 Martin, just what I was thinking also, it creates the opening but does not preclude mixing status and quality flags in a single status_flag, which I think is important.
>
> Um, I don't think you can dictate that "Numeric values of the quality flag should be ordered, such that the lowest value corresponds to the poorest quality and the highest value to the best quality." Some people will be documenting their own flags, which are whatever they are.
>
> Responding to an earlier possible misconception, I want to emphasize (read: confirm) these are the standard names, which are used to characterize the attributes. They are not the variable names, so you can have multiple different variables that express different status_flags or different quality_flags.
>
> John
>
> On Jul 24, 2019, at 12:46 AM, Martin Juckes - UKRI STFC <mailto:martin.juckes at stfc.ac.uk> wrote:
>
> Dear Ken, Barna,
>
>
> I agree that we should keep things simple as far as possible, but I still think we need to say something about the difference between "status" and "quality". The proposed definitions do not, as far as I can see, say anything about this. This could lead to confusion, as different data providers may make different choices, so that user software has to check both flags and be prepared for arbitrary usage patterns.
>
>
> Here is an attempt at simple definitions of the two words, which could be appended to your proposed definitions (significant words used in the standard name table have canned definitions which are added to the definitions of all standard names using those words).
>
>
> status: The value of a variable with standard name status_flag may refer to the status of the instrument or process which generated the corresponding data, or it may refer to the data itself. If the data variable also has a quality_flag, the status_flag should be restricted to properties of the instrument or process.
>
>
> quality: The value of a variable with standard name quality_flag refers to an assessed quality of the corresponding data. Numeric values of the quality flag should be ordered, such that the lowest value corresponds to the poorest quality and the highest value to the best quality.
>
>
> I've suggested "assessed" rather than "subjective", because quality could be estimated using an algorithm which some would call objective. I've also added in the idea that "quality" should in some sense be a scale from poorest to best: this is the case for the examples we have discussed, and I think it makes a clear distinction between the two flags. Are there any potential uses of the quality flag which are not consistent with the idea of a quality scale?
>
>
> Specifying that the "status_flag" has a more restricted usage when the "quality_flag" is present may be a way of getting around compatibility issues, allowing people to continue mixed usage of "status_flag". The CF Convention is supposed to apply with the latest standard name table, so people don't have the option of referring to an earlier version of the table, even if they specify an earlier version of the Convention.
>
>
> regards,
>
> Martin
>
> ________________________________
> From: CF-metadata <mailto:cf-metadata-bounces at cgd.ucar.edu> on behalf of Andrew Barna <mailto:abarna at ucsd.edu>
> Sent: 23 July 2019 22:56:19
> To: Kehoe, Kenneth E.
> Cc: mailto:cf-metadata at cgd.ucar.edu
> Subject: Re: [CF-metadata] New standard_name of quality_flag for corresponding quality control variables
>
> Looks good to me.
>
> I took the "subjective" part from how Martin was asking about quality vs status.
>
> -Barna
>
> On Tue, Jul 23, 2019 at 2:40 PM Kehoe, Kenneth E. <mailto:kkehoe at ou.edu> wrote:
> Barna,
>
> OK your definition is fine. I suggest one small change, drop the word subjective.
>
> status_flag: A variable with the standard name of status_flag contains an indication of quality or other status of another data variable. The linkage between the data variable and the variable with the standard_name of status_flag is achieved using the ancillary_variables attribute. A variable which contains purely quality information may use the standard_name of quality_flag.
>
> Ken
>
>
> On 2019-7-23 15:28, Andrew Barna wrote:
> Ken,
>
> I think I'm confused by the text of the proposed change to the definition of status_flag.
>
> In your proposed change the "quality" wording of the status_flag definition was dropped. Here is the first sentence of each:
> Current: A variable with the standard name of status_flag contains an indication of quality or other status of another data variable.
> Proposed: A variable with the standard name of status_flag contains an indication of status of another data variable.
>
> Perhaps the following for "status_flag":
> A variable with the standard name of status_flag contains an indication of quality or other status of another data variable. The linkage between the data variable and the variable with the standard_name of status_flag is achieved using the ancillary_variables attribute. A variable which contains purely subjective quality information may use the standard_name of quality_flag.
>
> That is, keep the current definition, but also point out a more restrictive option. With any of these options, I don't see a way to avoid reading the flag_meanings.
>
> -Barna
>
>
> On Tue, Jul 23, 2019 at 1:03 PM Kehoe, Kenneth E. <mailto:kkehoe at ou.edu> wrote:
> Barna,
>
> I see this as an optional addition to narrow the standard. It does not
> prohibit someone from using status_flag (as a standard_name or a
> standard_name modifier) from a previous convention version
> implementation nor invalidate that use from a previous convention
> version. In your example the use of status_flag is a mixture of state
> and quality. I see this new name as a way to improve things going
> forward. Since the historical WOCE example uses state and quality with
> some additional rules not listed in the CF standard it would be up to
> the user to understand how to use the variable. Without seeing the WOCE
> data I can't make a specific suggestion.
>
> I don't know about any rules regarding a restriction. I think the
> general concept of CF is to set the minimum rules. Additional rules
> applied by another group on top of CF is allowed. For example my
> organization uses additional attributes not defined in CF. I see
> quality_flag as narrowing the rules of status_flag, not replacing it.
> status_flag can still have a mixture of state and quality if the data
> provider prefers to do it that way. quality_flag can only have quality
> information. The determination of what is quality information is
> actually up to the data provider to decide.
>
> Ken
>
>
>
> On 2019-7-23 13:33, Andrew Barna wrote:
> Ken,
>
> Ok I see how this can be useful. Two more questions:
> * How would you deal with "legacy" flag schemes which mix "status" and
> "quality" already? I'm thinking of WOCE CTD as an example where "7"
> means Despiked (a status) and "3" means Questionable measurement (a
> quality). The way my seagoing group have dealt with both is by having
> the "quality" override "status" if the quality is anything other than
> "good", e.g. a questionable measurement which has been despiked gets
> flag 3.
>
> * Are there rules in CF regarding restricting an existing definition?
> I imagine there are many datasets already using the "status_flag" name
> as either a stand alone standard name or a standard name modifier.
> This change seems to be "breaking" in that previously compliant
> datasets would now have quality information in a purely status field.
>
> Thanks
> -Barna
>
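The override rule Barna describes for his group's WOCE CTD flags can be sketched as below. It assumes code 2 is the WOCE "acceptable measurement" (good) value; codes 3 (questionable) and 7 (despiked) are the ones named in the message, and the function name is illustrative.

```python
from typing import Optional

# Assumed WOCE CTD "good" code (acceptable measurement). Codes 3 (questionable)
# and 7 (despiked) are the ones named in the message above.
WOCE_GOOD = 2


def combined_flag(quality: int, status: Optional[int] = None) -> int:
    """Collapse a quality code and an optional status code into one WOCE-style flag.

    Quality overrides status whenever quality is anything other than good;
    otherwise the status code, if any, is reported.
    """
    if quality != WOCE_GOOD:
        return quality
    return status if status is not None else quality


print(combined_flag(3, 7))  # 3: questionable and despiked is reported as questionable
print(combined_flag(2, 7))  # 7: a good measurement that was despiked
```

Note that after collapsing, the value 3 no longer records that the point was despiked.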
> On Tue, Jul 23, 2019 at 10:08 AM Kehoe, Kenneth E. <mailto:kkehoe at ou.edu> wrote:
> Martin,
>
> Thanks for your reply. I would prefer to keep the proposal simple. My example of a weighted mean was just one I created off the top of my head. I don't see it as something to actually look into implementing.
>
> I need a way to indicate a variable is a quality status field. The distinction that the status field only contains quality information is the important distinction. The variable indicated with quality_flag will need to also use flag_meanings, same as status_flag. Hence my reason for choosing quality_flag to follow a similar naming pattern.
>
> Barna,
>
> Without a distinction that the entire variable is a quality variable the user is forced to parse the flag_meanings to see if the variable applies. This would also encourage a data provider to mix quality with source or instrument state or something else in the same variable. That would be very difficult to understand.
>
> As Martin points out quality is more subjective than other status information. A user may need to choose what parts of the quality variable to apply. I would prefer we not conflate absolute information with subjective information. But we need a way to distinguish the variable contains absolute information vs a variable that contains more subjective information.
>
> To expand on Martin's example, imagine a profiling instrument that has a shutter to protect the laser from rain. The laser always sends out pulses and the receiver is always on, receiving the return from each laser pulse. To know when the shutter is open and the instrument is profiling, we would use a state variable with a simple flag_values method.
>
> short shutter(time) ;
>     shutter:long_name = "Shutter state" ;
>     shutter:units = "1" ;
>     shutter:flag_values = 0s, 1s ;
>     shutter:flag_meanings = "closed open" ;
>     shutter:standard_name = "status_flag" ;
>
> This variable just indicates the position of the shutter; there is no ambiguity in its use. A user who wants the data for atmospheric purposes should filter to use only data collected while the instrument is profiling. In fact, we can implement this in our code by using data only where shutter is set to open.
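Filtering on the shutter variable, as Ken suggests, could look like the following minimal sketch. It looks up the value for "open" from flag_values and flag_meanings rather than hard-coding 1, so the code keeps working if a provider encodes the states differently. The data and flag arrays are made up, and the function names are hypothetical.

```python
from typing import List, Sequence


def value_for(meaning: str, flag_values: Sequence[int], flag_meanings: str) -> int:
    """Look up the flag value that encodes a given meaning (raises ValueError if absent)."""
    return flag_values[flag_meanings.split().index(meaning)]


def keep_where(data: Sequence[float], flags: Sequence[int], meaning: str,
               flag_values: Sequence[int], flag_meanings: str) -> List[float]:
    """Keep only the samples whose status flag equals the value for `meaning`."""
    wanted = value_for(meaning, flag_values, flag_meanings)
    return [d for d, f in zip(data, flags) if f == wanted]


# Attributes from the shutter example; the profile and flag arrays are made up.
shutter_values = [0, 1]
shutter_meanings = "closed open"
profile = [1.0, 2.0, 3.0, 4.0]
shutter = [0, 1, 1, 0]
print(keep_where(profile, shutter, "open", shutter_values, shutter_meanings))  # [2.0, 3.0]
```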
>
> Here is an example of a more subjective quality variable.
>
> short quality_variable(time) ;
>     quality_variable:long_name = "Quality variable for linked data variable" ;
>     quality_variable:units = "1" ;
>     quality_variable:flag_masks = 1s, 2s, 4s, 8s, 16s, 32s ;
>     quality_variable:flag_meanings = "Shutter_not_open Laser_below_80_percent_power Laser_below_60_percent_power Laser_below_40_percent_power Bird_poop_may_be_on_sensor Bird_poop_is_on_sensor" ;
>     quality_variable:flag_assessment = "Bad Suspect Suspect Bad Suspect Bad" ;
>     quality_variable:standard_name = "quality_flag" ;
>
> In this example there are three indications that the laser is below 100% power. It would be up to the user to decide at what percentage they no longer want to use the data. This is more subjective and depends on the research technique used to determine whether the issue is a problem or not. It is also up to the user to determine whether the chance of bird poop on the sensor is an issue or whether they are OK with the risk of using the data. And to be nice to the user, we have also pulled in information from the shutter variable so the user can decide to use only quality_variable instead of both shutter and quality_variable. This is up to the data provider to decide: some providers see the state of the shutter as quality information, some do not. There are no requirements on how the quality variable is used. It is just a quality information variable following the same rules as a CF state variable.
>
> I have also included an attribute that I am not currently proposing, called flag_assessment. This is a subjective statement from the data provider of their opinion of the quality of the data. A user can search for the word "Bad" and then exclude only the data where that mask is set. This would take all the guesswork about quality away from the user if they decided to accept the data provider's opinion. I'm not currently proposing the addition of flag_assessment; this is just an example of how quality information can be made simpler for a user without taking away the user's ability to make their own decision. Everyone has strong opinions on the quality of data.
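Because quality_variable uses flag_masks alone, each bit is an independent condition and several can be set at once. The sketch below decodes the set bits and drops samples where any set bit is assessed as "Bad". flag_assessment is the attribute Ken explicitly says he is not proposing; it is not part of CF, and the helper names and sample arrays are illustrative only.

```python
from typing import List, Sequence

# Attributes from the quality_variable example. flag_assessment is hypothetical
# (described but not proposed in the thread) and is not a CF attribute.
MASKS = [1, 2, 4, 8, 16, 32]
MEANINGS = ("Shutter_not_open Laser_below_80_percent_power Laser_below_60_percent_power "
            "Laser_below_40_percent_power Bird_poop_may_be_on_sensor Bird_poop_is_on_sensor")
ASSESSMENT = "Bad Suspect Suspect Bad Suspect Bad"


def set_flags(value: int, masks: Sequence[int], meanings: str) -> List[str]:
    """Return the flag_meanings whose bit is set in value (flag_masks only)."""
    return [name for mask, name in zip(masks, meanings.split()) if value & mask]


def is_bad(value: int, masks: Sequence[int], assessment: str) -> bool:
    """True if any set bit carries the provider's assessment "Bad"."""
    return any((value & mask) != 0 and label == "Bad"
               for mask, label in zip(masks, assessment.split()))


def drop_bad(data: Sequence[float], qc: Sequence[int]) -> List[float]:
    """Exclude samples the provider assessed as Bad; Suspect samples are kept."""
    return [d for d, q in zip(data, qc) if not is_bad(q, MASKS, ASSESSMENT)]


print(set_flags(18, MASKS, MEANINGS))  # ['Laser_below_80_percent_power', 'Bird_poop_may_be_on_sensor']
print(drop_bad([10.0, 11.0, 12.0, 13.0], [0, 2, 3, 18]))  # [10.0, 11.0, 13.0]
```

A user who does not want to accept the provider's assessment can ignore flag_assessment and test the output of set_flags against their own criteria, for example a laser-power threshold.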
>
> Thanks,
>
> Ken
>
> On 2019-7-23 06:50, Martin Juckes - UKRI STFC wrote:
>
> Dear Ken,
>
>
> thanks for your response to me below.
>
>
> Would it be fair to suggest that "status" should, as far as possible, reflect a generic objective classification, with terms such as "sensor_nonfunctional" which have a comparable meaning for all datasets, while "quality" is a subjective *measure* with a meaning that may vary from dataset to dataset? E.g. if dataset A has a maximum "quality" of 11 and dataset B only goes up to 10, it doesn't necessarily imply that dataset A is in any sense better than B.
>
>
> If you want to use it in weighted means, perhaps it should be "quality_measure" rather than "quality_flag"? With "status_flag" the order of integer values does not have any meaning, but with quality perhaps it would make more sense to have some concept of a sequence of quality settings (so that, for example, "1" always indicates a quality between "0" and "2" within a dataset, but could have different meanings in different datasets). Could the quality also be expressed as a floating point number without any flag meanings?
>
>
> Responding to a point Barna raised: it is certainly possible to have more than one "status_flag" variable, but I don't think it is ideal: if information needs to be split across multiple variables we generally like to describe the difference between the variables in the standard name or in other metadata. In this case, I think there is a good case for using a new standard name.
>
>
> regards,
>
> Martin
>
>
>
>
> ________________________________
> From: CF-metadata <mailto:cf-metadata-bounces at cgd.ucar.edu> on behalf of Andrew Barna <mailto:abarna at ucsd.edu>
> Sent: 23 July 2019 00:23
> To: Kehoe, Kenneth E.
> Cc: mailto:cf-metadata at cgd.ucar.edu
> Subject: Re: [CF-metadata] New standard_name of quality_flag for corresponding quality control variables
>
> Ken,
>
> I guess I don't see this proposed change as necessary since the
> distinction between the terms "quality" and "status" is really done in
> the "flag_meanings" attribute and is basically free form/uncontrolled.
> These attributes need to be used by this new name as well.
>
> Let me rephrase my suggestion/question:
> If this proposal is not adopted, but an example of how to use a
> variable, with the standard name of "status_flag", to only indicate
> data quality is included in the document, would that help?
>
> -Barna
>
> On Mon, Jul 22, 2019 at 1:22 PM Kehoe, Kenneth E. <mailto:kkehoe at ou.edu> wrote:
>
> Barna,
>
> Yes, an update to the CF document should follow after the new
> standard_name is implemented. I think multiple examples are needed since
> status_flag covers many different types of state variables.
>
> Ken
>
>
>
> On 2019-7-22 10:35, Andrew Barna wrote:
>
> Hi Martin, Ken,
>
> Is there anything wrong with including multiple "status_flag"
> variables to capture each separate state you wish to record? The CF document
> unfortunately only includes an example of how to encode the status of
> a sensor, but the actual meanings of the flag values are entirely up
> to you, and this will not change with this proposal. Perhaps the CF
> document would benefit from additional examples (e.g. one that only
> shows data quality flags).
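A sketch of how software might collect the ancillary variables of a data variable, grouped by standard_name. CF defines ancillary_variables as a blank-separated list of variable names; the plain-dict "dataset" standing in for an open netCDF file, the variable names, and the function name are illustrative assumptions.

```python
def ancillary_by_standard_name(dataset, data_var):
    """Group the ancillary variables of data_var by their standard_name.

    dataset maps variable names to attribute dicts (an illustrative
    stand-in for an opened netCDF file).
    """
    # ancillary_variables is a blank-separated list of variable names.
    names = dataset[data_var].get("ancillary_variables", "").split()
    groups = {}
    for name in names:
        standard_name = dataset[name].get("standard_name")
        groups.setdefault(standard_name, []).append(name)
    return groups


dataset = {
    "temp": {"standard_name": "air_temperature",
             "ancillary_variables": "temp_sensor_state temp_qc"},
    "temp_sensor_state": {"standard_name": "status_flag"},
    "temp_qc": {"standard_name": "status_flag"},
}
print(ancillary_by_standard_name(dataset, "temp"))
# prints {'status_flag': ['temp_sensor_state', 'temp_qc']}
```

With two status_flag variables, both land in the same group, so the standard name alone cannot tell a tool which one holds instrument state and which holds quality information.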
>
> -Barna
>
>
> On Mon, Jul 22, 2019 at 9:04 AM Kehoe, Kenneth E. <mailto:kkehoe at ou.edu> wrote:
>
> Hi Martin,
>
> I see status as encompassing multiple pieces of metadata. For
> example, it could be the state of the instrument as it cycles through a
> pre-programmed routine (look at calibration target, look at sky, look at
> ground, look at second calibration target, repeat...). Or it could be the
> sources of the inputs for a model, where availability or some other reason
> could require a decision on which source(s) to use. For provenance, this
> source information is important to report on a per-time-step basis. Or the
> status could be a data provider's method of providing uncertainty
> information (I see this as incorrect, but some people do see it this
> way). Each of these is important metadata, but the way it is used is
> different from a strictly quality variable. A quality variable provides
> information indicating whether the data should be used, or it could be
> used in a weighted-mean method to favor high-quality data over
> low-quality data. The way the metadata is used depends on the
> metadata type. A state of the instrument would be used for subsetting
> calibration vs. data. There is no ambiguity in this, as data from a
> calibration target is not used in a weather research analysis. But
> quality is more subjective and is decided by the data user. If the
> quality variable has 20 different quality tests, the user would need to
> decide whether all 20 test results should be used or only a subset. Also,
> the code for applying the quality is different from the code for the
> instrument view state (in my example above).
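A sketch of the two uses described above: subsetting on a mutually exclusive instrument state encoded with flag_values, then applying only the quality tests the user selects from a bit-field variable encoded with flag_masks. The state and test names, the sample values, and the function names are hypothetical.

```python
def meaning_map(attrs, key):
    """Map each word of flag_meanings to its flag_values/flag_masks entry."""
    return dict(zip(attrs["flag_meanings"].split(), attrs[key]))


def usable_samples(values, state, qc, state_attrs, qc_attrs,
                   wanted_state, tests_to_apply):
    """Keep samples taken in wanted_state that fail none of tests_to_apply.

    state_attrs uses flag_values (mutually exclusive instrument states);
    qc_attrs uses flag_masks (one bit per quality test). Tests not listed
    in tests_to_apply are ignored: the user chooses the subset.
    """
    state_code = meaning_map(state_attrs, "flag_values")[wanted_state]
    test_bits = meaning_map(qc_attrs, "flag_masks")
    applied = 0
    for test in tests_to_apply:
        applied |= test_bits[test]
    return [v for v, s, q in zip(values, state, qc)
            if s == state_code and (q & applied) == 0]


state_attrs = {"flag_values": [0, 1, 2],
               "flag_meanings": "view_sky view_calibration_target view_ground"}
qc_attrs = {"flag_masks": [1, 2, 4],
            "flag_meanings": "missing_value below_valid_min spike_detected"}
values = [280.1, 300.0, 281.5, 999.0, 282.0]
state = [0, 1, 0, 0, 0]
qc = [0, 0, 4, 1, 2]
print(usable_samples(values, state, qc, state_attrs, qc_attrs,
                     "view_sky", ["missing_value", "spike_detected"]))
# prints [280.1, 282.0]
```

The last sample is kept because the user did not select below_valid_min; that per-user choice is what makes quality handling different from state subsetting.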
>
> It is possible to have a quality test result derived from the state of the
> instrument, but not (typically) the other way around. So I need a way to
> distinguish the two for automated or semi-automated tools. Hence my
> point that quality_flag is essentially a subset of status_flag.
>
> Ken
>
>
>
> On 2019-7-22 02:57, Martin Juckes - UKRI STFC wrote:
>
> Dear Ken,
>
>
> Can you expand on the distinction between "quality" and "status"? I understand that they are different in principle, but, in order to support this new standard name I think we need a clear objective statement of how we would want to distinguish between them in CF.
>
> The conventions section on flags (3.5) mixes the two up (http://cfconventions.org/cf-conventions/cf-conventions.html#flags), so some re-wording of the document would also be needed.
>
> regards,
> Martin
>
> ________________________________
> From: CF-metadata <mailto:cf-metadata-bounces at cgd.ucar.edu> on behalf of Kehoe, Kenneth E. <mailto:kkehoe at ou.edu>
> Sent: 19 July 2019 06:42
> To: mailto:cf-metadata at cgd.ucar.edu
> Subject: [CF-metadata] New standard_name of quality_flag for corresponding quality control variables
>
> Dear CF,
>
> I am proposing a new standard name of "quality_flag" to indicate that a variable is purely a quality control variable. A quality control variable would use flag_values or flag_masks along with flag_meanings to declare levels of quality or the results of quality tests on the data variable. This variable would be a subset of the more general "status_flag" standard name. Currently the definition of "status_flag" is:
>
> - A variable with the standard name of status_flag contains an indication of quality or other status of another data variable. The linkage between the data variable and the variable with the standard_name of status_flag is achieved using the ancillary_variables attribute.
>
> This definition includes variables used to define the state or other status information of a variable, so a quality control variable cannot be distinguished by standard name alone from an instrument state, a processing decision, source information, other needed metadata about the data variable, or any other ancillary variable type. Since there is no other way to identify a purely quality control variable, "status_flag" is too general for strictly quality control variables. Having a way to mark a variable as strictly quality control would allow a software tool to apply the results of quality control tests to the data at the user's request. This would not affect current datasets that use "status_flag", nor require a change to its definition beyond noting that the "quality_flag" standard name is available and is a better choice for pure quality control variables.
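Before a tool can apply quality control results at a user's request, it has to decode them. A minimal sketch following the CF flag rules: with flag_masks, a condition is true when the bitwise AND of the value and the mask is non-zero; with flag_values, when the value equals the listed entry. The attributes and the function name decode_flags are illustrative, and the combined flag_values-plus-flag_masks case is not handled.

```python
def decode_flags(value, attrs):
    """Return the flag_meanings that apply to one flag value.

    With flag_masks, a condition is true when (value & mask) is non-zero;
    with flag_values only, it is true when value equals the entry.
    Combined flag_values + flag_masks encoding is not handled here.
    """
    meanings = attrs["flag_meanings"].split()
    if "flag_masks" in attrs:
        return [m for m, mask in zip(meanings, attrs["flag_masks"])
                if value & mask]
    return [m for m, v in zip(meanings, attrs["flag_values"]) if value == v]


qc_attrs = {"flag_masks": [1, 2, 4],
            "flag_meanings": "missing_value below_valid_min above_valid_max"}
print(decode_flags(5, qc_attrs))
# prints ['missing_value', 'above_valid_max']
```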
>
> Proposed addition:
>
> quality_flag = A variable with the standard name of quality_flag contains an indication of quality information of another data variable. The linkage between the data variable and the variable or variables with the standard_name of quality_flag is achieved using the ancillary_variables attribute.
>
> Proposed change:
>
> status_flag = A variable with the standard name of status_flag contains an indication of status of another data variable. The linkage between the data variable and the variable with the standard_name of status_flag is achieved using the ancillary_variables attribute. For data quality information use quality_flag.
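Under the proposed definitions, an automated tool could pick out pure quality control variables by standard name alone. This sketch assumes the quality_flag name as proposed here; the dict-based dataset, the variable names, and the function name are illustrative assumptions.

```python
def quality_flag_variables(dataset, data_var):
    """Names of data_var's ancillary variables with standard_name quality_flag.

    dataset maps variable names to attribute dicts (an illustrative
    stand-in for an opened netCDF file).
    """
    # ancillary_variables is a blank-separated list of variable names;
    # the proposal allows one or more quality_flag variables per data variable.
    names = dataset[data_var].get("ancillary_variables", "").split()
    return [n for n in names
            if dataset[n].get("standard_name") == "quality_flag"]


dataset = {
    "temp": {"standard_name": "air_temperature",
             "ancillary_variables": "temp_sensor_state temp_qc"},
    "temp_sensor_state": {"standard_name": "status_flag"},
    "temp_qc": {"standard_name": "quality_flag"},
}
print(quality_flag_variables(dataset, "temp"))
# prints ['temp_qc']
```

The status_flag variable is left alone for other handling (for example, state subsetting), which is the distinction the proposal aims to make machine-readable.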
>
> Thanks,
>
> Ken
>
>
>


-- 
*******************************************************
* Nan Galbraith        Information Systems Specialist *
* Upper Ocean Processes Group            Mail Stop 29 *
* Woods Hole Oceanographic Institution                *
* Woods Hole, MA 02543                 (508) 289-2444 *
*******************************************************
Received on Thu Sep 05 2019 - 11:14:05 BST

This archive was generated by hypermail 2.3.0 : Tue Sep 13 2022 - 23:02:43 BST
