Opened 5 years ago

Last modified 5 years ago

#151 new enhancement

Clarification of use of standard region names in "region" variables.

Reported by: martin.juckes
Owned by: cf-conventions@…
Priority: medium
Milestone:
Component: cf-conventions
Version:
Keywords:
Cc:

Description (last modified by martin.juckes)

The CF standard name region has the current description "A variable with the standard name of region contains strings which indicate geographical regions. These strings must be chosen from the standard region list." This description implies that the variable should be of character type, but it is often more convenient to have an integer variable and make a clear link to the region names using flag_values and flag_meanings. The proposal is to clarify the definition so that either usage is acceptable and include an example of the latter usage in the convention text. It is also proposed that an appendix be added to the CF Convention text to state clearly any constraints on file meta-data which are implied by the CF Standard Name definitions, so that it is possible to test such constraints in the CF checker.

New descriptions for CF standard names

region

A variable with the standard_name of region contains either strings which indicate a geographical region or flags which can be translated to strings using flag_values and flag_meanings attributes. These strings are standardised. Values must be taken from the CF standard region list.

area_type

A variable with the standard_name of area_type contains either strings which indicate the nature of the surface e.g. land, sea, sea_ice, or flags which can be translated to strings using flag_values and flag_meanings attributes. These strings are standardised. Values must be taken from the area_type table.

New usage example in CF Convention text

The following should be placed after example 3.3, renumbering examples 3.4 and 3.5 to 3.5 and 3.6 respectively (text checked for cross-references to "3.4" and "3.5" and none found)

A variable with standard name of region, area_type or any other standard name which requires string values from a defined list may use flags together with flag_values and flag_meanings attributes to record the translation to the string values. The following example illustrates this using integer flag values for a variable with standard name region and flag_meanings selected from the standardized region names (see section 6.1.1):

Example 3.4. Flag variable with controlled values, using flag_values

  int basin(lat, lon) ;
    basin:standard_name = "region" ;
    basin:flag_values = 1, 2, 3 ;
    basin:flag_meanings = "atlantic_arctic_ocean indo_pacific_ocean global_ocean" ;
  ......
data:
  basin = 1, 1, 1, 1, 2, ..... ;
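
For readers of such a file, the translation from flags back to region names might look like the following. This is a minimal sketch in plain Python, not part of the proposed convention text: the function name decode_flags and the dictionary layout are hypothetical, and the attribute values are copied from the example above.

```python
# Hypothetical helper: translate integer flags to region names using the
# flag_values and flag_meanings attributes of the example variable.

def decode_flags(values, flag_values, flag_meanings):
    """Map each flag in `values` to its meaning; unknown flags map to None."""
    meanings = flag_meanings.split()  # flag_meanings is a blank-separated string
    if len(meanings) != len(flag_values):
        raise ValueError("flag_values and flag_meanings differ in length")
    lookup = dict(zip(flag_values, meanings))
    return [lookup.get(v) for v in values]


# Attributes as given in the example (assumed to have been read from the file).
attrs = {
    "standard_name": "region",
    "flag_values": [1, 2, 3],
    "flag_meanings": "atlantic_arctic_ocean indo_pacific_ocean global_ocean",
}

basin = [1, 1, 1, 1, 2]  # only the values actually shown in the example
names = decode_flags(basin, attrs["flag_values"], attrs["flag_meanings"])
```

Returning None for a flag that is not listed is a design choice of this sketch; a stricter reader could raise an error instead.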

Hook in section 3.3 to highlight existence of constraints implied by standard name

In section 3.3, append to the paragraph about "description", after "The description is meant to clarify the qualifiers of the fundamental quantities such as which surface a quantity is defined on or what the flux sign conventions are. We don"t attempt to provide precise definitions of fundumental physical quantities (e.g., temperature) which may be found in the literature." the following:

The description may define rules on the variable type, attributes and coordinates which must be complied with by any variable carrying that standard name (such as in example 3.4).

Note also that there is a typo in the text cited above: don"t instead of don't.

Change History (14)

comment:1 Changed 5 years ago by jonathan

Dear Martin

Thanks for making this proposal. As you know I agree with the principle. I'd like to generalise it to other similar cases, so I would suggest modifying your text

A variable with standard name of region may also be of integer type and use flag_values and flag_meanings attributes to express the relationship between the integers and the region names:

to

A variable with standard name of region, area_type or any other standard name which requires string-valued values from a defined list may alternatively be of integer type and use flag_values and flag_meanings attributes to record the translation between the integers and the string values, for instance:

and then give your example as it is. (I think "translate" is more explicit than "relationship" but you may disagree!) This also requires a modified definition for area_type:

A variable with the standard name of area_type contains strings which indicate the nature of the surface e.g. land, sea, sea_ice, or integers which can be translated to strings using flag_values and flag_meanings attributes. These strings are standardised. Values must be taken from the area_type table.

I'm not convinced about modifying Appendix B. I feel that it should be adequate to note the constraints for specific standard names in the table itself. We could also make a note about the existence of constraints on the standard name page. If we were to make a separate list of them, it should be comprehensive. For instance, there are a number which expect or require particular coordinates variables to exist.

Best wishes

Jonathan

comment:2 Changed 5 years ago by martin.juckes

  • Description modified (diff)

comment:3 Changed 5 years ago by martin.juckes

Dear Jonathan,

thanks .. I've added your generalisation and reworded the suggested description for region to match your wording for area_type.

I've also modified basin in the example to be a lat/lon field, following a comment from Karl: in CMIP5 and CMIP6 basin(basin) is a character array used as a dimension, while basin(lat, lon) is an integer array. Aligning the example cleanly with CMIP usage should make it clearer.

On the suggested Appendix: this could be separated off, as the other modifications don't rely on it and, as you say, it would make sense to make a complete list of relevant rules before adding it. I included it because I have the impression that rules which are only recorded in CF standard name descriptions are not picked up in the conformance document or the checker. The suggested Appendix may not be the best way of addressing this problem, but I think it is worth having a paragraph in the convention text about constraints which are expressed in the standard name descriptions. It may be enough to ensure that there are explicit examples for each type of constraint (such as the one proposed above) with relevant standard names listed. A sentence could also be appended to the paragraph about description in section 3.3: The description may define rules on the variable type and attributes (see for example section 6.1.1) which must be complied with by any variable carrying that standard name.

Regards, Martin

comment:4 Changed 5 years ago by jonathan

Dear Martin

Thanks very much. Seeing the change you have made for consistency with CMIP, I realise that this new text is probably not in the right place in the document. Sorry I didn't realise this before. Sect 6 is about coordinates. When basin is an auxiliary coordinate variable, we don't need the flag methods; there is a single dimension with basin names as labels. The example and your concern are about the case when a data variable contains regions or area_types. Therefore I would now suggest that the new text and the example should be at the end of Sect 5.5 instead, or should form a new short Sect 8.3 about string-valued data variables (since this mechanism is a kind of packing), or maybe there's a better place for them - but probably not in Sect 6. What do you think?

I appreciate your point about checking of constraints on data variables with particular standard names. I agree it would be good to note this in Sect 3.3, and a corresponding sentence could be inserted in the conformance document for Sect 3.3. I think that would be a better way than splitting Appendix B. I don't know actually what the cf-checker currently does about this or what it could do, but it would be useful to make the point explicitly.

Best wishes

Jonathan

comment:5 Changed 5 years ago by martin.juckes

  • Description modified (diff)

Dear Jonathan,

In the draft of CF-1.7 section 6 is "Labels and Alternative Coordinates" and 6.1 is "Labels", which looks suitable to me. Example 6.2 has a region variable as a coordinate, but the text is about how to encode geographical regions in a variable. I can't see how this fits into section 5 ("Coordinates"). Am I missing something here?

On the cf-checker: a NetCDF file with a variable:

float basin(index):
  standard_name: region

is passed by the checker as valid, with a warning for the absence of a units attribute on the variable. If the variable is defined as in the example above and invalid region names are used, this is also passed (I've updated the example to change flag_values from a string, which the checker does not allow, to a list of integers). So these details are not currently checked.
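
A minimal sketch of the kind of test that is currently missing, written in plain Python. It is not the cf-checker's code or API: the function check_region_variable and its arguments are hypothetical, and ALLOWED_REGIONS is a three-name stand-in (taken from the example in the description) for the full CF standard region list.

```python
# Stand-in for the CF standard region list: only the three names used in the
# ticket's example, NOT the real list.
ALLOWED_REGIONS = {"atlantic_arctic_ocean", "indo_pacific_ocean", "global_ocean"}


def check_region_variable(dtype, attrs, allowed=ALLOWED_REGIONS):
    """Return a list of problems for a variable with standard_name 'region'.

    `dtype` is a type name such as "char", "int" or "float" and `attrs` is a
    dict of the variable's attributes (both assumed to be read elsewhere).
    """
    problems = []
    if attrs.get("standard_name") != "region":
        return problems  # the rule only applies to region variables
    if dtype == "char":
        return problems  # string data would be checked against the list separately
    if "flag_values" not in attrs or "flag_meanings" not in attrs:
        problems.append("non-string region variable lacks flag_values/flag_meanings")
        return problems
    meanings = attrs["flag_meanings"].split()
    if len(meanings) != len(attrs["flag_values"]):
        problems.append("flag_values and flag_meanings differ in length")
    for name in meanings:
        if name not in allowed:
            problems.append(f"'{name}' is not a standard region name")
    return problems


# The float variable quoted above, which carries only a standard_name.
float_case = check_region_variable("float", {"standard_name": "region"})

# The integer example from the description.
int_case = check_region_variable("int", {
    "standard_name": "region",
    "flag_values": [1, 2, 3],
    "flag_meanings": "atlantic_arctic_ocean indo_pacific_ocean global_ocean",
})
```

Under these assumptions the float variable would be reported and the integer example would pass.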

regards, Martin

comment:6 Changed 5 years ago by jonathan

Dear Martin

Sorry, I meant Sect 3.5 (not 5.5). Sect 3.5 is where the flag attributes are defined. This would be a logical place to mention their use to encode string-valued variables. Alternatively, if it's regarded as a kind of data compression, it could be in Sect 8. However, Sect 6.1.1 is about auxiliary coordinate variables, whereas the CMIP-motivated example basin(lat,lon) is a data variable, not a coordinate variable, so doesn't belong in Sect 6, I think.

Best wishes

Jonathan

comment:7 Changed 5 years ago by martin.juckes

Dear Jonathan,

thanks, I understand now. Of those options I'd prefer 3.5. Although there is an element of compression I think there is also an element of convenience: integer arrays are easier to deal with in many languages than string arrays. Section 3.5 does look like a natural home. I would put it immediately under example 3.3, as a new example 3.4 and renumber existing ex. 3.4 onwards .. do you agree?

This would require a slight change in the wording above (along the lines "If the variable has a standard_name which requires values from a specified list, the flag_values and flag_meanings attributes can be used ..."). If we agree on the positioning in the text, I'll go ahead and rewrite it.

regards, Martin

comment:8 Changed 5 years ago by jonathan

Dear Martin

Yes, I agree, that would be just right. Thank you very much.

Jonathan

comment:9 Changed 5 years ago by martin.juckes

  • Description modified (diff)

comment:10 Changed 5 years ago by martin.juckes

  • Description modified (diff)

Dear Jonathan,

I have made those changes. I also amended the new standard name definition to specify use of strings or flags (instead of strings or integers), as it appears sensible to allow the use of flags of type byte, as done in the other examples in section 3.5. I've put in a cross-reference to section 6.

comment:11 Changed 5 years ago by jonathan

Dear Martin

That looks good - many thanks. One minor suggestion, that "The description may define rules on the variable type and attributes" should be "The description may define rules on the variable type, attributes and coordinates" since there are several cases when particular coordinate variables are required.

Please could others express support for or otherwise comment on this ticket.

Cheers

Jonathan

comment:12 Changed 5 years ago by martin.juckes

  • Description modified (diff)

Dear Jonathan,

Thanks, I've made the change you suggested ('type and attributes' --> 'type, attributes and coordinates').

Martin

comment:13 Changed 5 years ago by davidhassell

Hello,

I support this ticket. I have a couple of minor comments:

  • I'm not sure why example 3.4 is mentioned in the new text to 3.3: the basin variable looks like a data variable to me (although it could fulfil a coordinate role, of course).
  • In section 3.3, should it be mentioned that variable type need not be adhered to in the cases to which this ticket relates?

All the best, David

comment:14 Changed 5 years ago by martin.juckes

  • Description modified (diff)

Hi David,

The first point is, I think, a misunderstanding of what I was trying to say. I've moved the bracketed phrase to the end of the sentence, which I hope makes it clearer that it refers to the sentence as a whole, not specifically to "coordinates".

On the 2nd bullet: I've changed the text of the standard name definition to clarify that the type restriction does not apply to flags.

cheers, Martin
