Opened 14 years ago

Closed 12 years ago

#11 closed enhancement (duplicate)

A standard for CF variable names ("short names") should be added.

Reported by: balaji
Owned by: cf-conventions@…
Priority: medium
Milestone:
Component: cf-conventions
Version:
Keywords:
Cc:

Description (last modified by halliday1)

1. Title

A standard for CF variable names ("short names") should be added.

2. Moderator

  1. Balaji

3. Requirement

In any of the data formats associated with CF (e.g. netCDF) the variable name is its most convenient handle. Most command-line and script-based data tools use this handle. The NCO utilities for manipulating netCDF data, and visualization and analysis tools such as Ferret, GrADS or Matlab, all use the variable name as a handle.

This name is not currently included in the standard, though some de facto standards exist. In the absence of a standard, users cannot write scripts or methods that are general enough to apply across datasets from many sources.

For instance, a typical user request from GFDL is to make our internal variable names match what PCMDI (via the CMOR program) required for IPCC AR4: this would enable users who have developed new scientific analyses on their own data to apply them instantly to any model in the IPCC archive.

The CF standard_name attribute does not satisfy this need. Shortcomings include:

  • the standard_name is too long to type.
  • the standard_name attribute alone does not uniquely identify a single variable in a file. For example, "high", "middle" and "low" cloud amount all have the standard name cloud_area_fraction_in_atmosphere_layer, and are disambiguated using other attributes (the height limits of the associated layer).
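The second shortcoming can be illustrated with a minimal sketch. It assumes a file's variables are modelled as a plain dictionary of attribute dictionaries; the variable names (cl_low, cl_mid, cl_high, slp) and the "layer" attribute are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for the variables of one file: name -> attributes.
# The variable names and the "layer" attribute are invented for illustration.
VARIABLES = {
    "cl_low": {"standard_name": "cloud_area_fraction_in_atmosphere_layer", "layer": "low"},
    "cl_mid": {"standard_name": "cloud_area_fraction_in_atmosphere_layer", "layer": "middle"},
    "cl_high": {"standard_name": "cloud_area_fraction_in_atmosphere_layer", "layer": "high"},
    "slp": {"standard_name": "air_pressure_at_sea_level"},
}


def ambiguous_standard_names(variables):
    """Return {standard_name: [variable names]} for names shared by several variables."""
    groups = defaultdict(list)
    for name, attrs in variables.items():
        standard_name = attrs.get("standard_name")
        if standard_name is not None:
            groups[standard_name].append(name)
    # Keep only the standard names that fail to identify a single variable.
    return {std: names for std, names in groups.items() if len(names) > 1}


print(ambiguous_standard_names(VARIABLES))
```

A script that selects data by standard_name alone would have to inspect further attributes to choose among the three cloud variables.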

4. Initial Statement of Technical Proposal

We propose a list of "CF short names".

  • Names should be short (6 character max) and human-typable.
  • Names should uniquely identify a physical variable.
  • Proposals to the standard_name list should also list a recommended short name.
  • It is desirable that names be vaguely mnemonic: e.g. "slp" for sea level pressure. But this should not be a reason to have long debates over the short name.
  • The "PCMDI short name" used by IPCC AR4 is a good starting point for climate variables. Perhaps someone can suggest an equally useful starting point for weather (e.g. based on ERA-40 or NCEP reanalysis datasets).

5. Benefits

Benefits and use cases are covered in some slides prepared for GO-ESSP, which are attached here and also available from Balaji's home page.

6. Status Quo

Without changing the standard, we might get by with de facto standards such as the AR4 short names; but the benefit is limited as many of us participate in multiple international modeling campaigns. There is no guarantee that all such campaigns remain consistent in their use of short names.

Attachments (1)

standard_names.pdf (117.0 KB) - added by balaji 14 years ago.

Change History (7)

Changed 14 years ago by balaji

comment:1 Changed 14 years ago by halliday1

  • Description modified (diff)

Edited moderator's name to display properly.

comment:2 in reply to: ↑ description ; follow-up: Changed 14 years ago by ngalbraith

Replying to balaji:

#11: A standard for CF variable names ("short names") should be added.

My concern with this proposal is that there are CF users who already have selected different short names; the OceanSITES implementation of CF, for example, uses the short names from another standard (the name of which escapes me, sorry).

If this change is implemented, a way would need to be found to make it backward compatible, so that datasets without CF standard short names could be accommodated. One possibility would be to use "cf_name" instead of "name" and allow one or both of these fields to be used.

In the absence of a standard, users cannot write scripts or methods that are general enough to apply across datasets from many sources.

Is there a reason why scripts can't generate standard names from short ones, as long as the short names come from SOME standard and a conversion is available? This seems like a reasonable place to do the transformation between varying short names and unvarying standard names. Our thought in setting up the OceanSITES implementation this way was that the human interface, whatever it might be, would take a user-provided standard short name and our software would provide the CF standard_name. Since we haven't yet built this interface, I can't say how well it will work.
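The conversion suggested here might look like the following sketch. It assumes each short-name vocabulary ships a lookup table to CF standard names; the table entries are illustrative and should be checked against each vocabulary's own documentation, and the "local" vocabulary and both function names are hypothetical.

```python
# Illustrative lookup tables: vocabulary -> {short name: CF standard_name}.
# Entries are examples only, not an authoritative copy of any vocabulary.
SHORT_TO_STANDARD = {
    "pcmdi": {
        "psl": "air_pressure_at_sea_level",
        "pr": "precipitation_flux",
    },
    "oceansites": {
        "TEMP": "sea_water_temperature",
    },
    "local": {  # hypothetical in-house vocabulary
        "slp": "air_pressure_at_sea_level",
    },
}


def to_standard_name(short_name, vocabulary):
    """Look up the standard_name for a short name from a given vocabulary."""
    try:
        return SHORT_TO_STANDARD[vocabulary][short_name]
    except KeyError:
        raise KeyError(
            f"no mapping for {short_name!r} in vocabulary {vocabulary!r}"
        ) from None


def translate(short_name, source, target):
    """Translate a short name between vocabularies via the standard_name.

    Returns None when the target vocabulary has no name for that quantity.
    """
    standard_name = to_standard_name(short_name, source)
    for candidate, candidate_standard in SHORT_TO_STANDARD[target].items():
        if candidate_standard == standard_name:
            return candidate
    return None
```

With these tables, translate("psl", "pcmdi", "local") gives "slp", while a quantity missing from the target vocabulary gives None rather than a guess.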

comment:3 in reply to: ↑ 2 Changed 14 years ago by tomgross

I cannot see how adding an additional list of names to be tested against local names can be useful. The first time we write a netCDF reader we usually do rely upon local names to avoid the tedious switch case construct. The second time you write a netCDF reader you sequentially open variables, read their standard name and then assign the contents to your local name. The whole idea of the standard_name is that you do that testing and then assign your ambiguous (six characters) name to the variable which is described by the standard_name. In other words (to paraphrase) "there is no reason why scripts can't generate short names from standard ones, since the standard names do come from CF standards."

To me short names are only as meaningful as the names of variables used by subroutines; they are purposefully changeable and have no scope or meaning outside their narrow application.
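The "second reader" described above can be sketched roughly as follows. This assumes the file has already been read into a dictionary of variables, each with "attrs" and "data" entries; that layout, the file variable names and the local names are all hypothetical, and a real reader would obtain the same information from its netCDF library.

```python
# Local short names chosen by this script, keyed by CF standard_name.
LOCAL_NAMES = {
    "air_pressure_at_sea_level": "slp",
    "precipitation_flux": "precip",
}

# Hypothetical file contents: the file's own variable names are arbitrary.
FILE_VARIABLES = {
    "PSL_run42": {"attrs": {"standard_name": "air_pressure_at_sea_level"}, "data": [1012.0, 1009.5]},
    "rain": {"attrs": {"standard_name": "precipitation_flux"}, "data": [0.0, 2.1e-5]},
    "misc": {"attrs": {}, "data": [1, 2]},
}


def load_by_standard_name(variables, local_names):
    """Assign each recognised variable to the caller's local name.

    Variables are matched on their standard_name attribute, never on the
    name they happen to have in the file.
    """
    loaded = {}
    for file_name, variable in variables.items():
        standard_name = variable["attrs"].get("standard_name")
        local = local_names.get(standard_name)
        if local is None:
            continue  # no standard_name, or not one this script cares about
        if local in loaded:
            # standard_name alone did not single out one variable
            raise ValueError(f"{file_name!r} duplicates standard_name {standard_name!r}")
        loaded[local] = variable["data"]
    return loaded


fields = load_by_standard_name(FILE_VARIABLES, LOCAL_NAMES)
```

The explicit ValueError marks the case raised in the ticket description, where several variables share one standard_name and further attributes are needed to tell them apart.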

Tom Gross

comment:4 Changed 14 years ago by jonathan

I don't like the idea of adding a convention whose purpose is to duplicate the information in other attributes, as redundancy is always an opportunity for inconsistency. Furthermore, existing files do not follow this convention, and since it is optional its presence could not be relied on by software. I'm not clear whether the proposal is for a new attribute or to standardise the variable names, but the same comments apply. If the proposal is for standard variable names, that has the additional drawback that variable names are an aspect of a file format. In principle CF-netCDF data might be translated into files that don't have variable names.

I feel that defining an additional controlled vocabulary for use by programs examining the data (such as proposed by Frank Toussaint et al. on the email list) is a more flexible approach, as it could apply to all existing files, and could be modified without affecting the validity of files. It is especially attractive to give names to "bundles" of metadata, such as the combination of standard_name="surface_air_temperature" and a size-1 coordinate of "height" having a value in the range 1.5-2.0 m. This is a specification of surface air temperature, what PCMDI call "tas", which would be a handy name for this bundle.
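A named "bundle" of this kind could be represented along the following lines. This is only a sketch: the BUNDLES table and identify_bundle function are hypothetical, the standard_name string and height range are copied from the paragraph above, and the height is assumed to be supplied in metres by the calling program.

```python
# Hypothetical external vocabulary: short name -> metadata bundle.
# The values for "tas" are taken from the comment above.
BUNDLES = {
    "tas": {
        "standard_name": "surface_air_temperature",
        "height_range_m": (1.5, 2.0),
    },
}


def identify_bundle(standard_name, height_m=None, bundles=BUNDLES):
    """Return the short name whose bundle matches the given metadata, or None.

    height_m is the value of the variable's size-1 height coordinate in
    metres, or None if the variable has no such coordinate.
    """
    for short_name, spec in bundles.items():
        if spec["standard_name"] != standard_name:
            continue
        low, high = spec["height_range_m"]
        if height_m is not None and low <= height_m <= high:
            return short_name
    return None
```

Because the table lives outside the data files, it can be extended or corrected without touching, or invalidating, any existing file.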

I generally think it's better to put intelligence into programs which read files, since files often last for a long time, while programs can be changed. It doesn't sound too hard to me to write a program to examine a file in such terms. It would also be quite easy in many languages to write something that would match abbreviations, or regular expressions, against the standard_names in the file, for the convenience of the analyst.
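As a rough illustration of the last point, the sketch below matches a regular expression, or a string of initials, against the standard_name attributes of a set of variables. The variable names v1 to v3, the dictionary layout and the initials scheme are assumptions made for the example, not part of CF.

```python
import re

# Hypothetical file contents: variable name -> attributes.
VARIABLES = {
    "v1": {"standard_name": "air_pressure_at_sea_level"},
    "v2": {"standard_name": "precipitation_flux"},
    "v3": {},  # a variable with no standard_name
}


def find_by_pattern(variables, pattern):
    """Return names of variables whose standard_name matches a regular expression."""
    regex = re.compile(pattern)
    return [
        name
        for name, attrs in variables.items()
        if "standard_name" in attrs and regex.search(attrs["standard_name"])
    ]


def find_by_initials(variables, initials):
    """Return names of variables whose standard_name words start with the given initials.

    For example, "apasl" matches air_pressure_at_sea_level.
    """
    matches = []
    for name, attrs in variables.items():
        standard_name = attrs.get("standard_name")
        if standard_name is None:
            continue
        if "".join(word[0] for word in standard_name.split("_")) == initials:
            matches.append(name)
    return matches
```

An analyst could then type find_by_pattern(VARIABLES, r"sea_level") instead of the full standard name, at the cost of checking that the pattern matched exactly one variable.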

Jonathan

comment:5 Changed 14 years ago by lowry

Thought I'd better get trac-registered and answer a couple of comments to this thread....

I totally agree with Jonathan that Frank's proposed external controlled vocabulary is the best mechanism for allowing CF parameters to be referenced by abbreviated names. As this is the basis for an ontology - i.e. a network of terms and their relationships - it can also be used to address Nan's concerns by mapping in terms from other standards and provides the potential for her suggestion of dynamic generation of standardised abbreviations from Standard Names.

I don't think I agree with Tom. There is a definite need for short names to fulfil functions like labelling diagrams, particularly as some Standard Names are getting more like Standard Essays. If these short names aren't standardised in some way we will end up with standardised interoperable data being degraded into heterogeneous data products with many different labels describing the same thing.

Roy.

comment:6 Changed 12 years ago by taylor13

  • Resolution set to duplicate
  • Status changed from new to closed

As noted by Balaji and others (offline), this ticket has been superseded by the discussion in ticket 24 (common concept), so I am closing it.
