[CF-metadata] CF upgrade to netCDF variable names from Russ Rew on 2014-01-15 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Russ Rew <russ>
Date: Wed, 15 Jan 2014 14:34:22 -0700

John Graybeal wrote:
> This prompts me to observe that somehow, in this brave new age of computer programming, people are
> developing netCDF software that supports Unicode characters -- Unicode!! -- in variable (attribute
> etc) names. There will be netCDF files in the wild, used by scientists and normal people (especially
> normal people from non-English-speaking countries) that use all sorts of wild and crazy characters
> in their variable names. (Perhaps CF thinks these are "alphanumeric", in which case I've found a
> solution! The standard certainly is not explicitly ASCII-only.) By the way, I was amazed to learn
> that using Unicode in programming languages is starting to take hold.

Yes, since June 2008 we have supported use of Unicode characters in
names in both netCDF-3 and netCDF-4 software. The intent was to make
netCDF more suitable for international use, rather than to encode
mathematical operations in variable names. But we were also responding
to needs of some user communities, for example atmospheric chemists who
wanted to be able to use standard notations for chemical species in
variable names.

Here's a small nonsensical example of ncdump output for a file
containing Unicode names:

  http://www.unidata.ucar.edu/netcdf/workshops/most-recent/utilities/Unicode.html

The precise rules for netCDF names are in the format documentation, but
the short version is:

  ... The first character of a name must be alphanumeric, a multi-byte
  UTF-8 character, or '_' (reserved for special names with meaning to
  implementations, such as the "_FillValue" attribute). Subsequent
  characters may also include printing special characters, except for
  '/' which is not allowed in names. Names that have trailing space
  characters are also not permitted.
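
As a rough illustration of the short version quoted above, here is a minimal
Python sketch of a name check. The function name is hypothetical, and the
sketch makes two assumptions: "printing special characters" is read as
printable ASCII (0x20 to 0x7E), and any non-ASCII code point is treated as a
"multi-byte UTF-8 character". The format documentation remains the
authoritative source and has further details that this sketch ignores.

```python
def is_valid_netcdf_name(name: str) -> bool:
    """Approximate check of the 'short version' naming rules (a sketch only)."""
    if not name:
        return False
    # Names with trailing space characters are not permitted.
    if name.endswith(" "):
        return False

    first = name[0]
    first_ok = (
        (first.isascii() and first.isalnum())  # ASCII letter or digit
        or first == "_"                        # reserved for special names
        or ord(first) > 127                    # assumed: multi-byte UTF-8 character
    )
    if not first_ok:
        return False

    for ch in name[1:]:
        if ch == "/":
            return False                       # '/' is never allowed in names
        if ord(ch) > 127:
            continue                           # non-ASCII character, accepted
        if not (" " <= ch <= "~"):
            return False                       # assumed: reject ASCII control characters
    return True
```

Under these assumptions, names such as "_FillValue", "CO2 (ppm)", and
"O₃_mixing_ratio" pass, while "a/b", ".hidden", and "name " (with a trailing
space) do not.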

That document also warns:

  Note that by using special characters in names, you may make your data
  not compliant with conventions that have more stringent requirements
  on valid names for netCDF components, for example the CF Conventions.
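
To make the contrast concrete, here is a minimal Python sketch of the stricter
CF-style recommendation, under which names begin with a letter and consist of
letters, digits, and underscores. The function name is hypothetical, and
reading "letters" as ASCII letters only is an assumption of this sketch, not
something the CF text states explicitly.

```python
import re

# Assumed reading of the CF recommendation: an ASCII letter first,
# then any mix of ASCII letters, digits, and underscores.
_CF_STYLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_cf_style_name(name: str) -> bool:
    """Return True if the whole name matches the assumed CF-style pattern."""
    return _CF_STYLE_NAME.fullmatch(name) is not None
```

With this reading, "air_temperature" passes, while "_FillValue", "2m_temp",
"a-b", and "O₃" do not, even though all of them are legal netCDF names.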

> At some point, we in the CF-supporting community are going to have to support the standard practices
> in this aspect that are going on everywhere else in the software world, or decide we want a
> permanent back-water for the 'scientists who are not interested in or capable of supporting these
> practices' (not my claim).
>
> Perhaps there are some reasons to want less-restrictive variable names -- I'm not always
> that imaginative, but if so, then present them.
>
> Let's just make the list so far, to get everyone up to speed with the discussion:
> * easier visual parsing (taste, yes, but practical also if you work with lots of data sets from
> different communities)
> * embedding semantic meaning (taste)
> * clearly isolating the context (namespace, hierarchy)
> * matching attribute names that come from the source data
> * consistency with netCDF usage/files -> easier onboarding of those files
> * Unicode/internationalization support

--Russ
Received on Wed Jan 15 2014 - 14:34:22 GMT
