
[CF-metadata] CF upgrade to netCDF variable names

From: Chris Barker <chris.barker>
Date: Wed, 15 Jan 2014 20:07:56 -0800

Let the bike shedding continue!

On Wed, Jan 15, 2014 at 1:14 PM, John Graybeal <jbgraybeal at mindspring.com> wrote:

> I don't think multiple use cases from different individuals and
> communities should be categorized as "no reason other than maybe taste".
> Just sayin'...
>

Multiple use cases are examples, not reasons -- "I'd like to do that" or
"I've been doing that" doesn't give a why... though you do below, thanks.


(and it certainly shouldn't be removed completely -- variable names with
arbitrary bytes in them would really be a mess). Is it ascii-only now? it
probably should stay that way.


> This prompts me to observe that somehow, in this brave new age of computer
> programming, people are developing netCDF software that supports Unicode
> characters -- Unicode!! -- in variable (attribute etc) names.
>

I'm a fan of unicode, actually, but despite it being around a long time,
now, it's still a pain in the *&%&^ in C, C++, and, I'm guessing, Fortran.
Not so bad in more modern languages, though apparently some use UTF-16 and
don't always handle the larger code points correctly. So still a pain.
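
To illustrate the UTF-16 point, here is a minimal Python sketch. The helper
name utf16_code_units and the sample variable name are made up for this
example. It only shows that a code point above U+FFFF takes two 16-bit code
units, which is where software that counts or indexes by code unit can go
wrong.

```python
def utf16_code_units(text: str) -> int:
    """Return how many 16-bit code units `text` occupies in UTF-16."""
    # "utf-16-le" avoids the byte-order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2


# Hypothetical variable name ending in a code point above U+FFFF.
name = "temp_\U0001D6FC"

print(len(name))               # number of code points
print(utf16_code_units(name))  # number of UTF-16 code units (one more here)
```

The two counts differ by one because the last character is stored as a
surrogate pair in UTF-16.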

And as you can tell, I'm a fan of restricting names to particular classes
of characters, and unicode includes a lot of concepts that are pretty hard
to define: e.g. "alphanumeric". I can see how it would be really nice for
non-English speakers or math and science geeks to use all sorts of great
variable names, but I'm afraid opening up fully might be more of a nightmare
than it is worth.

My pet programming language, python, currently allows unicode variable
names, with restrictions, but this is a heck of a list to keep track of!

http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html


> There will be netCDF files in the wild, used by scientists and normal
> people (especially normal people from non-English-speaking countries) that
> use all sorts of wild and crazy characters in their variable names.
> (Perhaps CF thinks these are "alphanumeric", in which case I've found a
> solution! The standard certainly is not explicitly ASCII-only.) By the
> way, I was amazed to learn that using Unicode in programming languages is
> starting to take hold.
>

but still only starting....

> At some point, we in the CF-supporting community are going to have to
> support the standard practices in this aspect that are going on everywhere
> else in the software world, or decide we want a permanent back-water for
> the 'scientists who are not interested in or capable of supporting these
> practices' (not my claim).
>

I think unicode is a red herring for this issue -- not that it isn't
interesting, and for sure full unicode options would allow nice expressive
variable names, but I'd still rather not have variable names that look
like math expressions and aren't legal names in programming languages.

The current CF document says
"Variable, dimension and attribute names should begin with a letter and be
composed of letters, digits, and underscores."

but "letters" is not very well defined when you get outside of ascii -- it
seems we have work to do.
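
A minimal Python sketch of that ambiguity. Both function names are
hypothetical, and neither is CF's definition: the first reads "letters" as
ASCII only, the second is just one possible Unicode reading, using Python's
str.isalpha() and str.isalnum().

```python
import re

# ASCII-only reading: begin with a letter, then letters, digits, underscores.
ASCII_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_ascii_cf_name(name: str) -> bool:
    """True if `name` fits the rule with "letters" meaning A-Z and a-z."""
    return ASCII_NAME.fullmatch(name) is not None


def is_unicode_letter_name(name: str) -> bool:
    """True if `name` fits the rule with Python's Unicode-aware tests.

    Assumption: "letter" means str.isalpha() and "digit" is covered by
    str.isalnum(). Other definitions are equally defensible.
    """
    if not name or not name[0].isalpha():
        return False
    return all(ch.isalnum() or ch == "_" for ch in name)


for candidate in ["sea_surface_temp", "temp\u00e9rature", "temp+salinity", "2m_temp"]:
    print(candidate, is_ascii_cf_name(candidate), is_unicode_letter_name(candidate))
```

The two readings agree on the plain ASCII name and on the names containing
"+" or starting with a digit; they disagree only on the accented name, which
is exactly the case the current wording leaves open.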


>
>
> Perhaps there are some reasons to want less-restrictive variable names --
> I'm not always that imaginative, but if so, then present them.
>
>
> Let's just make the list so far, to get everyone up to speed with the
> discussion:
> * easier visual parsing (taste, yes, but practical also if you work with
> lots of data sets from different communities)
> * embedding semantic meaning (taste)
> * clearly isolating the context (namespace, hierarchy)
>

I'm having trouble seeing how adding math symbols, etc will help these --
they can be done pretty well with underscores...


> * matching attribute names that come from the source data
> * consistency with netCDF usage/files -> easier onboarding of those files
>

mixed bag here -- CF is intended to be more restricted than netcdf....


> * Unicode/internationalization support

orthogonal question, I think. unless there's a language that uses "+" as a
letter....

I think we've only heard from me and Steve saying we didn't like this
proposal -- don't take our word for it!


-Chris



-- 
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker at noaa.gov
Received on Wed Jan 15 2014 - 21:07:56 GMT

