
[CF-metadata] CF upgrade to netCDF variable names

From: <jbiard>
Date: Wed, 15 Jan 2014 10:39:23 -0500

 

Hi.

I don't think we should use ease of mapping variable names to
a programming language as a reason for allowing (or not allowing) any
particular character in variable names. CF has, as I understood it,
considered variable names as completely up to the producer, relying on
attributes to provide meaning. So, I can name a temperature variable
"fluffy_bunny" if I want to, and it is completely valid.
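To make the programming-language concern concrete (it is the argument Chris Barker makes in the message quoted below), here is a minimal sketch that checks whether a name could be used unchanged as a variable name in one target language. Python is chosen only as an illustrative language, and the helper name `usable_as_python_identifier` and the sample names are hypothetical.

```python
import keyword


def usable_as_python_identifier(name: str) -> bool:
    """Return True if a netCDF variable name could be used unchanged as a
    Python variable name: a syntactically valid identifier that is not
    also a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


print(usable_as_python_identifier("fluffy_bunny"))       # True
print(usable_as_python_identifier("atmosCO2-waterCO2"))  # False: "-" is an operator
print(usable_as_python_identifier("temp.surface"))       # False: "." is attribute access
print(usable_as_python_identifier("lambda"))             # False: reserved keyword
```

Note that `lambda` consists only of letters, so it fits the CF recommendation, yet it is still unusable as a Python variable name. Mapping names onto a given language is therefore an imperfect yardstick even for conformant names.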

Section 1.3 of the Conventions states, "No variable or dimension names are standardized by this convention."

Section 2.3 states:

Variable, dimension and attribute names should begin with a letter and be composed of letters, digits, and underscores. Note that this is in conformance with the COARDS conventions, but is more restrictive than the netCDF interface which allows use of the hyphen character. The netCDF interface also allows leading underscores in names, but the NUG states that this is reserved for system use.

Case is significant in netCDF names, but it is recommended that names should not be distinguished purely by case, i.e., if case is disregarded, no two names should be the same. It is also recommended that names should be obviously meaningful, if possible, as this renders the file more effectively self-describing.

This convention does not standardize any variable or dimension names.
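The Section 2.3 recommendations quoted above are simple enough to check mechanically. A minimal sketch, assuming ASCII letters only (the quoted text does not say whether non-ASCII letters count); the helper names `follows_cf_naming` and `case_only_collisions` are hypothetical:

```python
import re
from collections import defaultdict

# Section 2.3 recommendation: begin with a letter, then only letters,
# digits, and underscores. ASCII letters are assumed here.
CF_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def follows_cf_naming(name: str) -> bool:
    """True if the whole name matches the Section 2.3 character rule."""
    return CF_NAME_PATTERN.fullmatch(name) is not None


def case_only_collisions(names):
    """Return groups of names that become identical when case is disregarded."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [group for group in groups.values() if len(group) > 1]


names = ["fluffy_bunny", "Temp", "temp", "_internal", "atmosCO2-waterCO2", "2m_temperature"]
print([n for n in names if not follows_cf_naming(n)])
# ['_internal', 'atmosCO2-waterCO2', '2m_temperature']
print(case_only_collisions(names))
# [['Temp', 'temp']]
```

Both checks flag departures from recommendations; neither makes a name invalid netCDF.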

While the Conventions makes recommendations about variable names, NO STANDARDS are set by the Conventions.

So, why were non-alphanumeric characters
other than '_' excluded by practice back in the day? Are these reasons
still valid? In fact, given the statements in the Conventions, is there
actually anything other than opinion constraining people from using any
characters they like in variable (and dimension) names (as long as they
are OK with netCDF and maybe NUG)?
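One concrete reason for the old restriction is the expression ambiguity in Steve Hankin's atmosCO2/waterCO2 example quoted below. The sketch tokenizes expressions under a hypothetical grammar (an assumption for illustration, not the syntax of any particular tool): bare names use only Section 2.3 characters, `+ - * / ( )` are operators, and a double-quoted string is read as one literal variable name.

```python
import re

# Hypothetical expression grammar (illustration only):
#   "..."                  -> one literal variable name (quoting)
#   [A-Za-z][A-Za-z0-9_]*  -> a bare variable name (Section 2.3 characters)
#   + - * / ( )            -> operators
TOKEN_PATTERN = re.compile(r'\s*(?:"([^"]*)"|([A-Za-z][A-Za-z0-9_]*)|([-+*/()]))')


def tokenize(expression: str):
    """Split an expression into ("NAME", text) and ("OP", text) tokens."""
    tokens = []
    expression = expression.rstrip()  # trailing blanks would otherwise fail to match
    pos = 0
    while pos < len(expression):
        match = TOKEN_PATTERN.match(expression, pos)
        if match is None:
            raise ValueError(f"cannot tokenize from position {pos}")
        quoted, bare, operator = match.groups()
        if quoted is not None:
            tokens.append(("NAME", quoted))
        elif bare is not None:
            tokens.append(("NAME", bare))
        else:
            tokens.append(("OP", operator))
        pos = match.end()
    return tokens


print(tokenize("atmosCO2-waterCO2"))
# [('NAME', 'atmosCO2'), ('OP', '-'), ('NAME', 'waterCO2')]
print(tokenize('"atmosCO2-waterCO2"'))
# [('NAME', 'atmosCO2-waterCO2')]
```

Without quoting, a variable literally named `atmosCO2-waterCO2` cannot be referenced at all, and "atmosCO2 - waterCO2" tokenizes exactly like "atmosCO2-waterCO2". With quoting, users must adopt the kind of convention Steve describes as alien to scientists.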

Grace and peace,

Jim

On 2014-01-14 12:08, Chris Barker wrote:

> There is another reason: mapping CF variable names directly to programming language variable names is pretty handy -- so it's nice if those are legal.
> I'm sure not all programming languages have the same restrictions on names, but there is surely a subset that's pretty common (i.e. none of the usual math characters).
> -Chris
>
> On Mon, Jan 13, 2014 at 12:57 PM, Steve Hankin <steven.c.hankin at noaa.gov [9]> wrote:
>
>> Hi John,
>>
>>
>> Philosophically I am aligned with Bryan: the purpose of the CF standard is to constrain (simplify and make predictable) the use of a highly general file creation toolkit like netCDF. The question of limitations placed on name strings should be evaluated by this yardstick.
>>
>>
>> There is a class of problems that are created by embedding special syntax characters willy-nilly into name strings. Namely, the use of such characters can render mathematical expressions ambiguous. Here's a simple example. Suppose a file contains 3 surface marine variables -- let's say atmospheric CO2, ocean CO2 and an artfully computed delta across the surface. Further, say that the file creator chooses to name the delta variable using a "-", as in
>>   atmosCO2
>>   waterCO2
>> and
>>   atmosCO2-waterCO2
>>
>> Then the meaning of the mathematical expression "atmosCO2-waterCO2" has been rendered ambiguous. Is it a single variable name, or the difference of two? One is forced to use arbitrary tricks that are alien to the scientific users we are trying to serve -- say, disambiguating the expression by insisting on surrounding quotes ("atmosCO2"-"waterCO2") or white space ("atmosCO2 - waterCO2"). (Would any scientist read "atmosCO2 - waterCO2" and "atmosCO2-waterCO2" as having distinct meanings?)
>>
>> As you say, we have already headed down this (slippery) slope. Characters like "+", "-", "." and case-sensitivity have leaked through into fairly common practice. For better or worse. :-( (Should the publishers of science textbooks start using case-sensitive variable names?) So the question that you've posed is, in a sense, _now that the horse is out of the barn, is there any merit to keeping the other animals penned?_ Like Bryan, I would argue that the way to answer this is to insist that at least there be significant gains from letting them out.
>>
>> Another unintended negative consequence: the impact on free text searches when our variable names include special syntax characters. Are our metadata procedures on an arc so promising that we have no need to rely on general Google-style tools for discovery?
>>
>> - Steve
>>
>>
>> =============================================
>>
>> On 1/13/2014 12:12 PM, John Graybeal wrote:
>>
>>> Not sure I am following you -- constraints are presumably there for a reason. I wasn't sure what the reason was for these particular constraints, but thought they might have simply echoed earlier netCDF constraints.
>>> To your 'use case' question: we were thinking about alternatives to mx_ as a prefix for our own attributes, to minimize the chance of collisions (e.g., with some maintenance variables someone might name mx_).
>>> john
>>>
>>> On Jan 13, 2014, at 11:27, Bryan Lawrence <bryan.lawrence at ncas.ac.uk [5]> wrote:
>>>
>>>> Hi John
>>>> In the spirit of CF being *constrained* netCDF, it seems that we wouldn't, unless we had a specific use case ... do you?
>>>>
>>>> Cheers
>>>> Bryan
>>>>
>>>> On 13 January 2014 18:54, <john.graybeal at marinexplore.com [3]> wrote:
>>>>
>>>>> As netCDF is growing to allow @, +, hyphen, and period in variable/dimension/attribute names, is there any likelihood CF will grow to allow some or all of those characters?
>>>>>
>>>>> I seem to recall some tools have conflicts with some of those characters (aside from them being non-conformant). But consistency and flexibility would be nice.
>>>>>
>>>>> john
>>>>> ------------------------------------
>>>>> John Graybeal
>>>>> Sr. Data Manager, Metadata & Semantics
>>>>>
>>>>> M +1 408 675-5445
>>>>> skype: graybealski
>>>>> Marinexplore
>>>>> 920 Stewart Drive
>>>>> Sunnyvale 94085
>>>>> California, USA
>>>>> www.marinexplore.com [1]<http://marinexplore.com [2]>
>>>>>
>>>>>
>>>> 
>>>> -- 
>>>> 
>>>> Bryan Lawrence
>>>> University of Reading: Professor of Weather and Climate Computing.
>>>> National Centre for Atmospheric Science: Director of Models and Data.
>>>> STFC: Director of the Centre for Environmental Data Archival.
>>>> Ph: +44 118 3786507 or 1235 445012; Web: home.badc.rl.ac.uk/lawrence [4]
>>> 
>>>
>>> 
>>>
>>> _______________________________________________
>>> CF-metadata mailing list
>>> CF-metadata at cgd.ucar.edu
>>> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>> 
> 
> -- 
>
> Christopher Barker, Ph.D.
> Oceanographer
> 
> Emergency Response Division
> NOAA/NOS/OR&R (206) 526-6959 voice
> 7600 Sand Point Way NE (206) 526-6329 fax
> Seattle, WA 98115 (206) 526-6317 main reception
>
> Chris.Barker at noaa.gov [10]
 
Links:
------
[1] http://www.marinexplore.com/
[2] http://marinexplore.com/
[3] mailto:john.graybeal at marinexplore.com
[4] http://home.badc.rl.ac.uk/lawrence
[5] mailto:bryan.lawrence at ncas.ac.uk
[6] http://marinexplore.com
[7] mailto:CF-metadata at cgd.ucar.edu
[8] http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
[9] mailto:steven.c.hankin at noaa.gov
[10] mailto:Chris.Barker at noaa.gov
Received on Wed Jan 15 2014 - 08:39:23 GMT

This archive was generated by hypermail 2.3.0 : Tue Sep 13 2022 - 23:02:41 BST
