Hi.
Attributes with a type of string are now possible with netCDF-4, and
many examples of attributes with this type are "in the wild". As an
example of how this is happening, IDL creates an attribute with this
type if you select its version of **`string`** type instead of
**`char`** type. It seems that people often assume that **`string`** is
the correct type to use because they wish to store strings, not characters.
I propose to add wording to the Conventions to allow attributes that
have a type of **`string`**. There are two ramifications to allowing
attributes of this type, the second of which impacts string variables as
well.
1. A **`string`** attribute can contain 1D atomic string arrays. We need
to decide whether or not we want to allow these or limit them (at least
for now) to atomic string scalars. Attributes with arrays of strings
could allow for cleaner delimiting of multiple parts than spaces or
commas do now (e.g. flag_values and flag_meanings could both be arrays),
but this would be a significant stretch for current software packages.
2. A **`string`** attribute (and a **`string`** variable) can contain
UTF-8 Unicode strings. UTF-8 uses variable-length characters, with the
standard ASCII characters as the 1-byte subset. According to the Unicode
standard, a UTF-8 string can be signaled by the presence of a special
non-printing three-byte sequence known as a Byte Order Mark (BOM) at the
front of the string, although this is not required. IDL (again, for
example) writes this BOM sequence at the beginning of every attribute or
variable element of type **`string`**.
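
To make point 2 concrete, here is a minimal sketch using only the Python standard library. The helper name `strip_utf8_bom` is hypothetical and not part of any netCDF API; it only illustrates how a reader might tolerate a leading BOM, assuming the attribute value arrives as raw bytes.

```python
import codecs


def strip_utf8_bom(raw: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present.

    Hypothetical helper for illustration only.
    """
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.decode("utf-8")


# A value as a BOM-writing tool might store it (assumed example content).
with_bom = codecs.BOM_UTF8 + "température".encode("utf-8")
without_bom = "temperature".encode("utf-8")

print(strip_utf8_bom(with_bom))
print(strip_utf8_bom(without_bom))

# ASCII characters occupy one byte each; other characters occupy more.
print(len("e".encode("utf-8")), len("é".encode("utf-8")))
```

Python's built-in `utf-8-sig` codec does the same job on decode; the explicit version is shown only to make the three-byte prefix visible.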
Allowing attributes containing arrays of strings may open up useful
future directions, but it will be more of a break from the past than
attributes that have only single strings. Allowing attributes (and
variables) to contain UTF-8 will free people to store non-English
content, but it might pose headaches for software written in older
languages such as C and FORTRAN.
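
As a rough illustration of the trade-off for string arrays, the sketch below compares today's space-delimited `flag_meanings` with a hypothetical array form. The function `read_flag_meanings` and the example values are assumptions made for illustration; nothing here is defined by CF or by a netCDF library.

```python
def read_flag_meanings(value):
    """Return a list of meanings from either representation.

    Hypothetical reader: a single string is split on whitespace (current
    CF practice); any other sequence is taken as one meaning per element.
    """
    if isinstance(value, str):
        return value.split()
    return list(value)


# Current practice: one string, meanings separated by spaces, so a meaning
# cannot itself contain a space.
scalar_form = "good questionable bad_sensor"

# Hypothetical array form: element boundaries do the delimiting.
array_form = ["good", "questionable", "bad sensor"]

print(read_flag_meanings(scalar_form))
print(read_flag_meanings(array_form))
```

Software that only expects the scalar form would need a branch like the one above, which is the kind of change that makes arrays a bigger break from the past.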
To finalize the change to support **`string`** type attributes, we need
to decide:
1. Do we explicitly forbid string array attributes?
2. Do we place any restrictions on the content of **`string`**
attributes and (by extension) variables?
Now that I have the background out of the way, here's my proposal.
Allow **`string`** attributes. Specify that the attributes defined by
the current CF Conventions must be scalar (contain only one string).
Allow UTF-8 in attribute and variable values. Specify that the current
CF Conventions use only ASCII characters (which are a subset of UTF-8)
for all terms defined within. That is, the controlled vocabulary of CF
(standard names and extensions, cell_methods terms other than free-text
elements of comments(?), area type names, time units, etc.) is composed
entirely of ASCII characters. Free-text elements (comments, long names,
flag_meanings, etc.) may use any UTF-8 character.
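
One way a compliance checker might express the two proposed rules is sketched below. The function name `check_controlled_attribute` and its messages are hypothetical, and the sketch assumes the attribute value has already been read into a Python `str` (scalar) or a list (array).

```python
def check_controlled_attribute(value):
    """Return a list of problems for an attribute holding a CF-defined term.

    Hypothetical check: the value must be a single string and must use
    only ASCII characters. Free-text attributes would skip the ASCII test.
    """
    problems = []
    if not isinstance(value, str):
        problems.append("must be a single (scalar) string")
        return problems
    if not value.isascii():
        problems.append("must contain only ASCII characters")
    return problems


print(check_controlled_attribute("air_temperature"))
print(check_controlled_attribute(["air_temperature", "air_pressure"]))
print(check_controlled_attribute("température_de_l_air"))
```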
GitHub issue: #141 <https://github.com/cf-convention/cf-conventions/issues/141>
Trac ticket: #176 <https://cf-trac.llnl.gov/trac/ticket/176#ticket>
Grace and peace,
Jim
--
CICS-NC <http://www.cicsnc.org/> Visit us on
Facebook <http://www.facebook.com/cicsnc> *Jim Biard*
*Research Scholar*
Cooperative Institute for Climate and Satellites NC <http://cicsnc.org/>
North Carolina State University <http://ncsu.edu/>
NOAA National Centers for Environmental Information <http://ncdc.noaa.gov/>
/formerly NOAA's National Climatic Data Center/
151 Patton Ave, Asheville, NC 28801
e: jbiard at cicsnc.org <mailto:jbiard at cicsnc.org>
o: +1 828 271 4900
/Connect with us on Facebook for climate
<https://www.facebook.com/NOAANCEIclimate> and ocean and geophysics
<https://www.facebook.com/NOAANCEIoceangeo> information, and follow us
on Twitter at @NOAANCEIclimate <https://twitter.com/NOAANCEIclimate> and
@NOAANCEIocngeo <https://twitter.com/NOAANCEIocngeo>. /
Received on Mon Jul 23 2018 - 10:29:07 BST