[CF-metadata] non-standard standard_names from Steve Hankin on 2010-05-12 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Steve Hankin <Steven.C.Hankin>
Date: Wed, 12 May 2010 11:22:12 -0700

Hi Martin,

You've had two enthusiastic "yes" responses, so I guess I have the
privilege to be the wet blanket. So it goes. I will give only a very
cautious and limited "yes". Not an outright "no" ... but a suggestion
for more thought and discussion.

The proposal here is effectively the creation of 'private tables' as a
means of achieving extensibility. We've had an opportunity to see the
hazards embedded in this approach as a long-term evolutionary process in
WMO. Over time the "custom" tables evolve to have a quasi-official
status -- entire sub-communities rely upon them -- but without
necessarily a corresponding methodical control over their creation and
distribution. With BUFR and GRIB files the proliferation of distinct
tables has led to serious interoperability problems.

To avoid repeating these problems with your proposal, CF clients must be
provided with *iron-clad ways to be assured that they are referring to
the same vocabulary tables that the data author was referring to at the
time that the data were written*. Since we want CF files to ensure
interoperability when there are *years separating the writing of data
from reading it*, your strategy needs to ensure careful version control
over the private tables. This imposes a significant burden on you as
the creator of a "<project>_standard_name" table -- essentially a
requirement to retain and serve out older table versions "in perpetuity"
(we could argue over what that means). The use of semantic web
technologies will not alter these considerations for the foreseeable
future (though over the long term sophisticated inference engines might
...). The ontologies still need to be informed by correct information,
which implies knowledge of the version-controlled private vocabularies.
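
To make the version-control requirement concrete, here is a minimal
sketch in Python, not a proposed implementation. It assumes that a file
records both the project table it used and the version of that table.
The registry TABLE_VERSIONS, the function lookup_name, and the
placeholder names and definitions are all hypothetical.

```python
# Hypothetical registry holding every published version of a private table.
# Older versions are never deleted, so files written years ago still resolve.
TABLE_VERSIONS = {
    ("htap", 1): {"placeholder_name_a": "definition as published in v1"},
    ("htap", 2): {
        "placeholder_name_a": "definition as published in v1",
        "placeholder_name_b": "definition added in v2",
    },
}


def lookup_name(project, version, name):
    """Return the definition of `name` as it stood in the table version
    the data author used, not in whichever version is newest today."""
    try:
        table = TABLE_VERSIONS[(project, version)]
    except KeyError:
        raise LookupError(f"table {project!r} version {version} is not archived")
    try:
        return table[name]
    except KeyError:
        raise LookupError(f"{name!r} is not in {project!r} version {version}")
```

A client that cannot find the exact (project, version) pair fails loudly
rather than silently falling back to a newer table, which is the failure
mode behind the BUFR and GRIB problems described above.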

A "<project>_standard_name" may have one of three life histories: it
may never become accepted into the standard_name table; it may be
accepted as-is; or it may be accepted with alterations. The following
suggested restriction illustrates some of the difficulties: "A variable
can contain either a standard_name or <project>_standard_name attribute
but not both." What's behind this restriction? Given the uncertain
life history of a <project>_standard_name, if it has been in use for
(say) a year and is found in thousands of files that are being shared
around the community, doesn't that generate a need to continue
supporting it?
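
As a rough illustration of what continued support could look like for a
reader of old files, the sketch below maps each of the three life
histories onto a lookup. It is only a sketch under assumptions: the
NAME_HISTORY table, the effective_name function, and all placeholder
names are hypothetical, and the "not both" check simply mirrors the
quoted restriction.

```python
from enum import Enum


class Fate(Enum):
    NEVER_ACCEPTED = "never accepted"
    ACCEPTED_AS_IS = "accepted as-is"
    ACCEPTED_WITH_ALTERATIONS = "accepted with alterations"


# Hypothetical record of what became of each project name:
# project name -> (fate, official CF name or None).
NAME_HISTORY = {
    "placeholder_rejected": (Fate.NEVER_ACCEPTED, None),
    "placeholder_kept": (Fate.ACCEPTED_AS_IS, "placeholder_kept"),
    "placeholder_old": (Fate.ACCEPTED_WITH_ALTERATIONS, "placeholder_new"),
}


def effective_name(attrs, project):
    """Return (name, is_official) for one variable's attribute dict."""
    project_key = f"{project}_standard_name"
    if "standard_name" in attrs and project_key in attrs:
        # The proposal forbids carrying both attributes on one variable.
        raise ValueError("variable carries both attributes")
    if "standard_name" in attrs:
        return attrs["standard_name"], True
    if project_key not in attrs:
        return None, False
    old_name = attrs[project_key]
    # Names with no recorded history are treated as never accepted.
    _fate, cf_name = NAME_HISTORY.get(old_name, (Fate.NEVER_ACCEPTED, None))
    if cf_name is not None:
        return cf_name, True   # accepted as-is or with alterations
    return old_name, False     # still only a project-level name
```

For example, effective_name({"htap_standard_name": "placeholder_old"}, "htap")
resolves to the altered official name, while a file written with a name
that was never accepted keeps its project-level name and is flagged as
unofficial.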

Two alternative approaches (both flawed, of course ... the nature of the
beast):

   1. Should the CF standard_name process, itself, include a
      "provisional fast-track", that allows names to be added very
      quickly with no guarantee that they will have a lasting status,
      but with an *iron-clad guarantee that the provisional names will
      be retained* (and so-identified) in version-stamped (older) CF
      vocabularies?
      or
   2. Might you be better off using a *truly private* vocabulary of
      "<project>_standard_name" strings. I.e. one that has no official
      status in CF at all? There is no violation to the CF standard
      through doing this. This approach makes it your private
      responsibility on behalf of your users to deal with files that are
      created in the period between proposing a CF standard_name and
      having it become part of the official table.

    - Steve

====================

Schultz, Martin wrote:
> Dear all,
>
> we are currently cleaning all files on our TFHTAP multi-model
> experiment server to make them fully CF(1.0) conformant. It has been
> about 3 years since we had drafted the original format description of
> these experiments and also initiated the standard name discussion for
> chemical constituents (thanks again to Christiane Textor who did a lot
> of this initial work). Many standard names which we needed have now been
> defined (thanks to all who contributed and to Allison for maintaining
> the list!). Nevertheless, there are a number of model variables left for
> which no standard name has been agreed upon and where we (or the CF
> mailing list group) also felt that they are too specialized to deserve a
> "standard" name. From the perspective of the CF community this may not
> be an issue, but in the context of interoperability (we now operate a
> WCS server to share these files) the fact that some variables do have a
> standard_name attribute and others don't poses considerable challenges.
> The CF convention states that "either standard_name or long_name" should
> be present. In our view, the long_name attribute is a poor substitute
> for the standard_name, because it has no rules attached. We are now
> planning to substitute "illegal" standard_name attributes by a new
> "htap-_standard_name" attribute, which shall make clear that these names
> are derived according to the CF guidelines, but they are not accepted
> standard_names. Such a concept would enable software tools to easily
> scan additional standard_name tables and make use of the well-defined
> semantics that a standard_name provides without having to push
> additional standard_names through the discussion - in particular if they
> are not so "standard". I can see the danger that certain groups might
> think it no longer necessary to go through the tedious but ultimately
> worthwhile discussion process in this mailing list and the meaning of
> "standard" names could get diluted. However, in my view the advantage of
> having the possibility to extend the convention without breaking
> standard-conformance outweighs this potential disadvantage.
>
> Specifically I would thus propose to add an optional attribute to
> the CF documents such as:
>
> <project>_standard_name: use this attribute to define the meaning of
> variables which have no accepted standard_name defined (yet). The
> project name should be a single string without blanks or underscore
> characters. These project-specific standard_names must follow the
> guidelines for the construction of standard_names, but they will not be
> evaluated by generic tools which test a data file for CF compliance.
> Groups who wish to define such project-specific standard names should
> first consider submitting their proposals to the CF mailing list for
> inclusion in the CF standard name table. A variable can contain either a
> standard_name or <project>_standard_name attribute but not both. A
> long_name attribute is not needed when a <project>_standard_name is
> given.
>
>
> Best regards,
>
> Martin
>
>
> Forschungszentrum Juelich GmbH
> 52425 Juelich
Received on Wed May 12 2010 - 12:22:12 BST