
[CF-metadata] detaching the standard name table

From: Bryan Lawrence <b.n.lawrence>
Date: Fri, 25 Jun 2004 14:32:03 +0100

Hi Folks

Sorry that it has taken me so long to enter into this ... I've got a lot on
this week ... but I felt I had to say something now :-). Apologies for the
length of this email, but I don't have time to write anything short. I guess
those of you who care will read on ... :-)

There are a number of threads bumping around here:

- splitting CF from the standard name table
and dealing with who is responsible for the name table(s)
- deciding what we mean by standard names
- dealing with evolving the name table(s)

In part, all of these are driven by the recognition that CF is becoming a
de facto standard for more than just the climate community that spawned it,
and my comments should be seen in that light.

Firstly, splitting CF from the standard name table: I think this is crucial
for many reasons:
 - we need to release new versions of the CF name table far more frequently
than new versions of the rest of CF.
 - we need to establish a governance framework for the standard names that is
different from the rest of CF.

I think the latter is important because, without belittling the huge efforts
of the existing CF authors, if the standard names are to be used more widely
than just in the climate modelling community, there has to be a visible
mechanism by which those wider communities can influence (even control) the
name evolution.

I've argued before that a mechanism to achieve that would be to spawn a new
mailing list (CF_NAMES?) to which anyone interested could subscribe (possibly
a start on the forum Roy has suggested we need?). New name proposals could be
discussed there and possibly voted on. We could nominate some individuals to
have a veto to ensure that democracy didn't overturn logic. Obvious
individuals for that "security council" for CF names could include the
existing CF authors if they so desired, but there might be other folk with
the time and expertise to be involved (Roy Lowry springs to mind).

Secondly: what does the word "standard" mean in the phrase standard names? I
think it means that it's something that we hope has precision and longevity,
but I also note that different communities adopt different standards for good
reasons, and no community is really in a position to adopt the moral high
ground and claim precedence. "Our" standard name table had better recognise
that it will live in a world of multiple namespaces, and support the concept,
even if we want to encourage folk to use our namespace as much as possible.

Thirdly: How do we deal with evolution of the namespace, and how do we allow
communities to evolve their existing namespaces into ours? Clearly we need to
avoid proliferation (I agree with everything already said on this topic), but
I don't think the existing CF community should be ruling on (for example) a
chemistry namespace ... even though we already have to deal with a limited
(and growing) subset of the relevant chemistry in atmospheric chemistry models.

I think the XML and XML-Schema-based suggestions for CF 1.X are very helpful,
but I do not think we should be trying to fit everything into one flat
namespace. By analogy: we do need a dictionary interface, but we also need
compound words and thesauri to build the complexity of names required once we
expand beyond the O(100) variables one needs in a climate model.

So, to Jonathan's specific points:

> (0) We retain the existing arrangement for adding standard name as proposed
> to the email list, which should be adequate in many cases involving only a
> few new standard names.

I think we should have a new mailing list, dedicated to naming issues.

> (1) Projects wanting to develop their own list of standard names should
> produce xml tables with the same schema as the central table (containing
> standard name, canonical units, description).

I agree we should establish a standard syntax; whether or not this is the
right one is not my call.
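To make point (1) concrete, here is a minimal Python sketch of reading a project table into memory. The element names (standard_name_table, entry with an id attribute, canonical_units, description) and the two example entries are assumptions for illustration only, not an agreed schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class StandardName:
    name: str
    canonical_units: str
    description: str


# Hypothetical example table; the element layout is an assumption.
EXAMPLE_TABLE = """\
<standard_name_table>
  <entry id="air_temperature">
    <canonical_units>K</canonical_units>
    <description>Bulk temperature of the air.</description>
  </entry>
  <entry id="surface_air_pressure">
    <canonical_units>Pa</canonical_units>
    <description>Air pressure at the lower boundary of the atmosphere.</description>
  </entry>
</standard_name_table>
"""


def parse_table(xml_text: str) -> dict[str, StandardName]:
    """Return a mapping from standard name to its (units, description) record."""
    root = ET.fromstring(xml_text)
    table = {}
    for entry in root.iter("entry"):
        name = entry.get("id")
        if name is None:
            raise ValueError("entry without an id attribute")
        table[name] = StandardName(
            name=name,
            # findtext returns the default if the child element is missing
            canonical_units=entry.findtext("canonical_units", default="").strip(),
            description=entry.findtext("description", default="").strip(),
        )
    return table
```

For example, parse_table(EXAMPLE_TABLE)["air_temperature"].canonical_units gives "K". The point is only that a shared schema lets one small reader handle the central table and every project table alike.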

> (2) Each such project should have a unique name, and the project tables
> should be linked with those names from the CF home page (like conventions
> of netCDF). That means the project tables will all be public.

We need a registry of standard name tables as suggested by Pieter Haaring,
probably initially at NCAR (or we would do it if they didn't want to) but
eventually hosted by an appropriate international body.

> (3) Project tables should avoid including new names for quantities which
> already have standard names in the central tables or in any of the existing
> project tables.

Projects should try very hard to identify synonyms in other tables, but I
think it will be an SEP (Somebody Else's Problem) to work on ontological
mappings between these tables. Clearly we need to start work on ensuring the
terms we use in our descriptions are themselves from a controlled vocabulary.
(This would have helped us a lot in our discussions over omega :-)
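As a rough illustration of the check a project might run before proposing names, here is a Python sketch that flags proposals which either repeat an existing name or appear in a hand-maintained synonym map. The synonym map, the existing names and the function name are all hypothetical; real synonym identification would need the controlled vocabulary and ontology work mentioned above.

```python
# Hypothetical synonym map: alternative term -> name already defined elsewhere.
SYNONYMS = {
    "temperature_of_air": "air_temperature",
    "air_temp": "air_temperature",
}

# Hypothetical set of names already present in the central or project tables.
EXISTING = {"air_temperature", "surface_air_pressure"}


def check_proposals(proposed, existing, synonyms):
    """Return {proposed_name: existing_name} for each proposal that duplicates
    an existing name, either exactly or via the synonym map."""
    clashes = {}
    for name in proposed:
        if name in existing:
            clashes[name] = name
        elif synonyms.get(name) in existing:
            clashes[name] = synonyms[name]
    return clashes
```

Here check_proposals(["temperature_of_air", "sea_ice_thickness"], EXISTING, SYNONYMS) flags only temperature_of_air, pointing it back at air_temperature.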

> (4) Names in the project table which are not in the central table should
> have a prefix of the name of the project which defined them. One possible
> syntax would be, for instance, "my_project/my_standard_name". There would
> be no need for an attribute in the netCDF file to identify the standard
> name table, as the project name will refer to a table which is linked from
> the CF website. That makes the files more "future-proof" (since URLs often
> change).

If we use namespaces correctly, we should be able to avoid the URL problem; we
could even, if we wanted, use some of the PURL or DOI technology from the
digital library community. Actually, I think we should appeal to that
community for help, because this is not an area where we should reinvent
wheels.
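A minimal Python sketch of how a reader might resolve names of the form "my_project/my_standard_name" against a registry of tables. The in-memory registry, the example names and the function name are hypothetical; in practice the registry would map each project to a persistent identifier (PURL, DOI or similar) for its table rather than holding the names directly.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical registry: project name -> names defined in that project's table.
REGISTRY: Dict[str, Set[str]] = {"my_project": {"my_standard_name"}}

# Hypothetical central table contents.
CENTRAL: Set[str] = {"air_temperature"}


def resolve(
    standard_name: str, central: Set[str], registry: Dict[str, Set[str]]
) -> Tuple[Optional[str], str]:
    """Split an optionally prefixed standard name and check that it is defined.

    Returns (project, name); project is None for central-table names.
    Raises KeyError if the project or the name is unknown.
    """
    project, sep, name = standard_name.partition("/")
    if not sep:
        # No prefix: the name must come from the central table.
        if standard_name not in central:
            raise KeyError(f"{standard_name!r} is not in the central table")
        return None, standard_name
    if project not in registry:
        raise KeyError(f"unknown project {project!r}")
    if name not in registry[project]:
        raise KeyError(f"{name!r} is not defined by project {project!r}")
    return project, name
```

The file carries only the project prefix, so if a table moves, only the registry entry has to change.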

There are some other namespaces which ought to take precedence. For example,
geographical names and their meanings should come straight from an ISO
standard (someone remind me which one :-) ... and we shouldn't make decisions
about which parts of someone else's namespace we want to include. For
example, I understand that the reasons country names were not included in CF
included problems of discontinuity and permanence, but to be inclusive we
have to include such things (and can deal with permanence through clear
versioning).

Well, that's probably enough for now ... congrats to anyone who has read this
far ...

Bryan

-- 
Bryan Lawrence,        Head NCAS/British Atmospheric Data Centre
Web: badc.nerc.ac.uk                      Phone: +44 1235 445012
CCLRC: Rutherford Appleton Laboratory, Chilton, Didcot, OX11 0QX
Received on Fri Jun 25 2004 - 07:32:03 BST

This archive was generated by hypermail 2.3.0 : Tue Sep 13 2022 - 23:02:40 BST
