[CF-metadata] CF #24 (common_concept proposal)

From: Pamment, JA <J.A.Pamment>
Date: Wed, 24 Sep 2008 14:06:09 +0100

Dear All

I volunteered to act as the moderator for trac ticket 24, the proposal
for the common_concept. This ticket has attracted a lot of interest and
I added my summary of the current status of the discussion to the trac
system on 13th August. Unfortunately, due to an intermittent problem
with email, my changes were not sent to the cf-conventions mailing list
and as a result many people are probably unaware that the summary has
been added. For convenience, I have included the text of the summary
below. If you wish to make any further comments in the discussion,
please do not reply to this posting but add them to the trac ticket in
the usual way.

Best wishes,
Alison

==> Please note new email address: alison.pamment at stfc.ac.uk <==

--- Reproduced from trac ticket #24 ---

As moderator of this ticket I would first like to thank everyone for the
many excellent and thoughtful contributions to the discussion. Clearly
this proposal has excited a lot of interest. Thanks in particular to
Bryan, John Graybeal and Steve for summarizing at intervals and
providing suggestions on how to proceed with the discussion.

I would like first to express my own opinion before moving on to my
summary of the ticket. From my point of view, an important strength of
this proposal is that it could help to speed up the agreement of
standard names. I'm not really convinced that it will reduce the number
of standard name proposals because a common_concept will still require a
standard_name to point to when describing a physical quantity. However,
if a scientific community is able to make use of its own familiar terms
via the common_concept I think that will tend to reduce the tension
between the community's wish/requirement to have their particular terms
accepted as standard and the need for standard_names to be constructed
according to CF's accepted rules and guidelines. That is my reason for
supporting this proposal.

I will turn now to the role of moderator and try to give as unbiased a
summary as possible of the discussion so far. I have tried to
concentrate on the main areas of agreement and disagreement while
attempting not to restate too much of the detailed technical arguments.

1. There is agreement that common_concepts may be used to:-
a) draw together a combination of CF attributes, to include standard
name and other attributes such as cell_methods, valid values and/or
valid ranges of coordinate variables and names of grid mappings. (This
list is indicative and not necessarily exhaustive);
b) provide synonyms for the standard name attribute alone.
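
As an illustration of point 1(a) only, a bundle can be pictured as a set of
attribute values that a variable must carry. The sketch below is not part of
the proposal: the name T2M_BUNDLE, the function matches_bundle and the chosen
attribute values are hypothetical, and it handles exact attribute matches
only, not valid ranges of coordinate variables or grid mappings.

```python
# Hypothetical bundle: the exact attribute values a variable must carry
# to be regarded as an instance of the concept. Illustrative values only.
T2M_BUNDLE = {
    "standard_name": "air_temperature",
    "cell_methods": "time: mean",
}


def matches_bundle(var_attrs, bundle):
    """Return True if every attribute in the bundle is present on the
    variable with exactly the same value. Extra attributes are ignored."""
    return all(var_attrs.get(name) == value for name, value in bundle.items())


variable = {
    "standard_name": "air_temperature",
    "cell_methods": "time: mean",
    "units": "K",
}
print(matches_bundle(variable, T2M_BUNDLE))
```

A concept that is simply a synonym for a standard name, as in point 1(b),
would be a bundle containing the standard_name entry alone.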

2. All but one contributor seem to be agreed that a CF "concept
registry" should be established within which user communities can
register their own common_concepts. Steve Hankin has raised the concern
that registering URIs could lead to namespace collisions.

3. There is agreement on the following potential benefit:-
The introduction of common_concepts will allow mappings to be created
between CF and an unlimited number of other vocabularies, whether they
be local to a single institution or widely adopted in other metadata
standards.
This will address the following use cases:-
a) the requirement for data centres to serve files that contain a user's
own familiar vocabulary;
b) the requirement for short names as an alternative to standard_names +
additional metadata;
(N.B. this should not be taken to imply that all are agreed on _how_ the
mapping should be achieved in practice - see later)

4. Other potential benefits and topics that have emerged during the
discussion:-
a) The possibility of registering a concept even while some of its
constituent parts are under discussion but have not been fully agreed
within CF, for example, while the standard name is still under
discussion. This would require some form of registration system that
would cater for newer versions of common_concepts. This would require a
proposal to be put forward under another trac ticket.
b) Common_concept may provide a mechanism for CF metadata to reference
terms from other vocabularies/metadata standards. In particular, this
would require the development of a syntax or "parsing scheme" for
referencing metadata outside the CF standard. Benno has opened ticket
#27 for the discussion of this topic. The form of the URIs used for
external referencing could also be adopted for referencing
common_concept bundles of CF attributes.
c) Bryan has raised the question of how the local names that map onto
the common_concept URIs should be encoded. This is an important
subtopic for the discussion of the current ticket and should be opened
under another trac ticket.
d) John Caron, in discussing the content of the common_concept attribute,
raised the question of mapping not just identities, but other types of
relationships between common concepts and local names. I think this
point is very much akin to Steve Hankin's identification of the need to
"add semantic richness" to standard names, for example, describe the
relationship between sea_surface_temperature and
sea_surface_skin_temperature.


It should be noted that none of these was part of the original proposal
and in order to keep this discussion to manageable proportions these
topics should be pursued under separate trac tickets or mailing list
discussion threads as appropriate.

5. This ticket should continue to be used for the discussion of the
original proposal which concentrates on defining common_concepts based
on metadata attributes that already form part of the CF conventions.
Even this narrower view gives rise to a number of questions that need to
be addressed before a second draft of the proposal can be prepared.

a) What language should be used to describe the definition?
It is clear that, if a common_concept is to be defined, there must
necessarily be a way of describing that concept. The original proposal
attempts to use CDL as a language for describing the constraints on a
common concept, i.e., what values its constituent attributes may be
permitted to take. It has been demonstrated during the discussion of
this ticket that CDL does not currently lend itself to this use. Some
of the examples in the original proposal are ambiguous in their
interpretation. Ticket 29 has been opened for the discussion of the
technical aspects of using CDL to describe the common_concept. Other
possible methods of describing a common_concept have been raised during
the discussion of this ticket. These include OWL/XML, RDF and OCL. The
resolution of this issue is clearly essential if a second draft of the
common_concept proposal is to be developed and I would encourage all
interested parties to contribute to the discussion under ticket 29.

b) What will be the procedure for registering a common concept
definition?
There is some question as to whether the registration of a
common_concept should be an entirely automated process or whether it
would/should require some manual intervention. I would note that,
although the actual procedure for registering a common_concept may not
need to be included in the CF conventions (in the same way that the
procedure for proposing a new standard name is not spelt out within the
document itself), we nevertheless need to agree the procedure and put
any necessary software/services in place before the common_concept can
be put to practical use. I think that a second draft of the proposal
would need to contain some further clarification of the registration
process.

c) What CF metadata attributes are allowed to be included in the
common_concept definition?
This hasn't really come out in the discussion, but I think that for the
purpose of clearly defining the common_concept within the CF conventions
it may be necessary to list those attributes that can be used as part of
a combination. Does it make sense to allow the use of any attribute or
are there some that should be excluded from use in common_concepts?

6. There needs to be a means of attaching the information represented by
a common_concept to a CF variable. The original proposal is that this
will be achieved by introducing a single additional attribute, called
common_concept, whose value will take the form
namespace:scoped_name;URI. The value that this attribute should take,
and indeed whether a single attribute can suffice, has proved to be the
point of greatest debate within this discussion.

To recap, the intended purpose of each of the proposed components of the
common_concept attribute is as follows:
scoped_name - this is the name that is used within a scientific
community to refer to a data variable such as 2 metre temperature;
namespace - in essence, identifies the community or institution that
registered the scoped name;
URI - a machine readable identifier for the registered definition of the
common_concept. All namespace:scoped_name identifiers that reference
the same common_concept would be associated with the same unique and
unchanging URI.
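
To make the proposed form namespace:scoped_name;URI concrete, here is a
minimal sketch of how software might split such a value into its three
components. It assumes the separators are used exactly as written above (the
first ";" ends the local name, the first ":" ends the namespace). The names
CommonConcept and parse_common_concept, and the example value, are
hypothetical and do not come from the proposal.

```python
from typing import NamedTuple


class CommonConcept(NamedTuple):
    namespace: str
    scoped_name: str
    uri: str


def parse_common_concept(value: str) -> CommonConcept:
    """Split 'namespace:scoped_name;URI' into its three parts.

    The URI may itself contain ':' characters, so the value is split on
    the first ';' before the local part is split on its first ':'.
    """
    local, sep, uri = value.partition(";")
    if not sep or not uri.strip():
        raise ValueError(f"missing ';URI' part in {value!r}")
    namespace, sep, scoped_name = local.partition(":")
    if not sep or not namespace.strip() or not scoped_name.strip():
        raise ValueError(f"missing 'namespace:scoped_name' part in {value!r}")
    return CommonConcept(namespace.strip(), scoped_name.strip(), uri.strip())


# Hypothetical attribute value, for illustration only.
print(parse_common_concept("badc:t2m;urn:example:common_concept:0001"))
```

If the two elements were carried in separate attributes instead, no parsing
of this kind would be needed.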

The proposed design incorporates two distinct elements:
i) The local name that is familiar to the scientist using the data
ii) The means of mapping that local name to the registered
common_concept and, by implication, to other synonymous local names.

a) The use of URIs.
We are agreed that any URIs should be opaque, i.e., contain no semantic
information additional to the CF attributes of the common_concept to
which the URI points. There is no clear preference for the use of URNs
or URLs - John Graybeal has suggested that both might be included.
Ticket #27 includes much discussion of the form that URIs should take
and the outcome of that ticket should be used to inform the second draft
of the common_concept proposal.

b) The local name vs the URI.
The original proposal was to include both these elements in the
common_concept attribute. However, a number of contributors have
questioned this point:
John Caron suggested including the local name only - he was concerned
that a fixed URI would unduly limit possible mappings between
vocabularies (see below);
Steve Hankin also suggests using the local name only and does not
support the registration of URIs within CF;
Jonathan has suggested including the common_concept URI only - he has
argued that external software should translate the URI to a local name;
On behalf of the proposers Frank and Bryan have continued to argue that
both elements should be included so that the URI can point to the bundle
of attributes forming the concept while the inclusion of local names is
convenient for scientists accessing data from within a particular
institution.

Frank has further suggested that the two elements could be split between
two new attributes - common_concept_urn and common_concept_local.

The decision as to whether to include one or both elements of the name
depends very much on how the mapping process between URI and local name
(and by implication between one local name and another) is to be
achieved (see point (c) below).

c) Should the mapping process take place within CF supported processes
or by an external mechanism?
According to the original proposal the mapping would be achieved by
registering both a common_concept metadata bundle and a
namespace:scoped_name with CF. This, coupled with the proposed
automated registration procedure, would require CF to be responsible for
maintaining a machine-readable list of both these elements. The case for CF
providing this service from a central server has been further argued by
Frank and Bryan.
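
Purely as an illustration of what a list of both elements would have to
support, the sketch below keeps an in-memory table from namespace:scoped_name
pairs to URIs and derives synonymous local names from it. It is not a
description of any existing or agreed CF service. The class ConceptRegistry,
its methods and the example names and URI are all hypothetical.

```python
from collections import defaultdict


class ConceptRegistry:
    """Toy registry: each (namespace, scoped_name) maps to one fixed URI,
    and several local names may share the same URI."""

    def __init__(self):
        self._uri_by_name = {}
        self._names_by_uri = defaultdict(set)

    def register(self, namespace, scoped_name, uri):
        key = (namespace, scoped_name)
        existing = self._uri_by_name.get(key)
        if existing is not None and existing != uri:
            # A registered name must keep pointing at the same URI.
            raise ValueError(f"{namespace}:{scoped_name} already maps to {existing}")
        self._uri_by_name[key] = uri
        self._names_by_uri[uri].add(key)

    def uri_for(self, namespace, scoped_name):
        return self._uri_by_name[(namespace, scoped_name)]

    def synonyms(self, namespace, scoped_name):
        """Other local names registered against the same URI."""
        key = (namespace, scoped_name)
        return sorted(self._names_by_uri[self._uri_by_name[key]] - {key})


# Hypothetical registrations, for illustration only.
registry = ConceptRegistry()
registry.register("inst_a", "t2m", "urn:example:common_concept:0001")
registry.register("inst_b", "temp_2m", "urn:example:common_concept:0001")
print(registry.synonyms("inst_a", "t2m"))
```

Note that a table of this kind expresses only the identity relationship
between local names that share a URI.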

John Graybeal, while supporting the establishment of a CF common_concept
registry, asked whether it would be appropriate for the mapping
mechanism from common_concept to local name to be entrenched within the
CF process. He pointed out that the method of mapping from a CF
registered scoped name to another is a solution that is very local to
the CF community and asked whether the mapping from one vocabulary to
another should be done in cooperation with other organisations.

John Caron asks whether mapping via an immutable URI is the best
approach because it allows only for the identity mapping between
common_concept/standard_name and another vocabulary. The current
proposal does not address the construction of more complex relationships
such as finding broader or narrower terms than a particular scoped name.
However, the important point for this discussion is that John also
proposes splitting the job of naming the common_concept from that of
mapping and that the attribute should not attempt to encapsulate the
mapping mechanism.

Jonathan suggests that the mapping between the URI and the local name
should be performed by servers within each institution. He prefers the
suggestion of giving the URI and the local name in separate attributes,
but does not support registering the local names in the CF standard.

As mentioned in my point 2, Steve Hankin has expressed the view that CF
should not act as a registry even for the URIs because of the potential
for namespace collision. Individual institutions/data centres would
then be responsible for mapping their own names to the CF attributes.

7. Conclusion
The most important point to draw out is that we have a unanimous
consensus that the common_concept, as a means of bundling together a
number of attributes or as a synonym for standard names, will be a
useful addition to the CF conventions. We must therefore work to
resolve the outstanding issues that have been raised during the
discussion.
 
Progress now rests on reaching a decision on whether CF should act as a
registry for the common_concept attribute bundles (and presumably
associate them with a URI), local scoped names, or both. I would say
that we are very close to achieving consensus that CF should register
the common_concept bundles but we are rather further away from consensus
on whether to register the local names. Making a decision on this point
will also clarify what the content of the common_concept attribute
should be.

In any case, I think it will not be possible to finalise all the details
of a second draft proposal until the outcomes of ticket 29 (CDL as a
constraint language) and 27 (on namespace tags) are decided. However, I
hope that this summary will provide a starting point for developing a
second draft. The second draft should make clear:
a) its scope (i.e., bundling together attributes that already form an
agreed part of the CF conventions);
b) a common_concept can consist of a standard_name only;
c) the registration process for the common concept.

Best wishes

Alison
---
------
J Alison Pamment                        Tel: +44 1235 778065
NCAS/British Atmospheric Data Centre    Fax: +44 1235 446314
Rutherford Appleton Laboratory          Email: alison.pamment at stfc.ac.uk
Chilton, Didcot, OX11 0QX, U.K.
Received on Wed Sep 24 2008 - 07:06:09 BST

This archive was generated by hypermail 2.3.0 : Tue Sep 13 2022 - 23:02:40 BST
