
[CF-metadata] a different (but perhaps unoriginal) approach to standard name construction

From: John Graybeal <graybeal>
Date: Tue, 4 Nov 2008 08:55:42 -0800

I love the list of classifiers and hope that discussion can continue.
Having also tried to come up with a pervasive system for standard
names (both in CF and in other contexts) over the years, here are some
observations.

Naming Effort: It appears CF standard names were originally Much More
about coming up with the right name, and partially partitioning useful
characteristics, than about a precise definition. This reflects the
original community needs, I think; as community needs for precision
have grown, so has attention to the definition. But Jonathan is
spot-on: getting a name that reflects both the meaning AND community usage
has been the challenge. While it frustrates name proposers, it
provides great comfort to users.

Normalization and uniqueness: If I understand the proposal correctly,
it calls for tracking all the orthogonal classifiers as possible
components of the standard name. ('These independent bits of
information could be automatically assembled together to create the
"standard name".') Is this any different from a database key
construction from multiple independent columns of data? Each unique
combination of the n components makes another possible name, and the
meaning is encoded into the name itself. Exclusion of a component from
the name means all values are accepted in that axis.
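To make the database-key analogy concrete, here is a minimal Python sketch. The axis names (quantity, substance, medium), their order, and the underscore join are assumptions for illustration only; none of them come from the proposal.

```python
# Hypothetical classifier axes in a fixed order (illustrative, not from the proposal).
AXES = ("quantity", "substance", "medium")

def build_name(**components):
    """Join the supplied components in axis order; omitted axes are skipped."""
    unknown = set(components) - set(AXES)
    if unknown:
        raise ValueError(f"unknown axes: {sorted(unknown)}")
    return "_".join(components[axis] for axis in AXES if axis in components)

def matches(name_components, record):
    """An axis left out of the name accepts any value on that axis."""
    return all(record.get(axis) == value
               for axis, value in name_components.items())

wanted = {"quantity": "temperature", "medium": "water"}
print(build_name(**wanted))
```

Note that this naive join does not record which axes were left out, so the string alone cannot always be parsed back into its components.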

Length and Complexity: It will be a Very Long standard name in many
cases. No technical limitations, probably, but social reaction to
these long names will be poor at best. (And will depend on some
particularly clever way to indicate omitted categories when
constructing the name.) Of course, more common cases will usually be
shorter, but people won't always put in the relevant categories, or
won't realize they are relevant. ("Oh, c'mon, everyone knows that
*has* to be over water.") Like filling out metadata, detail will be
avoided during name creation, for better and for worse.

Unique Identifiers for Resources: I agree with Benno: CF absolutely
should have a separate resource identifier on the web for (a) all the
existing and historical standard names, and (b) any name you come up
with in this system. (I am separately engaged in creating and serving
identifiers for vocabulary terms, so of course I would feel that way.
We just now have a service that can provide this; I just started
pursuing its application for/with CF.) As an aside, this proposal may
be a case where using opaque codes as the identifier, and the standard
name as a label string, offers improved value to users.
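A rough Python sketch of the opaque-code idea follows. The base URL is a placeholder, and the Registry class and its methods are hypothetical; this is not the service mentioned above. The point is only that the identifier stays fixed while the label attached to it can change.

```python
from itertools import count

BASE = "http://example.org/term/"  # placeholder base, not a real service

class Registry:
    """Maps opaque, stable identifiers to human-readable labels."""

    def __init__(self):
        self._next = count(1)
        self._labels = {}

    def register(self, label):
        """Mint a new opaque identifier and attach the label to it."""
        uri = f"{BASE}{next(self._next):06d}"
        self._labels[uri] = label
        return uri

    def relabel(self, uri, new_label):
        """Change the label without changing the identifier."""
        if uri not in self._labels:
            raise KeyError(uri)
        self._labels[uri] = new_label

    def label(self, uri):
        return self._labels[uri]

reg = Registry()
uri = reg.register("sea_water_temperature")
print(uri, reg.label(uri))
```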

Unique Identifiers for Data Set Variable: This was proposed as a
solution "to identify with a single standard name, closely related
variables that one might want to store in a single array". I
discourage using standard names as "the unique names for a data set",
because there will always be a category for differentiating variables
that isn't available in the standard convention (primary vs secondary
instrument, first/second/third installed sensor, clean/dirty, and on
and on). Standard names should be used to describe each variable, not
name it.

Defining Similarity: For a variable mapping exercise, we considered
what makes one thing the 'same as' something else. The answer is (of
course) 'it depends'. The great advantage of this proposed approach is
that it 'normalizes' the distinctions into the separate categories, so
the user can evaluate the match much more directly for his or her own
needs. But be aware that it will move the discussions of similarity
and difference into the next layer of semantic detail ("does 'body of
water' include underground streams?" and so on).
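As a small illustration of what 'normalizing' the distinctions buys, a per-axis comparison might look like the Python sketch below. The axis names and values are made up, and the three-way same/different/unspecified report is just one possible choice.

```python
def compare(a, b):
    """Classify each axis as 'same', 'different', or 'unspecified'.

    An axis is 'unspecified' when either description leaves it out.
    """
    report = {}
    for axis in sorted(set(a) | set(b)):
        if axis not in a or axis not in b:
            report[axis] = "unspecified"
        elif a[axis] == b[axis]:
            report[axis] = "same"
        else:
            report[axis] = "different"
    return report

# Hypothetical variable descriptions
var_a = {"quantity": "temperature", "medium": "water", "location": "surface"}
var_b = {"quantity": "temperature", "medium": "air"}
print(compare(var_a, var_b))
```

The user then decides which axes matter for the task at hand; the code does not declare the two variables the 'same' or not.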

Central Catalog: If the rules are deterministic, and every category
has a controlled vocabulary, you don't need a single list of what
names (i.e., combinations of categories) are approved; any possible
combination of category terms is legal, right? This is fortunate, as
the number of proposed names may indeed grow very large very quickly,
and people will often just construct the names without bothering to
submit them. You also don't need definitions; the definition is the
compilation of all the displayed components in that name. (If it
*isn't* the same as the aggregation, then there is by definition
another axis of interest that needs to be turned into a category, or
you will have 2 standard names that look the same but have different
meanings.) So this is really a system for creating a single-label
categorization scheme across multiple axes; no catalog is strictly
needed for the naming convention to work.
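A short Python sketch of that argument, under stated assumptions: the two vocabularies and their definition fragments below are invented placeholders, not CF content. Legality is checked against the vocabularies alone, the definition is assembled from the components, and the count shows how quickly the space of names grows when any axis may be omitted.

```python
from math import prod

# Hypothetical controlled vocabularies: axis -> {term: definition fragment}
VOCAB = {
    "quantity": {"temperature": "degree of hotness",
                 "speed": "magnitude of velocity"},
    "medium": {"air": "in the atmosphere",
               "water": "in a body of water"},
}

def is_legal(components):
    """Legal if non-empty and every axis/term pair is in the vocabularies."""
    return bool(components) and all(
        axis in VOCAB and term in VOCAB[axis]
        for axis, term in components.items()
    )

def definition(components):
    """The definition is just the aggregation of the component definitions."""
    return "; ".join(VOCAB[axis][components[axis]]
                     for axis in VOCAB if axis in components)

def count_names():
    """Each axis is either omitted or set to one term; exclude the empty name."""
    return prod(len(terms) + 1 for terms in VOCAB.values()) - 1

name = {"quantity": "temperature", "medium": "water"}
print(is_legal(name), definition(name), count_names())
```

With realistic vocabularies of dozens of terms on several axes, the same product runs into the millions, which is why a hand-maintained list would not keep up.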

Semantics and Ontologies: With this proposal, we are much further into
creating classification systems for all concepts relevant to CF names
(as opposed to conceptually linking the existing CF concepts, which is
slightly different). I think this is inevitably a direction to be
taken by someone -- witness the Plasmo work -- but it turns the
process into something very much like other knowledge classification
efforts in the semantic community. That isn't a pro or a con, just an
observation. There are lessons to be learned and tools to be reused
from work that has gone before. In that regard, I would love to be
informed of existing vocabularies (formal or informal) for each of
these categories, particularly the first two. (Can we start a
wiki page for this info somewhere?)

In summary, I love this idea in principle, but think we can expect a
stately progression toward seeing it in action. It serves a different
need and audience than Standard Names, and so perhaps should be
considered and developed separately, not necessarily as a replacement
for them.

John

--------------
John Graybeal <mailto:graybeal at mbari.org> -- 831-775-1956
Monterey Bay Aquarium Research Institute
Marine Metadata Interoperability Project: http://marinemetadata.org
Received on Tue Nov 04 2008 - 09:55:42 GMT
