
[CF-metadata] Usage of the 'Conventions' attribute

From: Nan Galbraith <ngalbraith>
Date: Wed, 30 Jan 2013 15:41:46 -0500

On 1/29/13 9:19 AM, Bentley, Philip wrote:
> Hi Nan,
>> I agree with Roy that CF should be the default namespace in a
>> CF compliant file...
> What does that mean?

It means that the definitions of attributes and variable names in a
CF file should first be assumed to be from CF, and for those that are
not, should be found in the other conventions specified in the file.
For my files, this means the conventions listed in the global attribute
Conventions, in order of their appearance there, with CF being first.
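
As an illustration of the lookup order described above, here is a minimal Python sketch. It assumes each convention's vocabulary is available as a set of term names; the function names, the version strings, and the vocabulary contents are all hypothetical and heavily abbreviated, not taken from any real library or specification.

```python
def parse_conventions(value):
    """Split a Conventions attribute string into an ordered list of names."""
    # Assumption: the list is either comma-separated or whitespace-separated.
    sep = "," if "," in value else None
    return [part.strip() for part in value.split(sep) if part.strip()]


def resolve_term(term, conventions, vocabularies):
    """Return the first listed convention whose vocabulary defines `term`."""
    for name in conventions:
        if term in vocabularies.get(name, set()):
            return name
    return None  # not defined by any convention listed in the file


# Hypothetical vocabularies, for illustration only.
VOCABULARIES = {
    "CF-1.6": {"standard_name", "units", "title"},
    "OceanSITES-1.2": {"title", "platform_code"},
}

conventions = parse_conventions("CF-1.6, OceanSITES-1.2")
for term in ("title", "platform_code", "unknown_term"):
    print(term, "->", resolve_term(term, conventions, VOCABULARIES))
```

Because CF is listed first, a term defined by both CF and a later convention resolves to CF, which is the sense in which CF acts as the default namespace.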

It is in response to a suggestion on this thread that CF attributes
be identified "using a 'cf_' prefix" - which I think is unnecessary.

People who develop metadata extensions that overlay CF should plan
ahead to 'protect' terms they define against the possibility that the
term will be used (differently) by CF in some future rev. They can do
that by adding namespace information to their terms, e.g. sdn_some_term,
but CF shouldn't need to do that; so in that sense it can be considered
the default namespace.

For a metadata extension like OceanSITES (OTS), we need to keep an eye
on CF developments with regard to the extra attributes we've defined.
In the unlikely event that at some point one of these is added to CF, and
the definition is not the same as the one we have in OTS, we may have a
problem. We can either stick with the previous version of CF, or revise
our term (possibly by prefixing it with a namespace identifier like
'OTS_').
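
The check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: definitions are compared as plain strings, and the term names, definitions, and helper functions are hypothetical rather than drawn from CF or OceanSITES.

```python
def find_collisions(extension_terms, cf_terms):
    """Return extension terms that CF also defines, but with a different definition."""
    return sorted(
        term
        for term, definition in extension_terms.items()
        if term in cf_terms and cf_terms[term] != definition
    )


def add_prefix(extension_terms, colliding, prefix="OTS_"):
    """Rename only the colliding terms by adding a namespace prefix."""
    return {
        (prefix + term if term in colliding else term): definition
        for term, definition in extension_terms.items()
    }


# Hypothetical term tables, for illustration only.
ots_terms = {
    "some_term": "definition used by the extension",
    "other_term": "definition shared with CF",
}
new_cf_terms = {
    "some_term": "a different definition adopted later by CF",
    "other_term": "definition shared with CF",
}

colliding = find_collisions(ots_terms, new_cf_terms)
print(colliding)
print(add_prefix(ots_terms, colliding))
```

Terms whose definitions agree are left alone, so only a genuine conflict forces a rename.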

We're in the process of doing something like this now, to achieve
compliance with the NetCDF Attribute Convention for Dataset Discovery
(NACDD, aka UDD) - NACDD has some terms with the same definition as
OTS terms, so we're moving towards adopting those terms instead. The
OTS version number in the Conventions attribute will be key for software
that needs to understand details of these files (although these could
be considered second-tier attributes - they don't affect understanding
the data in any way).

> Until we agree on the meaning and mechanism of namespace-handling within
> CF-compliant files and software, I fear that such statements do not
> provide much guidance at the practical, implementation level.
>
> For example, in a CF-compliant netcdf file it's possible to have named
> identifiers from the netCDF ontology (e.g. _FillValue), from the COARDS
> ontology (e.g. missing_value), from the CF ontology (e.g.
> standard_name), and from any number of additional ontologies that one
> happens to reference (SDN, ISOxxx, Dublin Core, etc, etc). I don't
> believe it makes sense to bundle them all, by default, into some
> notional CF namespace.
>
Do you see a need to identify the source of a term, other than to prevent
'collisions', where a term has different definitions in different standards
that are employed within a file? Is there a practical use for explicitly
identifying the 'standard_name' attribute as being part of CF, rather than
NetCDF? How would you identify 'title' which is part of NetCDF, CF, *and*
NACDD?

You've listed 3 types of standards here; CF is built on NetCDF and COARDS;
it's my understanding that CF by definition includes the terms (and their
definitions) of the others. SDN (and NACDD, which we use in OTS) are
already CF-aware. DC and ISO, being completely external to CF, could
be seen as potential problems; but DC is meant to work with existing
vocabularies, and I just don't see a potential 'collision' there - am I
missing something?

> Perhaps the only solution lies in using netCDF-4's existing namespacing
> mechanism, namely groups (which idea has been proposed before). I
> suspect this too will have its adherents and detractors!
>> ... and that this problem belongs to groups that are writing
>> extensions.
>
> Do you mean software extensions, or metadata extensions (profiles)?
Extensions to CF, so metadata extensions.
> Also, if you hand off responsibility to other groups (whatever and
> wherever they may be) would we not end up with a myriad of incompatible
> solutions?
>> Should more of these community conventions be added to CF?
>> I'm sure there are SDN and NACDD (data discovery) attributes
>> that would be helpful to some CF users; it would be awfully
>> nice to have a list of already-defined attributes - in one
>> place - to choose from when putting together a CF-based spec
>> for a project.
> This would seem to be a good candidate, IMO:
>
> http://www.unidata.ucar.edu/software/netcdf/conventions.html
I was thinking that many of the terms in NACDD could be useful to
CF users, as optional global attributes. Before we began adopting
NACDD, I'd probably have included my own terms like 'start_time'
and 'file_version_date', where now we use 'time_coverage_start' and
'date_created'. Does it make sense to help standardize these terms
by including them in CF, or should these be handled by different
domains, like data discovery?

My own preference is to bundle them into CF, in part because it
ensures well-understood definitions, and a place to discuss them,
which are not always found in other conventions/communities.

A second choice would be having a resource where terms from
the 'major' conventions could be accessed. Roy's BODC server,
at http://seadatanet.maris2.nl/v_bodc_vocab/welcome.aspx,
and the MMI ontology server, at http://mmisw.org/orr/#b
both work fairly well for this.

Cheers - Nan


-- 
*******************************************************
* Nan Galbraith                        (508) 289-2444 *
* Upper Ocean Processes Group            Mail Stop 29 *
* Woods Hole Oceanographic Institution                *
* Woods Hole, MA 02543                                *
*******************************************************
Received on Wed Jan 30 2013 - 13:41:46 GMT
