Opened 13 years ago
Last modified 12 years ago
#27 accepted enhancement
Use namespace tags to include other conventions in netcdf files without repeating them in CF
Reported by: | benno | Owned by: | edavis |
---|---|---|---|
Priority: | medium | Milestone: | |
Component: | cf-conventions | Version: | |
Keywords: | Cc: |
Description
Strictly speaking, this is not an enhancement to CF -- it is a way of encoding additional information in a netcdf file without extending CF. It certainly would affect discussions for extending CF, and hopefully will elicit some discussion in the CF community. In any case, creating a ticket was suggested.
My suggestions are based on an evolution of how metadata is encoded in netcdf files: rather than simply allowing multiple conventions in netcdf files by a comma-separated list in the Conventions attribute, specify prefixes so that attributes can be explicitly labelled as belonging to a different convention. In particular, were I to want to use attributes from the convention used by WCS to describe projections, I would use a conventions tag in the file like,
Conventions = "CF-1.0, wcs=http://www.opengis.net/wcs/1.1/"
means the default convention is CF-1.0 (i.e. the usual usage), but any attribute starting 'wcs:' belongs to the convention defined by the namespace http://www.opengis.net/wcs/1.1/ (which is how wcs or any other XML convention standardizes its labels). This would change the WGS 84 example in ticket 18 slightly, for example, to
dimensions:
    lat = 18 ; // dummy values
    lon = 36 ;
variables:
    double lat(lat) ; // conventional definition
    double lon(lon) ; // conventional definition
    float temp(lat, lon) ;
        temp:long_name = "temperature" ;
        temp:units = "K" ;
        temp:grid_mapping = "crs" ;
        temp:wcs\:gridCRS = "crs" ;
    int crs ;
        crs:grid_mapping_name = "lat_long_wgs1984" ;
        crs:wcs\:GridBaseCRS = "urn:ogc:def:crs:EPSG:6.0:4326" ; // Use EPSG ID 4979 for 3D CRS. ID 4326 refers to 2D CRS.
        crs:crs_name = "WGS 84" ;
        crs:geodetic_datum_name = "World Geodetic System 1984" ;
        crs:longitude_of_prime_meridian = 0.0 ;
        crs:ellipsoid_name = "WGS 84" ;
        crs:semi_major_axis = 6378137.0 ;
        crs:inverse_flattening = 298.257223563 ;
Conventions = "CF-1.0, wcs=http://www.opengis.net/wcs/1.1/"
where gridCRS and GridBaseCRS are the tags used in WCS to characterize projections. In particular, it declares 'crs' to be both a CF convention grid_mapping and a WCS convention gridCRS, and it adds a WCS attribute to gridCRS called GridBaseCRS, which is part of the WCS convention.
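To make the proposed Conventions syntax concrete, here is a minimal Python sketch of how a reader might split such a value into plain convention names and prefix declarations. The function name `parse_conventions` and the rule that a prefix must look like a simple identifier are assumptions made for illustration; the ticket does not define a grammar.

```python
import re

def parse_conventions(value):
    """Split a Conventions string into convention names and prefix -> URI pairs."""
    names = []
    prefixes = {}
    # Entries are separated by commas and/or blank space.
    for entry in re.split(r"[,\s]+", value.strip()):
        if not entry:
            continue
        prefix, sep, uri = entry.partition("=")
        # Assumption: a prefix is a simple identifier, so a bare URL that
        # happens to contain '=' is not mistaken for a declaration.
        if sep and uri and re.fullmatch(r"[A-Za-z_][\w.-]*", prefix):
            prefixes[prefix] = uri   # e.g. "wcs=http://..." declares a prefix
        else:
            names.append(entry)      # e.g. "CF-1.0" is an unprefixed convention
    return names, prefixes

names, prefixes = parse_conventions("CF-1.0, wcs=http://www.opengis.net/wcs/1.1/")
print(names)     # ['CF-1.0']
print(prefixes)  # {'wcs': 'http://www.opengis.net/wcs/1.1/'}
```

Software that knows nothing about prefixes would still see a list of entries in the attribute, which is the backward-compatibility property the proposal relies on.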
This keeps the redundancy out of CF, but allows alternate specifications to be put in a netcdf file. It also makes use of standards that other people maintain, i.e. OGC maintains the wcs namespace and list of concepts.
There are a number of reasons for doing it this way, listing of which would probably hide the essence of the proposal and should be put elsewhere. The short version is redundant representations belong to different conventions, and this is a framework for writing down the mappings between the different conventions, and for tagging datasets with them. This would allow creation of a system that could deliver the metadata in alternate representations, so that WCS applications could use WCS-style information, and CF applications could use CF information, regardless of the actual source of the data being analyzed.
Note that recent and near-term changes to netcdf make this possible -- we could not have made this choice two years ago. But netcdf version 3.6 libraries support ':' characters in attributes, and version 3.7 will support it in ncgen/ncdump, i.e. the '\:' escapes shown above will work in CDL.
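As an illustration of the '\:' escape mentioned above, the following Python sketch formats a string attribute as a CDL line. The helper name `cdl_attribute` is hypothetical, and whether a given ncgen release accepts the result depends on the version, as noted in the text.

```python
def cdl_attribute(variable, name, value):
    """Format one string attribute as a CDL line, escaping ':' in the attribute name."""
    escaped_name = name.replace(":", "\\:")
    # Escape backslashes and double quotes inside the string value.
    escaped_value = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{variable}:{escaped_name} = "{escaped_value}" ;'

print(cdl_attribute("temp", "wcs:gridCRS", "crs"))
# temp:wcs\:gridCRS = "crs" ;
```

Passing an empty variable name yields the global-attribute form, e.g. `:Conventions = "CF-1.0" ;`.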
Benno Blumenthal
Change History (36)
comment:1 Changed 13 years ago by caron
comment:2 Changed 13 years ago by bnl
I like this. A lot.
The escaping issue isn't obvious (to me). By pushing the escaping away from CDL are we pushing it somewhere else, and introducing a non-standard parsing scheme? (I appreciate Benno's comments against this in ticket:24 and will respond accordingly.) I guess it depends on where (in the software stack) one decodes wcs@GridBaseCRS ... but if it's not in the CF part of the stack then doing this could cause problems.
I would argue that we leave anything alone that follows the colon (which then belongs in another namespace), beyond netcdf/CDL escaping, which could (and should) be automatically dealt with in the CF stack ...
comment:3 Changed 13 years ago by jonathan
Dear Benno
Thanks for this idea. As my regular readers might expect I am more cautious about it than Bryan and John :-). I do think it is good to avoid redundancy, and to make use of other conventions where we can. However, I suspect that the situations where we could do that have quite exacting requirements. The convention we were depending on would have to be framed so that it could be used directly to supply a list of possible values for a netCDF attribute, and its intentions would have to be orthogonal to everything else in the CF convention so there weren't problems with overlaps and conflicts. Also, if it is supplying metadata that we regard as important and useful (and if it wasn't we wouldn't consider it anyway), I suppose we ought to have the same expectations of it in terms of intelligibility, clear definitions and self-descriptiveness that we do for CF metadata in general. Is that too demanding?
Cheers
Jonathan
comment:4 Changed 13 years ago by benno
Hello All,
My apologies for not responding earlier: for whatever reason, updates to this ticket are not coming through my subscription to the mailing list, so I was unaware. I suspect my mail server, but then again, I always suspect my mail server.
I share John's concern about the escaping, but I think keeping the colon (because that is what is used elsewhere) helps with clarity more than it hurts. And it is only a problem in older CDL -- the netcdf library already handles it, OPeNDAP has no problem with it. And I thought comma-separated (line-separated in Opendap) conventions in the Conventions tag were used elsewhere, but John would know best.
In response to Jonathan's comments, I would like to point out that namespaces address the metadata problem of delivering information to multiple audiences in an alternative way to the Microsoft approach of attempted domination. The example I cited above was WCS, redundant to part of CF. But alternatively, look at what Ted Habermann is doing to put Dublin Core and FGDC metadata into netcdf attributes (http://galeon-wcs.jot.com/WikiHome/GALEON%20Phase2%20Main%20Page/Unidata%20OGC%20Interoperability%20Day%20Presentations/Metadata_Standards_and_netCDF_ppt___119021166608511_916828076687903?jot.downloadName=Metadata+Standards+and+netCDF.ppt). He has a convention that adds attributes like FGDC_Link, fgdc_publisher, dc_publisher, and an additional Metadata_Conventions tag "FGDC, DublinCore". But, of course, no software understands this but his, despite FGDC and Dublin Core being widely used metadata standards. And having both a Conventions tag and a Metadata_Conventions tag is confusing, since it is all metadata. This namespace convention would put a standard machine-readable framework around such a construction, so that software could understand the conventions that the metadata belongs to, and would even make the attributes a bit more human readable as well (since they would be perfectly consistent in the way they are connected to conventions).
Note that it is already possible to put attributes belonging to multiple conventions in a single netcdf file -- this proposal simply makes it possible to say which attribute belongs to which convention.
Benno
comment:5 Changed 13 years ago by benno
I would like to amend my last statement -- it did not come out the way I intended.
First of all, having a dominant standard is a great thing, the "Holy Grail" of metadata exchange, and whatever time spent that leads to such a thing is time well spent. I think it is really hard to achieve, and maintain, but the effort is essential. And data that is fully described by a widely-adopted standard will be the most universally and easily accessible.
But not all data are fully-described by standard metadata, possibly no data are (emphasis on the word "fully").
Netcdf already allows multiple conventions in a particular file (Russ supplied the reference which is at http://www.unidata.ucar.edu/software/netcdf/conventions.html). And netcdf already allows specifying a convention with a uri in the conventions tag (possibly predating namespaces in XML). I am just trying to make it clear which attribute goes with which convention in such a way that it is consistent with what the greater IT community is doing, in particular XML and RDF, formats that are used for a lot of metadata exchange.
Benno
comment:6 follow-up: ↓ 7 Changed 13 years ago by bnl
I guess I'm making the same point in other threads. But let me be direct.
- Not everything we find important or useful can be encoded in the CF convention.
- CF cannot and will not be the governing body for all the metadata folk want to put into their NetCDF files.
- CF had better make it possible for CF to "play nicely" with metadata governed by other communities.
However, in supporting that, we do run the risk of data producers contradicting themselves in their attributes between conventions. C'est la vie. Is that not better than data providers not putting as much documentation as they can with their data?
So, the only question in my mind is how we do support something like this proposal, not whether we should do it? So I find myself disagreeing with Jonathan again. (I hasten to add that just because I'm disagreeing with Jonathan a lot right now, I have no less respect for his opinions, or the work that he's done that has got us here. Precisely the contrary, because I respect him so much I want to make sure that our points of disagreement are resolved).
(And given I support doing something like this, and can't improve it technically, I obviously support this one!)
comment:7 in reply to: ↑ 6 Changed 13 years ago by jonathan
Dear Bryan, Benno et al
- Not everything we find important or useful can be encoded in the CF convention.
- CF cannot and will not be the governing body for all the metadata folk want to put into their NetCDF files.
- CF had better make it possible for CF to "play nicely" with metadata governed by other communities.
I agree with these statements and I am not opposed to the principle of making use of other conventions, which is the point of this ticket. I was expressing caution about it, especially this:
we do run the risk of data producers contradicting themselves in their attributes between conventions.
I wouldn't find that a satisfactory situation. I think that if that arises we would have to specify, as part of CF, how the conflict or overlap should be resolved. That's why I suggested that another convention should only be adopted if it was "orthogonal" to CF i.e. dealing with something that CF didn't. If that is too exacting, we ought instead to say how they should work together.
Hence, I think we should consider individual cases as specific proposals, and say explicitly in CF which other conventions can be used with CF, and how they can be used, as a result of considering them individually (in trac tickets, just like any extension to CF). It's a bit less safe because subsequently the "other" convention might develop in a way which threw up problems when used with CF, so we would have to keep an eye out for that. Would you agree with such an approach?
Cheers
Jonathan
comment:8 Changed 13 years ago by bnl
I don't think conflicted duplication is satisfactory, but it's better to be in the position of trying to resolve conflicting information than having nothing to work with ...
... and if we tried to make all other conventions orthogonal to CF we would be on a hiding to nothing, and if we tried to identify only those parts of other conventions that we like and only allow those, then we would also be on a hiding to nothing (in both cases, in workload terms).
I guess the point we really should be careful about, and maybe which is exercising you, is where within a piece of the CF convention itself, we mandate using an external attribute. That's a rather different case and each needs to be explored in its own right. Yes, that's where this proposal originated, but this proposal itself isn't addressing that use case. I think this proposal is rather more what it says in line one:
Strictly speaking, this is not an enhancement to CF -- it is a way of encoding additional information in a netcdf file without extending CF.
and I think most of the arguments about orthogonality and duplication are relevant to cases where
It certainly would affect discussions for extending CF, ...
comment:9 Changed 13 years ago by graybeal
This is naive, but why isn't it a simple matter to say 'in case of conflicts, any netCDF/CF specification is the winner; between external conventions, the last statement is considered the winner' and be done with it?
I think this is a useful proposal. I'm not sure how realistic it is to keep CF pure and clean with respect to conflicts that could be introduced by unsavory programmers. But I admire you for trying!
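The rule suggested in this comment (CF wins; among external conventions, the last statement wins) can be sketched in a few lines of Python. This assumes the hard part is already solved, namely knowing that two attributes from different conventions describe the same concept; the `concept` labels and the `resolve` function below are purely hypothetical.

```python
def resolve(statements, cf_name="CF"):
    """Pick one value per concept from ordered (convention, concept, value) statements.

    A CF statement always wins; among external conventions, the last one wins.
    """
    winners = {}
    for convention, concept, value in statements:
        current = winners.get(concept)
        if current is not None and current[0] == cf_name and convention != cf_name:
            continue  # an external statement never overrides CF
        winners[concept] = (convention, value)
    return winners

statements = [
    ("wcs", "crs", "urn:ogc:def:crs:EPSG:6.0:4326"),
    ("CF", "crs", "lat_long_wgs1984"),
    ("dc", "publisher", "A"),
    ("fgdc", "publisher", "B"),
]
print(resolve(statements))
# {'crs': ('CF', 'lat_long_wgs1984'), 'publisher': ('fgdc', 'B')}
```

The sketch only picks a winner; it does not detect that the two values actually contradict each other, which is the situation the earlier comments worry about.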
comment:10 Changed 13 years ago by caron
A technical point about using the ":" in the attribute name. The only real problem with this is that older software might have problems with it, since technically the ":" is not part of the set of allowable chars (yet). It does have to be escaped in the CDL, but CDL is just a representation; there's no problem in the file format itself. So if the ":" in analogy to XML namespaces is more important than possible trouble with old code, I'm OK with it.
With regard to possible conflicts, in my opinion there's nothing to be done about it. The point/effect of this proposal is to allow arbitrary other semantics. I would just say, from the CF POV, something like "all CF semantics have to be correct and consistent, and shall not be modified by non-CF attributes".
comment:11 Changed 13 years ago by caron
Oh, one more point: it's conceivable that CF itself might in the future want to use this mechanism, in which case we would claim a namespace (and probably a prefix) and then I would agree with all of Jonathan's concerns and requirements for consistency, clarity, lack of conflicts, etc. for anything in that namespace.
comment:12 follow-ups: ↓ 13 ↓ 14 ↓ 15 Changed 13 years ago by caron
Ok, if you insist, maybe a few more thoughts.
- I find this a bit cleaner:
Conventions = "CF-1.0"; namespaces ="wcs=http://www.opengis.net/wcs/1.1/"
than:
Conventions = "CF-1.0, wcs=http://www.opengis.net/wcs/1.1/"
- I find myself thinking about using a cf namespace for cf attributes, which I think adds a lot of readability:
float temp(lat, lon) ;
    long_name = "temperature" ;
    units = "K" ;
    cf:grid_mapping = "crs";
    wcs:gridCRS = "crs" ;
(Note: I am using modified CDL, by not including the variable prefix and not escaping the namespace ":".)
If this proposal is accepted, I would propose that a cf namespace be established for all existing cf attributes, optional use of course, so that
cf:grid_mapping = "crs"
and
:grid_mapping = "crs"
are equivalent.
- Benno, I assume that these proposed namespaces are the same as XML namespaces? I.e., tools match on the namespace URI, and the prefix is arbitrary? Or is it intended to be something else, like URN namespaces (which I know little about)?
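If the namespaces do behave like XML namespaces, the points in this comment can be sketched as follows in Python: an attribute name expands to a (namespace URI, local name) pair, the prefix itself is arbitrary, and "cf:grid_mapping" is equivalent to the unprefixed "grid_mapping". The `expand` function is hypothetical, and `CF_URI` is only a placeholder since no CF namespace URI has been agreed. Attribute names are taken as stored in the file, without the CDL variable prefix.

```python
CF_URI = "http://www.cfconventions.org/"  # assumed placeholder, not an agreed URI

def expand(attribute_name, prefixes, default_uri=CF_URI):
    """Return (namespace URI, local name) for an attribute name as stored in the file."""
    prefix, sep, local = attribute_name.partition(":")
    if not sep:
        # Unprefixed attributes fall into the default namespace.
        return (default_uri, attribute_name)
    if prefix not in prefixes:
        raise KeyError(f"undeclared namespace prefix: {prefix!r}")
    return (prefixes[prefix], local)

file_a = {"cf": CF_URI, "wcs": "http://www.opengis.net/wcs/1.1/"}
file_b = {"ogcwcs": "http://www.opengis.net/wcs/1.1/"}

# Different prefixes, same URI: the expanded names match.
print(expand("wcs:gridCRS", file_a) == expand("ogcwcs:gridCRS", file_b))   # True
# A declared cf prefix and the default namespace are equivalent.
print(expand("cf:grid_mapping", file_a) == expand("grid_mapping", file_a))  # True
```

Raising on an undeclared prefix is one possible choice; a more lenient reader could instead treat the whole string as an unprefixed name.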
comment:13 in reply to: ↑ 12 Changed 13 years ago by benno
Replying to caron:
John is again precisely on topic, and brings up some excellent points. My short response is that these are precisely the same as XML namespaces (which are the same as RDF namespaces), and yes, there needs to be a CF namespace as well. Since I think Conventions and namespaces are representing exactly the same thing in this context (a set of attributes/terms that are used to describe things in a defined context), I originally proposed continuing to use Conventions (you might want to use namespaces for something else, see below).
But let me explain.
There were a bunch of details I left out of the original proposal, because I thought the details would obscure the essence of the thing, and lose most of the audience, and kill the proposal. But John is quite right, the goal is to be exactly analogous to XML namespaces, mainly so that there can be a clean general semantic translation from netcdf to RDF/XML and back.
My starting point is two sections of the netcdf documentation talking about the Conventions tag. First is http://www.unidata.ucar.edu/software/netcdf/conventions.html, which states,
If present in a netCDF file, `Conventions' is a global attribute that is a character array for the name of the conventions followed by the file. Originally, these conventions were named by a string that was interpreted as a directory name relative to the directory /pub/netcdf/Conventions/ on the host ftp.unidata.ucar.edu.
This web page is now the preferred and authoritative location for registering a URI reference to a set of conventions maintained elsewhere. The FTP site will be preserved for compatibility with existing references, but authors of new conventions should submit a request to support-netcdf@… for listing on this page.
It may be convenient for defining institutions and groups to use a hierarchical structure for general conventions and more specialized conventions. For example, if a group named XXX agrees upon a set of conventions for required attributes, attribute names, and netCDF representations for certain discipline-specific data structures, they may describe the agreed-upon conventions in a document associated with the name "XXX", and files that followed these conventions would contain the global attribute
:Conventions = "XXX" ;
Later, if another group agrees upon some additional conventions for a specific subset of XXX data, for example time series data, the description of the additional conventions might be associated with the name "XXX/Time_series", and files that adhered to these additional conventions would use the global attribute
:Conventions = "XXX/Time_series" ;
It is possible for a netCDF file to adhere to more than one set of conventions, even when there is no inheritance relationship among the conventions. In this case, the value of the `Conventions' attribute may be a single text string containing a list of the convention names, separated by blank space or commas, such as
:Conventions = "XXX, YYY" ;
So netcdf as it currently stands uses a comma-separated list in the Conventions tag for multiple conventions.
The user guide has some additional important detail (http://www.unidata.ucar.edu/software/netcdf/guidef/guidef-13.html).
If present, 'Conventions' is a global attribute that is a character array for the name of the conventions followed by the dataset, in the form of a string that is interpreted as a directory name relative to a directory that is a repository of documents describing sets of discipline-specific conventions. This permits a hierarchical structure for conventions and provides a place where descriptions and examples of the conventions may be maintained by the defining institutions and groups. The conventions directory name is currently interpreted relative to the directory pub/netcdf/Conventions/ on the host machine ftp.unidata.ucar.edu. Alternatively, a full URL specification may be used to name a WWW site where documents that describe the conventions are maintained.
For me, this is really important. First of all, it already allows a URL to specify the convention, so specifying an XML namespace here is already supported. Secondly, it specifies how to construct a URI for a convention that is not specified as a URL, namely the convention CF-1.0 has the URI ftp://ftp.unidata.ucar.edu/pub/netcdf/Conventions/CF-1.0/. More particularly, if one had an attribute called attributename it would have the URI ftp://ftp.unidata.ucar.edu/pub/netcdf/Conventions/CF-1.0/attributename. This would let us make statements about this attribute in XML or RDF.
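The construction rule read out of the user guide above can be written down as a short Python sketch. The function names are hypothetical, and the sketch only illustrates the rule as described here; it is not a statement that these FTP-based URIs are the ones CF should adopt.

```python
NUG_BASE = "ftp://ftp.unidata.ucar.edu/pub/netcdf/Conventions/"

def convention_uri(name):
    """Map a Conventions entry to a URI using the directory rule quoted above."""
    if "://" in name:
        return name  # already a full URL: use it unchanged
    # Otherwise interpret the name as a directory relative to the repository.
    return NUG_BASE + name.strip("/") + "/"

def attribute_uri(convention, attribute_name):
    """URI for one attribute: plain concatenation, as with RDF namespaces."""
    return convention_uri(convention) + attribute_name

print(convention_uri("CF-1.0"))
# ftp://ftp.unidata.ucar.edu/pub/netcdf/Conventions/CF-1.0/
print(attribute_uri("CF-1.0", "grid_mapping"))
# ftp://ftp.unidata.ucar.edu/pub/netcdf/Conventions/CF-1.0/grid_mapping
```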
Unfortunately, I don't think that is exactly the URI we want to use to represent CF, and we (the netcdf community) have strayed from that ftp directory to a web page which does not make it clear (to me) how to construct a URI to represent a netcdf convention. But we could fix this, right? Some kind of netcdf convention registry that is explicit/machine readable.
As for this proposal, it changes the current netcdf Conventions attribute value by allowing prefixes so that the attributes can be explicitly attached to their conventions. The convention without a prefix tag would contain all the unprefixed attributes.
So John is quite right, the prefixes are determined by the Conventions tag. One of the beauties of this scheme (and why XML adopted it), is that software that does not know about namespaces just sees a list of conventions, and a whole bunch of attributes, which is pretty much the situation that the software was in originally, particularly if CF is used as the default (unprefixed) convention.
There is a subtlety in this: these namespaces only apply to attribute names, not to attribute values. This is precisely the same as a RDF/XML file: the namespace prefixes are only used on the attributes, not on the URIs that identify objects (though you can define XML entities for that). The one exception to that statement is that the base uri is used to abbreviate URIs for objects local to the file. Which is not to say that you cannot define a namespace that would apply to attribute values, but that is not what the Conventions tag is about, and it is not what this proposal is about.
I can't overemphasize how important I think this is. But I am afraid if I start talking about all the wonderful things RDF would let us write down, and how much it would help CF/netcdf, I will get a large off-topic conversation attached to this proposal, which would not help. This proposal is a first step towards clean metadata exchange between netcdf and XML/RDF. Establishing URI's is the second.
Thanks again, John,
Benno
P.S. Is your 'modified CDL' currently implemented? It certainly is clear.
comment:14 in reply to: ↑ 12 Changed 13 years ago by pbentley
Replying to caron:
- I find this a bit cleaner:
Conventions = "CF-1.0"; namespaces ="wcs=http://www.opengis.net/wcs/1.1/"
than:
Conventions = "CF-1.0, wcs=http://www.opengis.net/wcs/1.1/"
The possibility of using XML-type namespaces in netCDF was also on my mind when I was formulating tickets 9 and 18. Like you, John, I'm not too keen on the idea of overloading the Conventions attribute to encode namespace declarations. Instead, I'd envisaged using a separate netCDF variable specifically for this purpose, e.g.
char namespace_declarations ;
    :cf = "http://www.cfconventions.org/" ;
    :wcs = "http://www.opengis.net/wcs/1.1/" ;
    :dc = "http://www.dublincore.org/" ;
    ...
Which is not too dissimilar from the way we store CRS metadata in a separate grid mapping variable. However, I also like your suggestion of encoding namespace declarations in a single global attribute, though I think this may be hard for human-readers to decode if there are more than a couple of namespaces. Obviously this is not an issue if it is only ever parsed by software.
Even though I have some reservations about the way in which we're proposing to crowbar XML-style techniques into netCDF's simple name-value metadata mechanism, I realise that this is what we have to work with for the time being. Hence this proposal would get my vote in principle.
Regards,
Phil
comment:15 in reply to: ↑ 12 Changed 13 years ago by benno
Replying to caron:
- I find this a bit cleaner:
Conventions = "CF-1.0"; namespaces ="wcs=http://www.opengis.net/wcs/1.1/"
than:
Conventions = "CF-1.0, wcs=http://www.opengis.net/wcs/1.1/"
I've been pondering "cleaner" for some time, as well as Phil's idea, which reverses the usual role of attributes and variables. I would put both these ideas in the class of "don't mess with Conventions", and I have been thinking about implementation reasons why one should leave Conventions alone. Before I give those reasons, I'd like to add another possibility for "don't mess with Conventions" schemes: simply borrow xml namespaces literally, i.e. an xmlns attribute sets the default namespace for the container it is in, and xmlns:xyz attribute sets the namespace for prefix xyz:. So to continue this example, we would say
xmlns:wcs = "http://www.opengis.net/wcs/1.1/";
Now suppose for the moment that we have all decided on a URI for CF-1.0. It could be a constructed URI, i.e. there is a simple rule for constructing it that we all agree to, i.e. ftp://ftp.unidata.ucar.edu/pub/netcdf/Conventions/CF-1.0/. On the other hand, it could be a registry URI, meaning that we have agreed on a registry for Convention URIs, e.g. urn:cfns:cf-1.0: or whatever.
This proposal was that
Conventions = "CF-1.0" ;
would set the default namespace, i.e. the CF1.0 URI. Equivalently, we could say
xmlns = "urn:cfns:cf-1.0:" ;
However
- that is not what Conventions means at present (Conventions="CF-1.0" current meaning is that some of the attributes might belong to the CF-1.0 convention)
- If we do use a registry for setting Convention-equivalent URIs, then any code that does the translation to prefixed attributes has to consult the registry.
So as far as the default namespace is concerned, we have for a CF-1.0 statement
- current Conventions, which has some of the attributes belonging to CF-1.0
- proposed Conventions, which has all the unprefixed attributes belonging to CF-1.0, and,
- xmlns, which explicitly states the URI for CF-1.0 that we have all agreed to.
From an implementation point of view, unless we agree on a construction method for Conventions to URIs, xmlns is very different from the proposed Conventions in that no external lookup is required to convert to explicit URIs.
Using Conventions to set the default URI, as proposed, prevents us from writing clever programs that use the CF standard to figure out which attributes in the file actually belong to CF-1.0 when the Conventions tag says the attributes are CF-1.0 (because the new standard says put them all in CF-1.0 because it is the default namespace). It is not clear whether this is a major issue or not, but it might be.
John's namespaces tag could also handle the default namespace, but he might consider the proposal sullied at this point (or at least less than clean) ...
Another reason one might want to not use Conventions is that one might want to tag variable names (Conventions being explicitly about attributes).
One more can-of-worms is if one would like to set name spaces for the values of attributes, i.e. to conveniently point to a concept. This requires knowing which attribute values are URI's and not plain strings -- possible to do in opendap because it has a URL type, but not if one started with a netcdf file, i.e. one needs to be careful to translate to URL from string when appropriate, but not possible in netcdf unless you
- want to say if it looks like a URI it is a URI, or
- are willing to look at the description (i.e. ontology) of the standard and thus deduce that it is a URI. Which would be external information, thus relatively messy.
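The first option in the list above ("if it looks like a URI it is a URI") might be sketched in Python as below. The pattern follows the RFC 3986 scheme syntax and additionally requires a whitespace-free remainder; both the function name and that extra requirement are assumptions for illustration. The "time: mean" string is a CF cell_methods value used here as a non-URI test case.

```python
import re

# A scheme (letter, then letters/digits/+/./-), a colon, and a
# non-empty remainder that contains no whitespace.
_URI_LIKE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:\S+")

def looks_like_uri(value):
    """Heuristic only: True if a string attribute value has the shape of a URI."""
    return isinstance(value, str) and _URI_LIKE.fullmatch(value) is not None

print(looks_like_uri("urn:ogc:def:crs:EPSG:6.0:4326"))  # True
print(looks_like_uri("WGS 84"))                         # False
print(looks_like_uri("time: mean"))                     # False
print(looks_like_uri("a:b"))                            # True, a false positive
```

The last case shows why the heuristic is fragile: any short "word:word" string passes, which is part of what makes this a can of worms.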
Benno
comment:16 Changed 12 years ago by edavis
- Owner changed from cf-conventions@… to edavis
- Status changed from new to assigned
Hi all,
I've volunteered to moderate this ticket and it seems a summary is in order. Hopefully, we can come to some agreement on the open issues. Below is my summary.
Thanks,
Ethan
Proposal
Extend CF to allow tagging of attribute names with namespace tags (similar to XML Namespace). This will allow data providers to use multiple attribute conventions without having to worry about attribute name clashes. It will also allow CF to use other attribute conventions without duplicating them directly in CF.
There seems to be general agreement that attribute namespaces would work in a manner similar to XML Namespaces. Something like:
- A namespace is identified by a URI.
- A namespace prefix/tag is associated with a namespace.
- Prefixing an attribute name with a namespace prefix/tag and a colon (':') places that attribute name in the associated namespace. [Some initial concern over need to escape ":" in CDL but seems OK now.]
- Not prefixing an attribute name with a namespace prefix/tag places that attribute name in the empty/default namespace.
Issues
There are a number of issues for which there is not general agreement and more discussion is needed:
- Should the proposal maintain backward compatibility?
- How will the namespaces used in a netCDF file be declared/encoded? How will the empty/default namespace be used?
- How will namespace URIs be constructed?
- How will potential semantic conflict/overlap be handled?
- How would non-attribute based conventions be mapped into an attribute list?
And there are a few issues that we might want to push into new tickets:
- Namespaces for variable names.
- Namespaces for attribute values (e.g., common concept?).
Backward Compatibility
One of the main goals of this proposal is to allow the use of external conventions without having attribute name clash issues. This goal can be solved without affecting backward compatibility by requiring that CF attributes always be placed in the empty/default namespace (i.e., without a prefix/tag).
On the other hand, (as Benno points out) this would "prevent us from writing clever programs that use [CF] to figure [out] which attributes ... actually belong to CF".
If the proposal does not include restrictions for backward compatibility, data sets can still be written that use namespaces but maintain backward compatibility by placing all CF attributes in the empty/default namespace. This could be spelled out as best practice until namespaces are more broadly supported.
Two questions to consider if the proposal does not include restrictions for backward compatibility:
- How will namespaces affect attributes defined by NUG ("units" and "long_name" e.g.), COARDS, etc?
- Should some level of backward compatibility be required if CF attributes are in an explicitly declared namespace?
The backward compatibility issue seems somewhat key to the discussion as a decision on this matter will affect a number of other decisions.
Declaring/encoding namespaces
Four different options were discussed for declaring namespaces in a netCDF file (and a fifth specific to a reserved CF namespace prefix). Some of these options implicitly left CF in the empty/default namespace, one explicitly placed CF in the empty/default namespace, and another explicitly placed CF in the CF namespace. However, the issue of backward compatibility was not always part of the discussion.
1) Benno's initial proposal extended the meaning of the "Conventions" global attribute to include namespace declarations:
Conventions = "CF-1.0, wcs=http://www.opengis.net/wcs/1.1";
There was further discussion over the current meaning of the "Conventions" attribute and that the NUG associates it with URLs. However, the current meaning is that some attributes MAY be from the listed convention(s). Using values in the "Conventions" attribute for namespace declarations would be an extension of its current meaning and might cause backward compatibility issues.
2) John Caron proposed a new "namespaces" global attribute:
Conventions = "CF-1.0"; namespaces = "wcs=http://www.opengis.net/wcs/1.1";
3) Phil proposed a namespace container variable:
char namespace_declarations;
    :cf = "http://www.cfconventions.org/";
    :wcs = "http://www.opengis.net/wcs/1.1";
    :dc = "http://www.dublincore.org/";
    ...
Which has readability advantages when declaring multiple namespaces.
4) Benno then suggested using the form of namespace declaration from XML Namespace:
xmlns = "urn:cfns:cf-1.4:" ; // Defines the default namespace.
xmlns:wcs = "http://www.opengis.net/wcs/1.1";
Which completely removes the "Conventions" attribute from the namespace issue.
5) Another suggestion related to namespaces and backward compatibility involved explicitly reserving the "cf:" prefix and the empty/default namespace to be equivalent, e.g., "cf:grid_mapping" and "grid_mapping" are equivalent. Though it wasn't clear whether all attributes in the empty/default namespace would be required to be CF.
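A minimal sketch of how a reader might treat the two spellings in option 5 as equivalent, assuming a variable's attributes are available as a plain Python dict. The function name `get_cf_attribute`, the lowercase "cf:" prefix, and the decision to raise an error when the two spellings disagree are assumptions made for illustration; none of them was agreed in the discussion.

```python
def get_cf_attribute(attrs, name, prefix="cf:"):
    """Look up a CF attribute by its bare or prefixed name.

    attrs  -- dict of attribute name -> value (stand-in for a variable's attributes)
    name   -- bare CF attribute name, e.g. "grid_mapping"
    prefix -- reserved CF prefix (assumed here to be "cf:")
    """
    bare = attrs.get(name)
    prefixed = attrs.get(prefix + name)
    if bare is not None and prefixed is not None and bare != prefixed:
        # Hypothetical policy: refuse to guess when the two spellings disagree.
        raise ValueError(f"{name!r} and {prefix + name!r} disagree")
    return prefixed if prefixed is not None else bare
```

With this policy, `{"grid_mapping": "crs"}` and `{"cf:grid_mapping": "crs"}` both yield "crs", and a file carrying both spellings with different values is rejected rather than silently resolved.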
Possible semantic conflict/overlap
There was quite a bit of discussion about how to deal with any conflict or overlap in the semantics of different conventions. Several options were mentioned for handling possible semantic conflict/overlap with external conventions:
1) CF keeps a controlled list of allowed external conventions making sure there are no conflicts/overlaps between CF and all conventions on the controlled list.
[NOTE: There was broad agreement that this would require too much work for CF and would reduce the flexibility for data providers targeted by this proposal. Specifying how conflict/overlap should be handled by CF clients and data providers seemed a satisfactory and required alternative.]
2) CF semantics must be consistent and overrule external semantics. Add a statement to the specification something like:
"all CF semantics must be correct and consistent, and shall not be modified by non-CF attributes"
[Can compliance be tested? Strong restriction on client, not as much on data provider.]
3) CF semantics must be consistent. Add a statement to the specification something like:
"all CF semantics must be correct and consistent"
Where do namespace URIs come from?
There were a number of suggestions on where the namespace URIs would come from. These included:
- Using the "Conventions" global attribute and the NUG URL discussion to define namespace URIs.
- Having CF keep a list of allowed namespaces and their URIs. [NOTE: See the note in item 1 on semantic conflict/overlap.]
- Allowing external convention writers to control the namespace URI for their conventions.
In the end, I think there was general agreement 1) not to extend the meaning of the "Conventions" attribute and 2) that it would be too much work for CF to maintain a list of allowed namespaces (and their URIs). This leaves the option of having external convention writers control their namespace URI. [Which is in-line with how XML Namespaces work.]
Mapping non-attribute based conventions into attribute lists
Somewhat related to the semantic conflict/overlap issue ...
How do external, non-attribute-based conventions get mapped into an attribute list? E.g., OGC WCS concepts are defined in UML (and XML Schema), which don't easily translate into a simple list of attributes.
Do we need to suggest that only well-defined attribute conventions be used in CF-netCDF files?
comment:17 follow-ups: ↓ 18 ↓ 19 Changed 12 years ago by caron
I'd like to get this proposal finished if possible, and use it in my point obs proposal. FWIW, here is my vote on resolving the outstanding issues:
- Backwards Compatibility
Use of CF: prefix is optional but recommended for all attributes in the CF Appendix A and F. For data providers concerned about breaking existing software, it is recommended to add both CF: and non-prefixed attributes for existing attributes, and only CF: for new attributes added after this proposal is accepted. Client software is strongly urged to recognize CF: prefixes. CF web pages should have a place for describing client software and its conformance to this and other CF conventions.
- Declaring/encoding namespaces
While Phil's proposal is probably the most readable:
char namespace_declarations; :CF = "http://www.cfconventions.org/"; :wcs = "http://www.opengis.net/wcs/1.1"; :dc = "http://www.dublincore.org/"; ...
but I'm not sure how we find this variable; is "namespace_declarations" now reserved?
I'm not all that attached, but I slightly prefer:
attributes: CF:namespaces = "CF=http://www.cfconventions.org/", "wcs=http://www.opengis.net/wcs/1.1", "dc=http://www.dublincore.org/";
using a global attribute whose value is an array of strings.
In any case, each namespace declaration string must be of the form 'prefix=namespace'. It is recommended that the namespace resolve to a human-readable web page for that vocabulary.
- Possible semantic conflict
"all CF semantics must be correct and consistent, and shall not be modified by non-CF attributes"
- Where do namespace URIs come from?
The CF namespace definition MUST be exactly the string:
"CF=http://www.cfconventions.org/";
(or if you prefer Phil's proposal:)
:CF= "http://www.cfconventions.org/";
All other namespaces and their semantics are not controlled by CF, except that the prefix CF (case-insensitive) is not allowed to be used by any other namespace.
Given that the CF prefix is reserved and its namespace is fixed, the namespaces declaration is optional for the use of the CF: prefix.
- Mapping non-attribute based conventions into attribute lists
Out of scope for this proposal.
So how about on Oct 26, comments/votes are completed and Ethan and/or Benno writes up a revised proposal based on this input, and we accept or reject that revised proposal as written? All in favor, please stay quiet ;)
comment:18 in reply to: ↑ 17 ; follow-up: ↓ 20 Changed 12 years ago by jonathan
Dear John
Thanks for moving this on. I have some comments on your proposal.
Use of CF: prefix is optional but recommended for all attributes in the CF Appendix A and F. For data providers concerned about breaking existing software, it is recommended to add both CF: and non-prefixed attributes for existing attributes, and only CF: for new attributes added after this proposal is accepted.
I think that the prefix should be recommended only if the dataset uses more than one convention. If it is a pure CF dataset, as many will continue to be (e.g. CMIP5 datasets created by CMOR, I expect), adding a CF prefix is an unnecessary complication. If the dataset is created as a pure CF dataset and subsequently you want to use another one too, then you'll be modifying the dataset anyway and can rename the attributes to include the prefix at the same time.
We should be clear in the CF standard about whether we are recommending duplicates (prefixed and non-prefixed), rather than leaving it to the data-writer's conscience. Recommendations are not mandatory anyway.
The prefix is not necessary for the NUG attributes in App A, I would say. Wouldn't all netCDF conventions recognise those anyway? They are not the property of CF. It is also not necessary for App F attributes, because these ones can only be attributes of a grid_mapping variable, which is specifically a CF convention.
- Declaring/encoding namespaces
CF:namespaces = "CF=http://www.cfconventions.org/", "wcs=http://www.opengis.net/wcs/1.1", "dc=http://www.dublincore.org/";
using a global attribute whose value is an array of strings.
Should that say :CF:namespaces instead of CF:namespaces if it's a global attribute? I agree, a global attribute is a good solution. I think that an attribute in the CF spirit would be a blank-separated list of words, rather than an array of strings. (Can attributes really be vectors of strings anyway? I thought that attributes could have only one dimension.) I would suggest the syntax "namespace URL [ namespace URL ... ]" e.g.
:CF:namespaces="CF http://www.cfconventions.org/ wcs http://www.opengis.net/wcs/1.1 dc http://www.dublincore.org/"
I suggest that would be more robust, because it won't be upset by including extra spaces and it'll be simpler to parse, not depending on = being included.
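As an illustration of the parsing involved, here is a minimal Python sketch, assuming the blank-separated "prefix URI" syntax suggested above. The function name `parse_namespaces` and the duplicate-prefix check are assumptions made for the example, not part of the proposal.

```python
def parse_namespaces(value):
    """Parse 'prefix URI [prefix URI ...]' into a dict of prefix -> URI."""
    tokens = value.split()  # splits on any run of whitespace, including newlines
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens (prefix URI pairs)")
    namespaces = {}
    for prefix, uri in zip(tokens[0::2], tokens[1::2]):
        if prefix in namespaces:
            # Hypothetical rule: a prefix may be declared only once.
            raise ValueError(f"prefix {prefix!r} declared more than once")
        namespaces[prefix] = uri
    return namespaces


# Extra spaces and a newline between pairs do not affect the result.
declared = parse_namespaces(
    "CF http://www.cfconventions.org/  wcs http://www.opengis.net/wcs/1.1\n"
    "dc http://www.dublincore.org/"
)
```

Because the split is on arbitrary whitespace, the same function would also accept newline-separated pairs if those were allowed for readability.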
What about the Conventions attribute? Will the dataset continue to identify itself just as CF-1.4, for instance? The CF:namespaces attribute implies that multiple conventions are used.
- Possible semantic conflict
"all CF semantics must be correct and consistent, and shall not be modified by non-CF attributes"
OK. Does that mean that, if the file contains both CF metadata and metadata according to another convention, and they have conflicting meanings, the application should take the CF metadata as correct?
the prefix CF (case-insensitive)
Do you mean that the attributes could be prefixed by either "CF:" or "cf:"? That makes more work for the software reading the data, since it would have to check both. What if both are provided and they disagree? I would suggest that it would be better to decide on the case.
So how about on Oct 26, comments/votes are completed and Ethan and/or Benno writes up a revised proposal based on this input, and we accept or reject that revised proposal as written?
The procedure is that we express views after the proposal has been written. The proposal should specify in detail the alterations to be made to the text of the CF standard and the conformance document (since there's no-one else available to do this work other than proposers). Then, as usual, the moderator (Ethan) decides, three weeks after the last comment is made, whether the proposal is accepted. See http://cf-pcmdi.llnl.gov/governance/governance-rules
Best wishes
Jonathan
comment:19 in reply to: ↑ 17 ; follow-up: ↓ 21 Changed 12 years ago by pbentley
Replying to caron:
Hi John,
- Declaring/encoding namespaces
While Phil's proposal is probably the most readable:
char namespace_declarations; :CF = "http://www.cfconventions.org/"; :wcs = "http://www.opengis.net/wcs/1.1"; :dc = "http://www.dublincore.org/"; ...
but I'm not sure how we find this variable; is "namespace_declarations" now reserved?
I merely used that name as an example. Presumably "namespaces" or "cf_namespaces" could serve equally well as the variable name here. The 'reserved-ness' would be the same as for other CF names, I presume - purely a matter of agreed convention.
As you say, the main motivation for suggesting this particular idiom was visual parsing. I guess the choice of encoding style (variable vs attribute) may be influenced by what we perceive to be the typical programmatic access patterns.
By way of a contrived example, if a programmer wanted to detect whether a particular namespace has been assigned and, if so, what its value is, then with the variable idiom this could be achieved simply (I think - correct me if I'm wrong) using the existing netcdf functions nc_inq_att() and nc_get_att_text(). Using the global attribute idiom, however, one would need to use the latter function to get the attribute text, then use a locally-written function to parse the text and extract the required namespace URI, if it's there.
Sure, writing such a parsing function is clearly not one of The Grand Programming Challenges, especially not if you use Python ;-) But I guess it does mean that this noddy function will end up being written a thousand ways in a thousand different places. Heck, maybe that's not such a big deal.
That said, however, ...
I'm not all that attached, but I slightly prefer:
attributes: :CF:namespaces = "CF=http://www.cfconventions.org/", "wcs=http://www.opengis.net/wcs/1.1", "dc=http://www.dublincore.org/";
using a global attribute whose value is an array of strings.
... I'd agree that this does sit better with the 'CF nature'. It also means that in ncdump -h output any namespace declarations would appear near the top of the output, which mirrors the way XML namespaces (typically) appear at the top of an XML document, which in turn is the representational model that many of us have in mind in this context, I suspect.
So, with my CF hat on I'd vote for the global attribute idiom, but with my programmer and end-user hat on I'd vote for the variable idiom! Give me both, please :-) :-)
Regards, Phil
comment:20 in reply to: ↑ 18 Changed 12 years ago by caron
Replying to jonathan:
Dear John
Thanks for moving this on. I have some comments on your proposal.
Use of CF: prefix is optional but recommended for all attributes in the CF Appendix A and F. For data providers concerned about breaking existing software, it is recommended to add both CF: and non-prefixed attributes for existing attributes, and only CF: for new attributes added after this proposal is accepted.
I think that the prefix should be recommended only if the dataset uses more than one convention. If it is a pure CF dataset, as many will continue to be (e.g. CMIP5 datasets created by CMOR, I expect), adding a CF prefix is an unnecessary complication. If the dataset is created as a pure CF dataset and subsequently you want to use another one too, then you'll be modifying the dataset anyway and can rename the attributes to include the prefix at the same time.
I think that's a valid use case. I was thinking of the problem of distinguishing CF and non-CF attributes. It's hard for non-experts to know what they're looking at when they look at a file's metadata. Having a CF prefix unambiguously says "go look up what this attribute means in CF conventions". So I would prefer to recommend that usage, since many providers will follow those recommendations.
We should be clear in the CF standard about whether we are recommending duplicates (prefixed and non-prefixed), rather than leaving it to the data-writer's conscience. Recommendations are not mandatory anyway.
Yes, I was suggesting duplicates. I don't see any other way to provide backwards compatibility.
The prefix is not necessary for the NUG attributes in App A, I would say. Wouldn't all netCDF conventions recognise those anyway? They are not the property of CF. It is also not necessary for App F attributes, because these ones can only be attributes of a grid_mapping variable, which is specifically a CF convention.
That's a good point; I guess we need to clarify what attributes are owned by CF. OTOH, I don't see a lot of harm in CF taking ownership of CF: prefixed attributes, even if they duplicate NUG ones.
- Declaring/encoding namespaces
CF:namespaces = "CF=http://www.cfconventions.org/", "wcs=http://www.opengis.net/wcs/1.1", "dc=http://www.dublincore.org/";
using a global attribute whose value is an array of strings.
Should that say :CF:namespaces instead of CF:namespaces if it's a global attribute?
yes
I agree, a global attribute is a good solution. I think that an attribute in the CF spirit would be a blank-separated list of words, rather than an array of strings. (Can attributes really be vectors of strings anyway? I thought that attributes could have only one dimension.)
I guess arrays of strings are a new feature in netcdf-4, so we had better stick with blank-separated single strings.
I would suggest the syntax "namespace URL [ namespace URL ... ]" e.g.
:CF:namespaces="CF http://www.cfconventions.org/ wcs http://www.opengis.net/wcs/1.1 dc http://www.dublincore.org/"
I suggest that would be more robust, because it won't be upset by including extra spaces and it'll be simpler to parse, not depending on = being included.
OK with me; that's the way XML does it, though it's not that readable. Perhaps we could also allow newlines to separate "prefix namespace" pairs for readability? BTW, in XML I think it's "prefix namespace", not "namespace URL".
What about the Conventions attribute? Will the dataset continue to identify itself just as CF-1.4, for instance? The CF:namespaces attribute implies that multiple conventions are used.
That's a good point; it does seem we have some overlap there. In a way, the CF:namespaces attribute is a generalization of the Conventions attribute.
- Possible semantic conflict
"all CF semantics must be correct and consistent, and shall not be modified by non-CF attributes"
OK. Does that mean that, if the file contains both CF metadata and metadata according to another convention, and they have conflicting meanings, the application should take the CF metadata as correct?
yes
the prefix CF (case-insensitive)
Do you mean that the attributes could be prefixed by either "CF:" or "cf:"? That makes more work for the software reading the data, since it would have to check both. What if both are provided and they disagree? I would suggest that it would be better to decide on the case.
Yes, let's just decide on one. My intention is to prevent other namespaces from using the same prefix with different case.
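A minimal sketch of the check this implies, assuming the declarations have already been parsed into a dict of prefix to namespace URI. The uppercase spelling "CF" is an assumption (the case had not been decided at this point in the thread), the URI is the one given in comment 17, and the function name `check_reserved_prefix` is hypothetical.

```python
CF_PREFIX = "CF"  # assumed spelling; the thread has not settled on a case
CF_NAMESPACE = "http://www.cfconventions.org/"  # URI given in comment 17


def check_reserved_prefix(namespaces):
    """Return a list of problems found in a dict of prefix -> namespace URI."""
    problems = []
    for prefix, uri in namespaces.items():
        if prefix == CF_PREFIX:
            # The reserved prefix may only point at the fixed CF namespace.
            if uri != CF_NAMESPACE:
                problems.append(f"{prefix} must map to {CF_NAMESPACE}, not {uri}")
        elif prefix.lower() == CF_PREFIX.lower():
            # Another namespace is using the reserved prefix in a different case.
            problems.append(f"prefix {prefix!r} differs from {CF_PREFIX!r} only by case")
    return problems
```

An empty list means the declarations respect the reservation; a conformance checker could report the returned strings directly.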
So how about on Oct 26, comments/votes are completed and Ethan and/or Benno writes up a revised proposal based on this input, and we accept or reject that revised proposal as written?
The procedure is that we express views after the proposal has been written. The proposal should specify in detail the alterations to be made to the text of the CF standard and the conformance document (since there's no-one else available to do this work other than proposers). Then, as usual, the moderator (Ethan) decides, three weeks after the last comment is made, whether the proposal is accepted. See http://cf-pcmdi.llnl.gov/governance/governance-rules
Agreed; I just mean to encourage anyone's comments in the next two weeks, as a deadline for a new version of the proposal.
Best wishes
Jonathan
Thanks for your insightful comments, as usual.
comment:21 in reply to: ↑ 19 Changed 12 years ago by caron
Replying to pbentley:
Replying to caron:
Hi John,
- Declaring/encoding namespaces
While Phil's proposal is probably the most readable:
char namespace_declarations; :CF = "http://www.cfconventions.org/"; :wcs = "http://www.opengis.net/wcs/1.1"; :dc = "http://www.dublincore.org/"; ...
but I'm not sure how we find this variable; is "namespace_declarations" now reserved?
I merely used that name as an example. Presumably "namespaces" or "cf_namespaces" could serve equally well as the variable name here. The 'reserved-ness' would be the same as for other CF names, I presume - purely a matter of agreed convention.
As you say, the main motivation for suggesting this particular idiom was visual parsing. I guess the choice of encoding style (variable vs attribute) may be influenced by what we perceive to be the typical programmatic access patterns.
By way of a contrived example, if a programmer wanted to detect whether a particular namespace has been assigned and, if so, what its value is, then with the variable idiom this could be achieved simply (I think - correct me if I'm wrong) using the existing netcdf functions nc_inq_att() and nc_get_att_text(). Using the global attribute idiom, however, one would need to use the latter function to get the attribute text, then use a locally-written function to parse the text and extract the required namespace URI, if it's there.
Sure, writing such a parsing function is clearly not one of The Grand Programming Challenges, especially not if you use Python ;-) But I guess it does mean that this noddy function will end up being written a thousand ways in a thousand different places. Heck, maybe that's not such a big deal.
That said, however, ...
I'm not all that attached, but I slightly prefer:
attributes: :CF:namespaces = "CF=http://www.cfconventions.org/", "wcs=http://www.opengis.net/wcs/1.1", "dc=http://www.dublincore.org/";
using a global attribute whose value is an array of strings.
... I'd agree that this does sit better with the 'CF nature'. It also means that in ncdump -h output any namespace declarations would appear near the top of the output, which mirrors the way XML namespaces (typically) appear at the top of an XML document, which in turn is the representational model that many of us have in mind in this context, I suspect.
So, with my CF hat on I'd vote for the global attribute idiom, but with my programmer and end-user hat on I'd vote for the variable idiom! Give me both, please :-) :-)
Regards, Phil
We could also add parsing namespaces to cflib, so the user doesn't care how we end up encoding it.
comment:22 follow-up: ↓ 23 Changed 12 years ago by benno
Hello All,
At this point I would like to point out the larger context that this proposal fits into, in the hope that it will clarify the criteria for choosing the best solution.
This proposal was always intended to facilitate the free exchange of information between netcdf and RDF/OWL and, with the experience I have gained in the subsequent year, the exchange of information with conventions expressed in XML Schema as well.
XML Schema are important because a lot of data exchange conventions are expressed in XML Schema. RDF/OWL is important because it is a framework where multiple conventions can co-exist, in contrast to the focus of XML Schema where only one convention is meaningful at a time. And data exchange will occur in an ontological framework -- it is the only feasible solution on the horizon to the exploding chaos of data frameworks that are being built.
Nathan Potter and I and others have been working on an OPeNDAP project where we insert OGC metadata into NcML, move it to the ddx (i.e. it can be exchanged between DAP servers), and move it from the ddx into RDF (i.e. it is in RDF/OWL). We also read in XML Schema so that 1) we can convert multiple schemas into OWL so that they coexist in a single framework, and 2) we can extract XML documents consistent with their schema based on the information in the RDF/OWL framework. It is not quite finished yet, and Nathan is travelling for quite some time, but it is pretty close.
Note that the part that we have not done is move the information from NcML to netcdf, an important step, and where this proposal fits in. But we do have a sense of what things should look like in NcML or the ddx or RDF/OWL to be usable, and I think that is an important part of formulating this proposal.
So what do I think I have learned on my CF-trac#27 vacation?
1) if one wants to insert external information into a document controlled by an XML Schema, one is probably going to wrap it in a special container that says "this is where stuff not in the schema goes". That block is naturally instrumented with the namespaces for the external information.
2) XML is not as clean as one would hope. In particular, in XML there is no standard for converting a (namespace,qname) pair into a Universal Resource Identifier (URI); XML is written exclusively in terms of the pair. In XML Schema, the convention is to insert a '#' between the namespace and qname to create the URI. In RDF/XML, the convention is to simply concatenate the two with no intervening character. So to keep the Universal in URI, one has to convert namespace strings in moving information around, and be explicit about how one creates URIs from namespace,qname pairs in the current context.
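To make the difference in point 2 concrete, here is a small Python sketch of the two URI-construction conventions as described above, and of the namespace-string conversion needed to keep the resulting URI the same in both contexts. The function names are hypothetical, and the sketch assumes the incoming namespace string does not already end in '#'.

```python
def xml_schema_uri(namespace, name):
    """XML Schema convention as described above: join with '#'."""
    return namespace + "#" + name


def rdf_uri(namespace, name):
    """RDF/XML convention: plain concatenation."""
    return namespace + name


def to_rdf_namespace(namespace):
    """Convert a namespace string so that RDF concatenation yields the same URI.

    Assumes the input does not already end in '#'.
    """
    return namespace + "#"


ows = "http://www.opengis.net/ows/1.1"
same = rdf_uri(to_rdf_namespace(ows), "Abstract") == xml_schema_uri(ows, "Abstract")
```

This is the conversion visible in the RDF/XML example further down, where the ows namespace appears with a trailing '#'.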
As things currently stand, we are using a special attribute otherXML which holds XML governed by some (unspecified) XML Schema.
In NcML, this could appear as
<variable name="u">
  <attribute name="nameNotUsed" type="OtherXML">
    <ows:Abstract xmlns:ows="http://www.opengis.net/ows/1.1">Northward component of a 2D sea surface velocity vector.</ows:Abstract>
    <Definition xmlns="http://www.opengis.net/wcs/1.1"
                xmlns:ows="http://www.opengis.net/ows/1.1">
      <ows:AnyValue/>
    </Definition>
    <owcs:InterpolationMethods xmlns:owcs="http://www.opengis.net/wcs/1.1/ows">
      <owcs:DefaultMethod>nearest</owcs:DefaultMethod>
    </owcs:InterpolationMethods>
  </attribute>
  <attribute name="standard_name" type="String">surface_eastward_sea_water_velocity</attribute>
In the ddx, this appears as
<Grid name="u">
  <Attribute name="nameNotUsed" type="OtherXML">
    <ows:Abstract xmlns:ows="http://www.opengis.net/ows/1.1">Northward component of a 2D sea surface velocity vector.</ows:Abstract>
    <Definition xmlns="http://www.opengis.net/wcs/1.1"
                xmlns:ows="http://www.opengis.net/ows/1.1">
      <ows:AnyValue/>
    </Definition>
    <owcs:InterpolationMethods xmlns:owcs="http://www.opengis.net/wcs/1.1/ows">
      <owcs:DefaultMethod>nearest</owcs:DefaultMethod>
    </owcs:InterpolationMethods>
  </Attribute>
  <Attribute name="standard_name" type="String">
    <value>surface_eastward_sea_water_velocity</value>
  </Attribute>
And this is translated to RDF/XML as
<dap:Grid rdf:ID="u">
  <ows:Abstract xmlns:ows="http://www.opengis.net/ows/1.1#"
                xmlns:grddl="http://www.w3.org/2003/g/data-view#"
                xmlns="http://xml.opendap.org/ns/DAP/3.3#">Eastward component of a 2D sea surface velocity vector.</ows:Abstract>
  <Definition xmlns="http://www.opengis.net/wcs/1.1"
              xmlns:ows="http://www.opengis.net/ows/1.1#"
              xmlns:grddl="http://www.w3.org/2003/g/data-view#"
              rdf:parseType="Resource">
    <ows:AnyValue rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"/>
  </Definition>
  <owcs:InterpolationMethods xmlns:owcs="http://www.opengis.net/wcs/1.1/ows#"
                             xmlns:grddl="http://www.w3.org/2003/g/data-view#"
                             xmlns="http://xml.opendap.org/ns/DAP/3.3#"
                             rdf:parseType="Resource">
    <owcs:DefaultMethod>nearest</owcs:DefaultMethod>
  </owcs:InterpolationMethods>
  <att:standard_name xmlns:att="http://localhost:8080/opendap/coverage/200803061600_HFRadar_USEGC_6km_rtv_SIO.nc.ddx/att#"
                     rdf:datatype="http://www.w3.org/2001/XMLSchema#string">surface_eastward_sea_water_velocity</att:standard_name>
This solution is not perfect: we introduced a new datatype OtherXML, we are not using the attribute name, and we do not reference the Schema used in the inserted XML, so there is no explicit way of finding the framework that explains how to use the inserted metadata. We have assumed that the inserted XML is described by an XML Schema; we cannot, for example, use it to insert RDF/XML directly. John C. may, for example, already have an element in NcML that would be better suited to the purpose of inserting external XML, and we could map that to the ddx instead.
Again, the point is not to force the particular solution, but that we need to address the problem of moving information cleanly between the different systems. Whatever CF/netcdf chooses to represent multiple conventions, it needs to map cleanly into other systems in general, and NcML/DDX and OWL/RDF in particular. I hope we can add this to the conversation, and devise a solution that will be implemented cleanly to interconnect these essential frameworks.
comment:23 in reply to: ↑ 22 ; follow-up: ↓ 24 Changed 12 years ago by caron
Naively translating your XML into CDL, I would guess:
attributes:
    :CF\:namespaces = "ows http://www.opengis.net/ows/1.1 owcs http://www.opengis.net/wcs/1.1/ows";
variables:
    float u(x,y);
        u:ows\:Abstract = "Northward component of a 2D sea surface velocity vector";
        u:owcs\:InterpolationMethods = "nearest";
What are the issues with this form?
comment:24 in reply to: ↑ 23 ; follow-up: ↓ 25 Changed 12 years ago by benno
Replying to caron:
Naively translating your XML into CDL, I would guess:
attributes:
    :CF\:namespaces = "ows http://www.opengis.net/ows/1.1 owcs http://www.opengis.net/wcs/1.1/ows";
variables:
    float u(x,y);
        u:ows\:Abstract = "Northward component of a 2D sea surface velocity vector";
        u:owcs\:InterpolationMethods = "nearest";
What are the issues with this form?
My point was that it is not a solution until it is translated cleanly into NCML/DAP.ddx/ RDF/OWL. Code as it currently exists would not express this as namespaces/qnames in XML versions of the netcdf information -- it would be buried as attribute elements in NcML/DDX.
Are you committed to cleanly representing this in NcML so that the information can be recognized as XML elements by standard XML tools? Does it cleanly represent in the DDX? Can I translate it cleanly into RDF/OWL? Is this the solution that implements cleanly in C/Java/XSLT?
comment:25 in reply to: ↑ 24 ; follow-up: ↓ 26 Changed 12 years ago by caron
Replying to benno:
Replying to caron:
Naively translating your XML into CDL, I would guess:
attributes:
    :CF\:namespaces = "ows http://www.opengis.net/ows/1.1 owcs http://www.opengis.net/wcs/1.1/ows";
variables:
    float u(x,y);
        u:ows\:Abstract = "Northward component of a 2D sea surface velocity vector";
        u:owcs\:InterpolationMethods = "nearest";
What are the issues with this form?
My point was that it is not a solution until it is translated cleanly into NCML/DAP.ddx/ RDF/OWL. Code as it currently exists would not express this as namespaces/qnames in XML versions of the netcdf information -- it would be buried as attribute elements in NcML/DDX.
Are you committed to cleanly representing this in NcML so that the information can be recognized as XML elements by standard XML tools? Does it cleanly represent in the DDX? Can I translate it cleanly into RDF/OWL? Is this the solution that implements cleanly in C/Java/XSLT?
For CF, all we have is the netcdf-3 classic data model, and I am using CDL to show the way these proposed conventions would look. For now, I think just getting this right is worth doing.
I would guess that special tools can recognize the attributes with the ows/owcs namespace and translate back to your XML. But you would have a better idea of that than I would.
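A rough sketch of what such a tool might do, assuming the namespace declarations and a variable's attributes have already been read into plain Python dicts. The function name `attributes_to_elements` is hypothetical, and the input values are taken from the CDL example above; only the standard-library `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(attrs, namespaces):
    """Turn 'prefix:name' attributes into namespace-qualified XML elements.

    Attributes without a declared prefix are skipped.
    """
    elements = []
    for name, value in attrs.items():
        prefix, sep, local = name.partition(":")
        if not sep or prefix not in namespaces:
            continue
        # ElementTree uses the '{uri}local' notation for qualified names.
        element = ET.Element("{%s}%s" % (namespaces[prefix], local))
        element.text = str(value)
        elements.append(element)
    return elements


namespaces = {
    "ows": "http://www.opengis.net/ows/1.1",
    "owcs": "http://www.opengis.net/wcs/1.1/ows",
}
u_attrs = {
    "ows:Abstract": "Northward component of a 2D sea surface velocity vector",
    "owcs:InterpolationMethods": "nearest",
    "standard_name": "surface_eastward_sea_water_velocity",
}
elements = attributes_to_elements(u_attrs, namespaces)
```

Note that the nesting of the original XML (for example, DefaultMethod inside InterpolationMethods) cannot be recovered from the flat attribute form; this sketch only restores the qualified names.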
Alternatively, there's nothing in this convention to stop you from embedding the XML as a string in the attribute value. Not that pretty, but probably reasonable. Obviously, again, you need special tools that would understand the XML, which I think would be the same as understanding the namespace.
So I would say that you would need to come up with another convention for mapping between CDL and your XML, presumably building on top of this convention. The hope is that this convention would enable that in a clean way, i.e., allow multiple conventions to happen in the same file.
What do you think?
comment:26 in reply to: ↑ 25 ; follow-ups: ↓ 27 ↓ 33 Changed 12 years ago by benno
Replying to caron:
What do you think?
A bit of a split decision. Yes, it is up to me to make sure that the ddx to rdf translation (which is an XSLT transform written by Nathan) can handle the translation, and that the metadata arrives there as clean RDF elements. And XSLT is limited in how it can move things around from namespace to attribute to content, so I am concerned (it certainly was not obvious enough to code in the first two implementations).
But I suspect that there is a strong use case for users wanting to inject blocks of XML-coded metadata into NcML to be taken up into the system. The type="otherXML" example manages to do that, sort of, though there are the loose ends I mentioned earlier (loose ends that the CDL injections have as well, by the way -- we have no way of referencing the Schema a.k.a convention).
Focusing exclusively on CDL means the NcML representation will be awful given the current mapping -- I want to avoid that. If there is a clean NcML representation, users will have an XML option for injecting information, and we can be pretty sure that the information can be moved to the other XML-based formats in standard ways.
comment:27 in reply to: ↑ 26 ; follow-up: ↓ 28 Changed 12 years ago by edavis
Replying to benno:
A bit of a split decision. Yes, it is up to me to make sure that the ddx to rdf translation (which is an XSLT transform written by Nathan) can handle the translation, and that the metadata arrives there as clean RDF elements. And XSLT is limited in how it can move things around from namespace to attribute to content, so I am concerned (it certainly was not obvious enough to code in the first two implementations).
But I suspect that there is a strong use case for users wanting to inject blocks of XML-coded metadata into NcML to be taken up into the system. The type="otherXML" example manages to do that, sort of, though there are the loose ends I mentioned earlier (loose ends that the CDL injections have as well, by the way -- we have no way of referencing the Schema a.k.a convention).
Focusing exclusively on CDL means the NcML representation will be awful given the current mapping -- I want to avoid that. If there is a clean NcML representation, users will have an XML option for injecting information, and we can be pretty sure that the information can be moved to the other XML-based formats in standard ways.
In my summary above, I saw this ticket as dealing with attribute encodings only and left the mapping from non-attribute conventions up to the external convention writers. Since the ability to include external attribute conventions is such an important use case, I would suggest we split the "Capturing XML and RDF/OWL Semantics and Structure in CF" use case into another ticket.
This would allow the external attribute convention use case to move forward quickly. However, I don't want to slow down the "XML and RDF/OWL" discussion either and splitting it into a new ticket may take some of the energy out of that discussion.
Thoughts?
Ethan
comment:28 in reply to: ↑ 27 ; follow-up: ↓ 29 Changed 12 years ago by benno
Replying to edavis:
In my summary above, I saw this ticket as dealing with attribute encodings only and left the mapping from non-attribute conventions up to the external convention writers.
This ticket has always been about representing non-attribute conventions in netcdf -- the initial example was WCS. OGC and ISO are not going to define netcdf attribute representations of their conventions.
comment:29 in reply to: ↑ 28 ; follow-up: ↓ 32 Changed 12 years ago by edavis
Replying to benno:
Replying to edavis:
In my summary above, I saw this ticket as dealing with attribute encodings only and left the mapping from non-attribute conventions up to the external convention writers.
This ticket has always been about representing non-attribute conventions in netcdf -- the initial example was WCS. OGC and ISO are not going to define netcdf attribute representations of their conventions.
Sorry, I should have said "an external convention writer" rather than "the ...". You are right, I don't see OGC or ISO (or many other standards groups) developing mappings to CF. But someone needs to write the mappings.
Does it then seem like a bad idea to split this ticket into two: 1) a CF attribute namespace ticket; and 2) an "XML and RDF/OWL" ticket?
Could 1 be written to support 2? Or is there too much uncertainty about the connections between the two to safely split them at this time?
Ethan
comment:30 follow-up: ↓ 31 Changed 12 years ago by edavis
Just to muddy the water a bit more ...
I believe this ticket was originally branched from the common concept ticket. I'm a bit fuzzy on the connection. It seems that the common concept ticket might use the results of this ticket but there isn't as much dependency in the other direction. Does that sound correct?
Are there issues from the common concept ticket that should come into play when considering this ticket?
Thanks,
Ethan
comment:31 in reply to: ↑ 30 Changed 12 years ago by bnl
Replying to edavis:
I believe this ticket was originally branched from the common concept ticket. I'm a bit fuzzy on the connection. It seems that the common concept ticket might use the results of this ticket but there isn't as much dependency in the other direction. Does that sound correct?
I think that's correct.
Are there issues from the common concept ticket that should come into play when considering this ticket?
The key link is that common concept needs a way of encoding URNs and/or scoped names. There is no reverse dependency that I'm aware of.
comment:32 in reply to: ↑ 29 ; follow-up: ↓ 36 Changed 12 years ago by bnl
Replying to edavis:
Does it then seem like a bad idea to split this ticket into two: 1) a CF attribute namespace ticket; and 2) an "XML and RDF/OWL" ticket?
Could 1 be written to support 2? Or is there too much uncertainty about the connections between the two to safely split them at this time?
I think they can be split.
It seems that in the first case we want to mark a "netcdf-acious" attribute with a namespace ... in the other we want to embed some semantics which go beyond a key value pair in a particular namespace. In the latter case we then want to think through the issues of exposing the information in ncml etc ... independently of what seems like a much simpler use case in the former.
Bryan
comment:33 in reply to: ↑ 26 ; follow-up: ↓ 34 Changed 12 years ago by caron
Replying to benno:
Replying to caron:
What do you think?
A bit of a split decision. Yes, it is up to me to make sure that the ddx to rdf translation (which is an XSLT transform written by Nathan), can handle the translation, and that the metadata arrives there as clean RDF elements. And XSLT is limited in how it can move things around from namespace to attribute to content, so I am concerned (it certainly was not obvious enough to code in the first two implementations).
But I suspect that there is a strong use case for users wanting to inject blocks of XML-coded metadata into NcML to be taken up into the system. The type="otherXML" example manages to do that, sort of, though there are the loose ends I mentioned earlier (loose ends that the CDL injections have as well, by the way -- we have no way of referencing the Schema a.k.a convention).
Focusing exclusively on CDL means the NcML representation will be awful given the current mapping -- I want to avoid that. If there is a clean NcML representation, users will have an XML option for injecting information, and we can be pretty sure that the information can be moved to the other XML-based formats in standard ways.
I'm hearing that we have an opportunity to create a standard way to add more complex semantics like RDF to the data model. This sounds worth pursuing, and I agree with Bryan that we should split this into a new ticket.
BTW, I personally avoid XSLT (except when I don't), in favor of working directly with an object representation of the information content. This means I don't have artificial restrictions on what I can do, due to difficulties with XSLT.
comment:34 in reply to: ↑ 33 ; follow-up: ↓ 35 Changed 12 years ago by benno
Replying to caron:
BTW, I personally avoid XSLT (except when I don't), in favor of working directly with an object representation of the information content. This means I don't have artificial restrictions on what I can do, due to difficulties with XSLT.
From a programmer's point of view, I can understand that. From a standards-writing point of view, it would be best to get the information representation correct so that one does not have to bury the translational semantics in some custom code in a general-purpose language. And now we have an opportunity to get the representation correct.
comment:35 in reply to: ↑ 34 Changed 12 years ago by caron
Replying to benno:
Replying to caron:
BTW, I personally avoid XSLT (except when I don't), in favor of working directly with an object representation of the information content. This means I don't have artificial restrictions on what I can do, due to difficulties with XSLT.
From a programmer's point of view, I can understand that. From a standards-writing point of view, it would be best to get the information representation correct so that one does not have to bury the translational semantics in some custom code in a general-purpose language. And now we have an opportunity to get the representation correct.
I agree on getting the representation correct, but there are a lot of design forces on this. My own opinion is that "usable with XSLT" is in the "nice but not mandatory" list.
comment:36 in reply to: ↑ 32 Changed 12 years ago by jonathan
Replying to bnl:
Replying to edavis:
Does it then seem like a bad idea to split this ticket into two: 1) a CF attribute namespace ticket; and 2) an "XML and RDF/OWL" ticket?
I too agree they can be split. Also, this ticket is distinct from common concepts, in my understanding.
I'd like to raise another small point about the netCDF syntax. This was discussed before, but I'd like to question whether it's wise to use ":" as a separator in attribute names. John explained that this is a legal netCDF character but has to be escaped in CDL, and I can imagine this tripping some people up. Moreover, ":" has a particular significance in CDL anyway, in identifying attributes, and it's possible that an attribute name which itself contains ":" could look confusing to a reader, even though strictly there is no ambiguity.
Perhaps it would be safer to use a less exotic character e.g. the more conventional underscore, and hence a prefix like "CF_"?
Cheers
Jonathan
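To make the separator question concrete, here is a minimal Python sketch, not part of any agreed CF syntax. It assumes the Conventions value is a comma-separated list in which "prefix=URI" tokens declare namespaces, as in the ticket description. The function names (parse_conventions, resolve, cdl_name) are hypothetical, and cdl_name handles only the ":" escape shown in the ticket's CDL example.

```python
def parse_conventions(value):
    """Split a Conventions string into (default conventions, {prefix: namespace URI}).

    Assumes tokens are comma-separated and that a token containing '='
    declares a namespace prefix, e.g. 'wcs=http://www.opengis.net/wcs/1.1/'.
    URIs that themselves contain commas are not handled by this sketch.
    """
    defaults, namespaces = [], {}
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        if "=" in token:
            prefix, uri = token.split("=", 1)
            namespaces[prefix.strip()] = uri.strip()
        else:
            defaults.append(token)
    return defaults, namespaces


def resolve(attr_name, namespaces, sep=":"):
    """Return (namespace URI or None, local name) for an attribute name.

    Only prefixes declared in `namespaces` are treated as namespace tags,
    so an ordinary name such as 'long_name' is left alone even when the
    separator is '_'.
    """
    prefix, found, local = attr_name.partition(sep)
    if found and prefix in namespaces:
        return namespaces[prefix], local
    return None, attr_name


def cdl_name(attr_name):
    # Escape ':' with a backslash, as in the ticket's CDL example
    # (temp:wcs\:gridCRS). Other separators are passed through unchanged.
    return attr_name.replace(":", "\\:")


defaults, ns = parse_conventions("CF-1.0, wcs=http://www.opengis.net/wcs/1.1/")
colon_form = resolve("wcs:gridCRS", ns)                 # ':' separator
underscore_form = resolve("wcs_gridCRS", ns, sep="_")   # '_' separator
plain = resolve("long_name", ns, sep="_")               # undeclared prefix
```

In this sketch the choice of separator only changes the sep argument and whether cdl_name has anything to escape; the namespace lookup itself is the same for ":", "_", or "@".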
I like the idea of namespaces, thanks for bringing this up. Defining them in the Conventions attribute is ok, although a separate global attribute for them might also be a good idea, to keep older programs that don't look through comma-separated tokens from barfing.
I don't like using the ":" char which needs to be escaped in CDL. Why not use another char that does not need escaping? For example, I think "@" is allowed in all versions of netcdf and hardly ever used: