Opened 9 years ago
Last modified 9 years ago
#90 new enhancement
Collection of CF enhancements for interoperable applications
Reported by: | mgschultz | Owned by: | cf-conventions@… |
---|---|---|---|
Priority: | medium | Milestone: | |
Component: | cf-conventions | Version: | |
Keywords: | | Cc: | |
Description
Dear all,
in an offline discussion with Jonathan and a few others, it became apparent that there may be a few limitations in the current CF convention which make it difficult to build truly interoperable services. One of the major obstacles at present is the optional character of attributes, in particular for coordinate axes. If an application cannot rely on finding some specific information about a coordinate, it becomes virtually impossible to interpret this information correctly without human intervention (or one has to start guessing, which will almost certainly fail at some point). As I understand it, making attributes mandatory could break the backward compatibility principle that CF has hitherto held in very high regard. Therefore, it appears appropriate to begin a collection of issues in the current CF convention which impede the implementation of interoperability and whose resolution would not be backward-compatible. Depending on the outcome of this discussion (for which we should allow some time), one could decide either to start a new CF major version or to adopt all incompatible changes in a single new release. Introducing incompatible changes across several individual releases should be avoided.
Here I begin with three items. Responses should indicate whether they agree that these changes would cause incompatibilities. If you want to add other issues, please prepend an item number. If discussions on individual issues get lengthy, we should open separate Trac tickets for them and feed the conclusions back into this ticket.
1) make axis attribute and standard_name attribute mandatory for coordinate variables
2) introduce a "group" level in order to be compatible with the netCDF-4 and HDF data model: each group can have its own coordinate system, but all variables within a group must share the same coordinates
3) standardize the use of comments where these are necessary to uniquely identify what a standard_name means or what a variable contains. Examples are the lumping of NMVOC compounds, where the definition of the lumped group should be provided, or the newly proposed emission sectors
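To make item 1 concrete, here is a minimal sketch of the check a validator might run if axis and standard_name were mandatory. It assumes a hypothetical in-memory representation of a file (a dict mapping each variable name to its dimension names and attributes) rather than any real netCDF library, and uses the usual definition of a coordinate variable: a one-dimensional variable with the same name as its dimension.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a file: name -> (dimension names, attributes).
Variable = Tuple[Tuple[str, ...], Dict[str, str]]

# Attributes that item 1 proposes to make mandatory.
REQUIRED = ("axis", "standard_name")


def coordinate_variables(variables: Dict[str, Variable]) -> List[str]:
    """A coordinate variable is 1-D and has the same name as its dimension."""
    return [name for name, (dims, _) in variables.items() if dims == (name,)]


def missing_required(variables: Dict[str, Variable]) -> Dict[str, List[str]]:
    """Map each coordinate variable to the required attributes it lacks."""
    report = {}
    for name in coordinate_variables(variables):
        attrs = variables[name][1]
        missing = [a for a in REQUIRED if a not in attrs]
        if missing:
            report[name] = missing
    return report


example = {
    "time": (("time",), {"axis": "T", "standard_name": "time",
                         "units": "days since 2000-01-01"}),
    "lat": (("lat",), {"units": "degrees_north"}),
    "tas": (("time", "lat"), {"standard_name": "air_temperature", "units": "K"}),
}
print(missing_required(example))  # {'lat': ['axis', 'standard_name']}
```

Note that tas is skipped because it is a data variable, not a coordinate variable; only lat is reported.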
Change History (3)
comment:1 Changed 9 years ago by mcginnis
comment:2 Changed 9 years ago by jonathan
Dear Martin
As when we discussed it on the phone, I agree that some things would be easier if certain attributes could be relied on. A lot of CF is optional because we didn't want to put people off using it, especially when it was new, by imposing too much; instead, we wanted to facilitate good practice. Even now, we should not impose more than necessary, since in practice people will just not bother to follow conventions they think unnecessary. What I would suggest, as before, is that we consider defining a set of additional requirements that could be adopted. For instance, we could define Conventions="CF-1.6/strict", following the syntax of http://www.unidata.ucar.edu/software/netcdf/conventions.html. The strict profile, if adopted, would be part of the netCDF conventions, and its additional requirements would be identified in the standard document. Software could be written that worked only with strictly written data (as indicated by the Conventions attribute). Eventually, if the strict profile were universally used for new data, it could become the only CF standard at some future release. Thus we could evolve towards the situation you advocate.
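A minimal sketch of how software could recognise the proposed strict profile from the Conventions attribute. The "/strict" suffix is only the suggestion above, and the parsing rules used here (conventions separated by commas or whitespace, a profile following a single "/") are assumptions for illustration; parse_conventions and is_strict_cf are hypothetical helpers, not part of any existing library.

```python
import re
from typing import List, Optional, Tuple


def parse_conventions(value: str) -> List[Tuple[str, Optional[str]]]:
    """Split a Conventions attribute into (name, profile) pairs.

    Assumes conventions are separated by commas and/or whitespace and that
    a profile, if present, follows a '/' (e.g. the proposed "CF-1.6/strict").
    """
    entries = []
    for token in re.split(r"[,\s]+", value.strip()):
        if not token:
            continue  # empty attribute value
        name, _, profile = token.partition("/")
        entries.append((name, profile or None))
    return entries


def is_strict_cf(value: str) -> bool:
    """True if any listed convention is a CF version with the 'strict' profile."""
    return any(name.startswith("CF-") and profile == "strict"
               for name, profile in parse_conventions(value))


print(is_strict_cf("CF-1.6/strict"))            # True
print(is_strict_cf("CF-1.6"))                   # False
print(is_strict_cf("CF-1.6/strict, ACDD-1.3"))  # True
```

Software that only handles strictly written data could refuse any file for which is_strict_cf returns False, while ordinary CF readers would continue to treat "CF-1.6/strict" files as plain CF-1.6.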
As usual with proposals for change to CF, I think we should consider only those extra requirements for which there is presently a use case, i.e. a common situation which would be helped by making this change. Also, I suppose we should restrict this to proposals which are refinements of the existing conventions. Completely new facilities can appear in other tickets, and since they are completely new they can be more demanding from the outset, without any problem of backward compatibility.
1) I agree with Seth's point. I would support mandatory standard_name for spatiotemporal coordinate variables. I am not sure that axis is useful for this purpose, since it's been decided (in ticket 62) to allow this attribute on multidimensional auxiliary coordinate variables too. Possibly this relaxation could be reversed by the strict profile!
2) What is the use-case for this? There are, on the contrary, common situations where variables that belong together are on different grids, if an Arakawa B or C grid is being used, for example. I would suggest this is a new facility, which might better be the subject of a different ticket. It is also related to ticket 79.
3) Is it preferable to standardise a comment or to introduce new attributes for specific situations?
4) The positive attribute should be required for any vertical coordinate variable, even if it is pressure.
5) The units attribute should not be allowed to take the values level, layer or sigma_level, which section 4.3.2 supports for backwards compatibility with COARDS but has always deprecated.
6) We could consider defining a cf_role for all CF variables in the file. This should be the subject of a separate ticket first, but it could be mandatory in the strict profile. It is currently quite awkward to work out the purpose of each variable in a CF-netCDF file. In particular, it is awkward to identify the data variables. They are all those which aren't coordinates, cell measures, formula terms, etc.! It would be more convenient if they identified themselves with something like cf_role="data".
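Items 4 and 5 translate directly into checks on a coordinate variable's attributes. The sketch below is a minimal illustration under stated assumptions: attributes are a plain dict, a coordinate is treated as vertical if it has axis="Z", a positive attribute, a pressure unit from a small hard-coded list, or one of the COARDS dimensionless units. The pressure list is deliberately incomplete; a real checker would test unit convertibility with a units library instead. strict_vertical_problems is a hypothetical name.

```python
from typing import Dict, List

# Assumed, incomplete list of pressure units (illustration only).
PRESSURE_UNITS = {"Pa", "hPa", "kPa", "mbar", "millibar", "bar", "atm"}
# COARDS-era dimensionless units tolerated by section 4.3.2 (item 5).
DEPRECATED_UNITS = {"level", "layer", "sigma_level"}


def strict_vertical_problems(name: str, attrs: Dict[str, str]) -> List[str]:
    """Return strict-profile violations for one coordinate variable."""
    problems = []
    units = attrs.get("units", "")
    is_vertical = (attrs.get("axis") == "Z" or "positive" in attrs
                   or units in PRESSURE_UNITS or units in DEPRECATED_UNITS)
    # Item 4: 'positive' is required on every vertical coordinate, even pressure.
    # The value is compared case-insensitively to be lenient.
    if is_vertical and attrs.get("positive", "").lower() not in ("up", "down"):
        problems.append(f"{name}: 'positive' must be 'up' or 'down'")
    # Item 5: the COARDS-compatible dimensionless units are rejected outright.
    if units in DEPRECATED_UNITS:
        problems.append(f"{name}: units '{units}' not allowed")
    return problems


print(strict_vertical_problems("plev", {"units": "hPa"}))
# -> ["plev: 'positive' must be 'up' or 'down'"]
print(strict_vertical_problems("lev", {"units": "level", "positive": "down"}))
# -> ["lev: units 'level' not allowed"]
print(strict_vertical_problems("plev", {"units": "hPa", "positive": "down"}))
# -> []
```

Under the current rules the first case is legal, because a pressure unit already implies the direction; the strict profile would make it explicit.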
Cheers
Jonathan
comment:3 Changed 9 years ago by mgschultz
One small divergence on your last point, Jonathan:
> 6) We could consider defining a cf_role for all CF variables in the file. This should be the subject of a separate ticket first, but it could be mandatory in the strict profile. It is currently quite awkward to work out the purpose of each variable in a CF-netCDF file. In particular, it is awkward to identify the data variables. They are all those which aren't coordinates, cell measures, formula terms, etc.! It would be more convenient if they identified themselves with something like cf_role="data".
I would probably prefer to assign a cf_role to those variables that are *not* data variables, thereby making "data" the default (but one should encourage people to add the cf_role attribute to data variables as well).
But as you say: this should become a new ticket...
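The difference between the current situation and the cf_role proposal can be sketched side by side. The first function finds data variables by exclusion, removing coordinate variables and every variable named in a referencing attribute (coordinates, bounds, cell_measures, formula_terms, etc.); the second applies Martin's default, where any variable without a cf_role counts as data. The in-memory file layout and the role values "coordinate", "bounds" and "cell_measure" are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for a file: name -> (dimension names, attributes).
Variable = Tuple[Tuple[str, ...], Dict[str, str]]

# Attributes whose values name other variables.
REFERENCE_ATTRS = ("coordinates", "bounds", "climatology", "cell_measures",
                   "formula_terms", "grid_mapping", "ancillary_variables")


def referenced_names(variables: Dict[str, Variable]) -> Set[str]:
    names: Set[str] = set()
    for _, attrs in variables.values():
        for key in REFERENCE_ATTRS:
            # Values like "area: cell_area": drop the "label:" tokens.
            names.update(t for t in attrs.get(key, "").split() if not t.endswith(":"))
    return names


def data_variables_by_exclusion(variables: Dict[str, Variable]) -> List[str]:
    """Current situation: data variables are whatever is left over."""
    excluded = referenced_names(variables)
    excluded.update(n for n, (dims, _) in variables.items() if dims == (n,))
    return sorted(n for n in variables if n not in excluded)


def data_variables_by_role(variables: Dict[str, Variable]) -> List[str]:
    """Proposed rule: a missing cf_role means cf_role="data"."""
    return sorted(n for n, (_, attrs) in variables.items()
                  if attrs.get("cf_role", "data") == "data")


example = {
    "lat": (("lat",), {"cf_role": "coordinate", "bounds": "lat_bnds"}),
    "lat_bnds": (("lat", "nv"), {"cf_role": "bounds"}),
    "cell_area": (("lat",), {"cf_role": "cell_measure"}),
    "pr": (("lat",), {"cell_measures": "area: cell_area"}),
}
print(data_variables_by_exclusion(example))  # ['pr']
print(data_variables_by_role(example))       # ['pr']
```

Both give the same answer here, but the exclusion approach has to know every referencing attribute in the convention, whereas the role lookup is a single attribute test per variable.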
Cheers,
Martin
1) A mandatory axis and standard_name attribute will only work for spatio-temporal coordinate variables. I think it's fine to require axis and/or standard_name in the case where a coordinate corresponds to an X/Y/Z/T dimension, but we need to allow for dimensions used to represent things like model ensemble members, probability levels, regions, categories, and so on.
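A minimal sketch of this refined rule: only coordinate variables that look spatiotemporal must carry axis or standard_name, while dimensions such as ensemble members are exempt. Deciding what "looks spatiotemporal" without those attributes requires a heuristic; the one below (degrees_north/degrees_east units, time units containing " since ", an X/Y/Z/T axis, or a positive attribute) is an assumption, deliberately partial, and is itself the kind of guessing the ticket hopes to make unnecessary. The file representation is the same hypothetical dict layout as in the sketches above.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a file: name -> (dimension names, attributes).
Variable = Tuple[Tuple[str, ...], Dict[str, str]]

# Assumed, partial heuristic: CF allows more unit spellings than these.
HORIZONTAL_UNITS = {"degrees_north", "degrees_east"}


def looks_spatiotemporal(attrs: Dict[str, str]) -> bool:
    units = attrs.get("units", "")
    return (units in HORIZONTAL_UNITS or " since " in units
            or attrs.get("axis") in {"X", "Y", "Z", "T"} or "positive" in attrs)


def violations(variables: Dict[str, Variable]) -> List[str]:
    """Coordinate variables that look X/Y/Z/T but have neither axis nor standard_name."""
    bad = []
    for name, (dims, attrs) in variables.items():
        if dims != (name,):
            continue  # not a coordinate variable
        if looks_spatiotemporal(attrs) and not ("axis" in attrs or "standard_name" in attrs):
            bad.append(name)
    return sorted(bad)


example = {
    "time": (("time",), {"units": "days since 2000-01-01"}),
    "lon": (("lon",), {"units": "degrees_east", "standard_name": "longitude"}),
    "realization": (("realization",), {"long_name": "ensemble member"}),
}
print(violations(example))  # ['time']
```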
Cheers,
Seth