Hi Charlie,
Great that you have opened the door to this discussion. I agree completely
that "group-awareness" in CF is an area crying out to be explored and
solved. Your analysis of the technical details (e.g., attribute scope and
inheritance by group descendants) sounds natural and sensible.
The principal barrier to moving forward along this path lies in the fact
that CF is heavily committed to interoperability. Arguably,
interoperability is the raison d'être of CF. The style of "backwards
compatibility" that one gets through a headlong switch from the netCDF3
API to the group-oriented elements of the netCDF4 API is the most
extreme sort of one-way trap door: it leads to next-generation files that
are utterly inaccessible to previous-generation applications. This
style of advancement, which heavily degrades interoperability from a
community-wide perspective, should IMHO be undertaken only when all
reasonable alternatives have been exhausted. I welcome discussion on this
"philosophical" point.
So are there reasonable alternatives that avoid these negative impacts on
interoperability? It is common practice to flatten groups by
dot-appending the name hierarchy: group.subgroup.child. One could
certainly envision utilities (in the style of your NCO) that convert a
netCDF4-CF file into a netCDF-3 CF file. Could such a translation layer
be made available as a Web service? Not the worst answer.... A question
I'd like to see discussed (addressed primarily to Unidata, I guess): how
difficult would it be to make accommodations in the netCDF API itself,
so that netCDF4 groups were accessible through the netCDF-3 API? If such
enhancements could be built into the netCDF library, the character of
the interoperability impact of adding group-aware elements to CF would
be utterly changed. This would open the door wide to group-aware CF. Has
an analysis of this been done?
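For concreteness, the dot-appending flattening mentioned above can be sketched as follows. This is a minimal illustration only, assuming a hypothetical nested-dict stand-in for a file (each group holds "vars" and "groups"); it does not use the real netCDF API, and the variable values are placeholders.

```python
from typing import Any, Dict

# Hypothetical in-memory stand-in for a netCDF4 file (not the netCDF API):
# each group is a dict with "vars" (name -> values) and "groups" (name -> subgroup).
Group = Dict[str, Any]


def flatten(group: Group, prefix: str = "") -> Dict[str, Any]:
    """Map every variable in a group tree to a dot-appended flat name."""
    flat: Dict[str, Any] = {}
    for name, values in group.get("vars", {}).items():
        flat[prefix + name] = values
    for gname, child in group.get("groups", {}).items():
        # Extend the prefix so names read group.subgroup.child
        flat.update(flatten(child, prefix + gname + "."))
    return flat


# Placeholder values, for illustration only
root = {
    "vars": {"time": [0, 1]},
    "groups": {
        "CERES": {"vars": {"olr": [1.0, 2.0]}},
        "ERBE": {
            "vars": {"olr": [3.0, 4.0]},
            "groups": {"qc": {"vars": {"flag": [0, 0]}}},
        },
    },
}
print(sorted(flatten(root)))
# ['CERES.olr', 'ERBE.olr', 'ERBE.qc.flag', 'time']
```

A real converter would also have to flatten dimensions and group attributes and guard against name collisions (e.g., a root variable literally named ERBE.olr); this sketch does neither.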
- Steve
==============================================
On 9/15/2013 6:53 PM, Charlie Zender wrote:
> NASA has recently convened an Earth Science Data System Working Group to
> explore existing conventions for data and products stored in HDF and to
> make recommendations for future developments. The CF Conventions are an
> important element in this work, as many scientists and users are
> interested in data products that comply with CF. Many members of the
> working group are familiar with CF and have been involved in attempts to
> apply the CF Conventions to a variety of Earth Science data products.
>
> We have identified a persistent barrier to NASA's greater adoption of
> CF: the lack of protocols for exploiting software-defined group
> hierarchies for data structures. HDF datasets traditionally collected
> and stewarded by NASA often utilize hierarchical (the "H" in HDF)
> groups. A chief advantage of netCDF4 over netCDF3 is that it supports a
> group API compatible with HDF. Here we outline an approach to
> incorporating groups into CF as a step towards recognizing and,
> eventually, exploiting groups.
>
> Some aspects of CF (especially the netCDF Conventions such as _FillValue
> and valid_min) can apply unambiguously to HDF files that use groups, but
> other aspects of the CF conventions leave room for ambiguity when applied
> to such HDF files. Clarifying such ambiguity is one role of conventions,
> so we would like to start a discussion with the aim of obtaining
> feedback, gathering consensus, and eventually, possibly, embedding
> "group-awareness" into CF. Unidata's white paper on Conventions for
> netCDF4
> (http://www.unidata.ucar.edu/software/netcdf/papers/nc4_conventions.html)
> began the discussion of potential "group-aware" CF capabilities. Some
> previous discussion of "group-aware" CF metadata is contained or
> referenced in CF-Metadata Trac tickets 79 (Handling and formatting of
> vector quantities in CF) and 90 (Collection of CF enhancements for
> interoperable applications), yet the "big discussion" on how/whether CF
> should exploit the hierarchical group capabilities of netCDF4 is
> unfinished. Below we propose a standard scheme for interpreting metadata
> scope in hierarchical (group) files, and suggest one or two new Group
> Attributes that we could turn into concrete proposals if interest warrants.
>
> Perhaps the most obvious place to start a discussion on making CF
> "group-aware" is the notion of attribute scope: How ought metadata in
> one group apply, if at all, to other groups? CF metadata attributes may
> be applied at the group level (netCDF4 allows this), but what should that
> mean? Whereas the current CF Convention speaks only of Global Attributes
> and Variable Attributes, a "group-aware" CF must explicitly define the
> properties of a third category of attributes, Group Attributes. Global
> Attributes are a special case of Group Attributes and should share their
> properties.
>
> The key technical definition we propose is that Group Attributes shall
> apply to the group where they are defined and to its descendants, but
> not to that group's ancestors or siblings. Group Attributes apply to all
> of a group's descendants recursively, with one exception: any group may
> redefine an attribute defined in an ancestor group, and that child
> group's definition applies to all of its descendants. Thus, where
> multiple ancestor groups define the same attribute, the value is
> inherited from the nearest ancestor. Note that these are the same
> scoping properties as netCDF4 dimensions.
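The nearest-ancestor rule proposed above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical nested-dict representation of a file (groups with "attrs" and "groups") and a group addressed by its path of names from the root; the function name resolve_attr is invented here and is not part of any netCDF or CF API.

```python
from typing import Any, Dict, Optional, Sequence

# Hypothetical stand-in for a file: each group has "attrs" and "groups".
Group = Dict[str, Any]


def resolve_attr(root: Group, path: Sequence[str], name: str) -> Optional[Any]:
    """Value of Group Attribute `name` as seen from the group at `path`.

    Walks root -> ... -> target; a definition in a deeper group overrides
    its ancestors, so the nearest definition wins. Siblings and descendants
    of the target are never consulted.
    """
    group = root
    value = group.get("attrs", {}).get(name)
    for gname in path:
        group = group["groups"][gname]
        if name in group.get("attrs", {}):
            value = group["attrs"][name]
    return value


root = {
    "attrs": {"institution": "NASA", "source": "multiple"},
    "groups": {
        "CERES": {
            "attrs": {"source": "CERES"},
            "groups": {"qc": {"attrs": {}, "groups": {}}},
        },
        "ERBE": {"attrs": {}, "groups": {}},
    },
}
print(resolve_attr(root, ["CERES", "qc"], "source"))  # CERES (nearest ancestor)
print(resolve_attr(root, ["ERBE"], "source"))         # multiple (sibling ignored)
print(resolve_attr(root, [], "source"))               # multiple (child override invisible)
```

Walking top-down and overwriting on each hit is equivalent to searching bottom-up for the first definition; the top-down form simply mirrors how netCDF4 group paths are written.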
>
> Our understanding is that this proposal is consistent and
> backwards-compatible with CF. However, it would extend the current usage
> of CF to files with arbitrary hierarchies of groups. Moreover, it might
> be helpful to specifically disallow (or mark as having undefined
> consequences) the use of Group Attributes to store metadata that should
> always be attached directly to variables. Group Attributes such as
> _FillValue, scale_factor, and valid_min might sometimes seem tempting,
> yet they might create more problems than they would solve. Some
> attributes (e.g., Conventions) may be useful only as Global Attributes,
> and not as Group Attributes of other groups.
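A checker for the restrictions suggested above might look like the following. It is a sketch under stated assumptions: the same hypothetical nested-dict file representation, and illustrative attribute lists only (add_offset and valid_max are added here as assumptions alongside the attributes named in the message); the actual sets would be for a concrete CF proposal to fix.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for a file: each group has "attrs" and "groups".
Group = Dict[str, Any]

# Illustrative lists only; a real CF proposal would define these sets.
VARIABLE_ONLY = {"_FillValue", "scale_factor", "add_offset", "valid_min", "valid_max"}
ROOT_ONLY = {"Conventions"}


def check_group_attrs(group: Group, path: str = "/") -> List[str]:
    """Return warnings for Group Attributes this sketch would disallow."""
    warnings: List[str] = []
    for name in group.get("attrs", {}):
        if name in VARIABLE_ONLY:
            warnings.append(f"{path}: {name} belongs on variables, not groups")
        if name in ROOT_ONLY and path != "/":
            warnings.append(f"{path}: {name} is only meaningful in the root group")
    for gname, child in group.get("groups", {}).items():
        # Build "/a", "/a/b", ... without doubling the root slash
        warnings.extend(check_group_attrs(child, path.rstrip("/") + "/" + gname))
    return warnings


root = {
    "attrs": {"Conventions": "CF-1.6"},
    "groups": {
        "ERBE": {"attrs": {"_FillValue": -999.0, "Conventions": "CF-1.6"}, "groups": {}},
        "CERES": {"attrs": {"comment": "ok"}, "groups": {}},
    },
}
for warning in check_group_attrs(root):
    print(warning)
```

Note that the root group is checked too, since Global Attributes are treated here as a special case of Group Attributes.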
>
> What would a "group-aware" CF Convention mean in practice? It is
> important to preserve CF backwards compatibility. The metadata
> annotation of flat files (e.g., all netCDF3 files) need not be affected
> by any "group-aware" CF Convention extensions.
>
> Files with group hierarchies would continue to have Global Attributes
> (i.e., Group Attributes at the root group level). Global Attributes are
> almost always useful because they apply to the entire file except where
> superseded by an attribute of the same name at a lower level. Where
> group-oriented attribute conventions would help, we believe, is in
> extending the power of CF unambiguously to nested groups.
>
> Imagine a file in which each top-level group holds model results
> from a distinct CMIP5 simulation (CCSM, ECMWF, GISS, etc.), or in which
> each top-level group holds a different satellite-retrieved value of the
> same field (ERBE OLR, CERES OLR, etc.), or a different channel from the
> same multi-spectral radiometer. It may be helpful to know the relation
> of groups to other groups, so that users and tools can learn which are
> (or aren't) intercomparable or aggregable. Properties of ensembles
> stored as groups that analysis tools (such as NCO) would benefit from
> knowing in an automated way include: Which groups contain the other
> ensemble realizations? Which groups hold other channels of a
> multi-spectral instrument? Knowing this information would help users and
> analysis tools infer how best to create ensemble statistics, and could
> significantly reduce the overall number of files confronting users.
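To show how a tool could exploit such information, here is a sketch that groups top-level groups by a hypothetical "ensemble" attribute (the placeholder name this message itself uses) and then forms an element-wise ensemble mean. The attribute values, group layout, and numbers are all assumptions for illustration, using the same hypothetical nested-dict file representation rather than a real netCDF API.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical stand-in for a file: groups with "attrs", "vars", "groups".
Group = Dict[str, Any]


def ensemble_members(root: Group, attr: str = "ensemble") -> Dict[str, List[str]]:
    """Collect top-level groups that share a value of the (hypothetical) attribute."""
    members: Dict[str, List[str]] = defaultdict(list)
    for gname, child in root.get("groups", {}).items():
        label = child.get("attrs", {}).get(attr)
        if label is not None:  # groups without the attribute are not members
            members[label].append(gname)
    return dict(members)


def ensemble_mean(root: Group, members: List[str], var: str) -> List[float]:
    """Element-wise mean of variable `var` across the member groups."""
    series = [root["groups"][g]["vars"][var] for g in members]
    return [sum(vals) / len(vals) for vals in zip(*series)]


# Placeholder values, for illustration only
root = {
    "groups": {
        "CCSM": {"attrs": {"ensemble": "model_runs"}, "vars": {"tas": [1.0, 2.0]}},
        "GISS": {"attrs": {"ensemble": "model_runs"}, "vars": {"tas": [3.0, 4.0]}},
        "CERES": {"attrs": {"ensemble": "olr_retrievals"}, "vars": {"olr": [5.0]}},
        "ERBE": {"attrs": {"ensemble": "olr_retrievals"}, "vars": {"olr": [7.0]}},
        "grid": {"attrs": {}, "vars": {}},
    }
}
groups = ensemble_members(root)
print(groups)
print(ensemble_mean(root, groups["model_runs"], "tas"))  # [2.0, 3.0]
```

Only top-level groups are scanned here; combining this with the nearest-ancestor scoping rule would let the attribute be set once on a parent group instead.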
>
> Finally, groups allow containerization of information that can help
> avoid repetition. Some would like to define metadata-only
> groups that could then be logically attached to apply to some or all
> other groups in a file. Is it desirable for CF to define a standard way
> to indicate this?
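One conceivable mechanism, offered purely as a sketch: a group names a metadata-only group through a hypothetical "metadata_group" attribute, borrows that group's attributes, and overrides them with its own. Neither the attribute name nor the merge order is defined by CF or netCDF; both are assumptions, again on the hypothetical nested-dict file representation.

```python
from typing import Any, Dict

# Hypothetical stand-in for a file: each group has "attrs" (and maybe more).
Group = Dict[str, Any]


def effective_attrs(root: Group, gname: str, link: str = "metadata_group") -> Dict[str, Any]:
    """Attributes of top-level group `gname`, including any it borrows.

    If the group carries the (hypothetical) `link` attribute naming a
    metadata-only top-level group, that group's attributes are copied first
    and the group's own attributes then override them.
    """
    own = dict(root["groups"][gname].get("attrs", {}))
    target = own.pop(link, None)  # the link itself is not reported
    merged: Dict[str, Any] = {}
    if target is not None:
        merged.update(root["groups"][target].get("attrs", {}))
    merged.update(own)
    return merged


root = {
    "groups": {
        "common": {"attrs": {"institution": "NASA", "comment": "shared"}},
        "CERES": {"attrs": {"metadata_group": "common", "comment": "CERES-specific"}},
    }
}
print(effective_attrs(root, "CERES"))
# {'institution': 'NASA', 'comment': 'CERES-specific'}
```

Letting the group's own attributes win mirrors the nearest-ancestor rule proposed earlier, so a linked metadata group behaves like an extra, lowest-priority ancestor.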
>
> As the previous examples illustrate, there are at least two levels to a
> discussion about "group-aware" CF. The first is scope, i.e., how
> attribute meanings are inherited in hierarchies. The second is the more
> pragmatic issue of what new CF attributes would allow us to exploit
> group hierarchies in a systematic way. We have proposed an answer to the
> scope issue to kickstart the discussion. We have illustrated how a new
> attribute (call it "ensemble" for now) might be useful. At this stage we
> wish to learn whether CF users/developers are interested in pursuing
> "group-aware" CF extensions at all before we develop more
> details/wording for specific conventions. Perhaps there are others
> working on similar issues, or perhaps the CF maintainers prefer to
> receive specific wording of proposals rather than more diffuse
> "invitations to discuss" like this. If you have an opinion, then please
> let us know.
>
> Until the CF (or some other) Convention tackles the issues of scoping
> and Group Attributes, such annotations will be ad hoc. Our goal is to
> increase interoperability, and we are eager to hear responses from the
> CF community on the direction of "group-aware" extensions to CF.
>
> On behalf of the NASA ESDS HDF5 WG,
> Charlie Zender, Ted Habermann, and Peter Leonard
Received on Mon Sep 16 2013 - 10:53:38 BST