
[CF-metadata] Towards recognizing and exploiting hierarchical groups

From: Signell, Richard <rsignell>
Date: Mon, 16 Sep 2013 13:52:50 -0400

Charlie & Co,

Also, regardless of whether these hierarchical structures are stored
in NetCDF4 or flattened NetCDF3, we get a big boost in
interoperability when we write datasets with known featureTypes
(profile, time series collection, swath, etc.), because then workflows
that have performed a catalog search and returned dataset endpoints
know what to do. If the dataset endpoints contain ad hoc hierarchies,
it will be a lot more difficult.

So if we create hierarchical datasets, I hope we create known
featureTypes to accompany them (like gridEnsembleStructure).

Thanks,
Rich

On Mon, Sep 16, 2013 at 12:53 PM, Steve Hankin <steven.c.hankin at noaa.gov> wrote:
> Hi Charlie,
>
> Great that you have opened the door onto this discussion topic. Total
> agreement from my pov that "group-awareness" in CF is an area that is crying
> to be explored and solved. Your analysis of technical details -- e.g.
> attribute scope and inheritance by group descendents, etc. -- sounds natural
> and sensible.
>
> The principal barrier to moving forward along this path lies in the fact
> that CF is heavily committed to interoperability. Arguably interoperability
> is the raison d'être of CF. The style of "backwards compatibility" that
> one gets through a headlong switch from the netCDF3 API into the
> group-oriented elements of the netCDF4 API is the most extreme sort of 1-way
> trap door. It leads to next generation files that are utterly inaccessible
> to previous generation applications. This style of advancement, which
> heavily degrades interoperability from a community-wide perspective, should
> only be undertaken IMHO if all reasonable alternatives are exhausted. I
> welcome discussion on this "philosophical" point.
>
> So are there reasonable alternatives to these negative impacts on
> interoperability? It is common practice to flatten groups by dot-appending
> the name hierarchy: group.subgroup.child. One could certainly envision
> utilities (in the style of your nco) that could convert a netCDF4-CF file
> into a netCDF-3 CF file. Could such a translation layer be made available as a
> Web service? Not the worst answer .... A question I'd like to see
> discussed (primarily to Unidata, I guess): how difficult would it be to make
> accommodations in the netCDF API, itself, so that netCDF4 groups were
> accessible through the netCDF-3 API? If such enhancements could be baked
> into the netCDF code the character of the interoperability impacts through
> adding group-aware elements to CF would be utterly changed. This would open
> the door wide to group-aware CF. Has an analysis of this been done?
>
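A minimal sketch of the dot-appending flattening mentioned above, offered only as an illustration. It assumes a group tree modelled as plain Python dicts with hypothetical "variables" and "groups" keys; it does not use the netCDF API, and the function name `flatten_groups` and the sample group and variable names are made up for the example.

```python
def flatten_groups(group, prefix=""):
    """Return {dotted_name: variable} for a nested group tree.

    `group` is a plain dict with two optional keys (an assumed,
    illustrative layout, not the netCDF API): "variables" maps names to
    values, and "groups" maps child-group names to dicts of the same shape.
    """
    flat = {}
    # Variables in this group get the accumulated group path as a prefix.
    for name, value in group.get("variables", {}).items():
        flat[prefix + name] = value
    # Recurse into child groups, extending the prefix with "<child>.".
    for child_name, child in group.get("groups", {}).items():
        flat.update(flatten_groups(child, prefix + child_name + "."))
    return flat


# Hypothetical hierarchy: one top-level group per model, one nested subgroup.
root = {
    "variables": {"time": [0, 1, 2]},
    "groups": {
        "CCSM": {"variables": {"tas": [280.1, 280.4, 280.9]}},
        "GISS": {
            "variables": {"tas": [279.8, 280.0, 280.3]},
            "groups": {"diag": {"variables": {"olr": [240.0]}}},
        },
    },
}

flat = flatten_groups(root)
for name in sorted(flat):
    print(name)
```

Root-level variables keep their bare names, so a file that was already flat passes through unchanged. A real converter would also have to carry dimensions and attributes across, which this sketch ignores.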
> - Steve
>
> ==============================================
>
>
> On 9/15/2013 6:53 PM, Charlie Zender wrote:
>>
>> NASA has recently convened an Earth Science Data System Working Group to
>> explore existing conventions for data and products stored in HDF and to
>> make recommendations for future developments. The CF Conventions are an
>> important element in this work, as many scientists and users are
>> interested in data products that comply with CF. Many members of the
>> working group are familiar with CF and have been involved in attempts to
>> apply the CF Conventions to a variety of Earth Science data products.
>>
>> We have identified a persistent barrier to NASA's greater adoption of
>> CF: the lack of protocols for exploiting software-defined group
>> hierarchies for data structures. HDF datasets traditionally collected
>> and stewarded by NASA often utilize hierarchical (the "H" in HDF)
>> groups. A chief advantage of netCDF4 over netCDF3 is that it supports a
>> group API compatible with HDF. Here we outline an approach to
>> incorporating groups into CF as a step towards recognizing and,
>> eventually, exploiting groups.
>>
>> Some aspects of CF (especially the netCDF Conventions like _FillValue,
>> valid_min) can apply unambiguously to HDF files that use groups, but
>> other aspects of CF conventions have room for ambiguity when applied to
>> such HDF files. Clarifying that ambiguity is one role of conventions, so
>> we would like to start a discussion with the aim of obtaining feedback,
>> gathering consensus, and eventually, possibly, embedding
>> "group-awareness" into CF. Unidata's white paper on Conventions for netCDF4
>> (http://www.unidata.ucar.edu/software/netcdf/papers/nc4_conventions.html)
>> began the discussion of potential "group-aware" CF capabilities. Some previous
>> discussion of "group-aware" CF metadata is contained or referenced in
>> CF-Metadata Trac tickets 79 (Handling and formatting of vector
>> quantities in CF) and 90 (Collection of CF enhancements for
>> interoperable applications) yet the "big discussion" on how/whether CF
>> should exploit the hierarchical group capabilities of netCDF4 is
>> unfinished. Below we propose a standard scheme for interpreting metadata
>> scope in hierarchical (group) files, and suggest one or two new Group
>> Attributes which we could turn into concrete proposals if interest
>> warrants.
>>
>> Perhaps the most obvious place to start a discussion on making CF
>> "group-aware" is the notion of attribute scope: How ought metadata in
>> one group apply, if at all, to other groups? CF metadata attributes may
>> be applied at the group level (netCDF4 allows this) yet what should that
>> mean? Whereas the current CF Convention speaks only of Global Attributes
>> and Variable Attributes, a "group-aware" CF must explicitly define the
>> properties of a third category of attributes, Group Attributes. Global
>> Attributes are a special case of Group Attributes and should share their
>> properties.
>>
>> The key technical definition we propose is that Group Attributes shall
>> apply to the group where they are defined and to its descendents, but
>> not to that group's ancestors or siblings. Group Attributes apply to all
>> a group's descendents recursively with an exception: Any group may
>> redefine an attribute defined in an ancestor group, and that
>> child-group's definition applies to all its descendents. Thus in cases
>> where multiple ancestor groups define the same attribute, attribute
>> values are inherited from the nearest ancestor. Note that these are the
>> same scoping properties as netCDF4 dimensions.
>>
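The proposed scoping rule (an attribute applies to its own group and to descendants, with the nearest ancestor winning) can be sketched as a walk up the parent chain. This is an illustrative model only: the `Group` class, its `resolve` method, and the attribute values are hypothetical and are not part of CF or of any netCDF library.

```python
class Group:
    """Toy group node: a name, an optional parent, and its own attributes."""

    def __init__(self, name, parent=None, **attrs):
        self.name = name
        self.parent = parent
        self.attrs = dict(attrs)

    def resolve(self, attr_name):
        """Return the value defined on this group or its nearest ancestor.

        Siblings and descendants are never consulted, so an attribute
        set in one branch does not leak into another.
        """
        group = self
        while group is not None:
            if attr_name in group.attrs:
                return group.attrs[attr_name]
            group = group.parent
        raise KeyError(attr_name)


# Illustrative hierarchy; the attribute values are made up for the example.
root = Group("/", Conventions="CF-1.6", institution="NASA")
ccsm = Group("CCSM", parent=root, institution="NCAR")  # redefines institution
run1 = Group("run1", parent=ccsm)                      # inherits from CCSM
giss = Group("GISS", parent=root)                      # sibling, sees the root value

resolved = {g.name: g.resolve("institution") for g in (root, ccsm, run1, giss)}
print(resolved)
```

Here `run1` picks up the value redefined in `CCSM`, while the sibling group `GISS` still sees the root (global) value, which matches the "nearest ancestor" wording of the proposal.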
>> Our understanding is that this proposal is consistent and
>> backwards-compatible with CF. However, it would extend the current usage
>> of CF to files with arbitrary hierarchies of groups. Moreover, it might
>> be helpful to specifically disallow (or mark as having undefined
>> consequences) the use of Group Attributes to store metadata that should
>> always be attached directly to variables. Group Attributes such as
>> _FillValue, scale_factor, valid_min, might sometimes seem tempting yet
>> might create more problems than they would solve. Some attributes (e.g.,
>> Conventions) may be useful only as Global Attributes, and not as Group
>> Attributes for other groups.
>>
>> What would a "group-aware" CF Convention mean in practice? It is
>> important to preserve CF backwards compatibility. The metadata
>> annotation of flat files (e.g., all netCDF3 files) need not be affected
>> by any "group-aware" CF Convention extensions.
>>
>> Files with group hierarchies would continue to have Global Attributes
>> (i.e., Group Attributes at the root group level). Global Attributes are
>> almost always useful because they apply to the entire file except where
>> superseded by an attribute of the same name at a lower level. Where
>> group-oriented attribute conventions would help, we believe, is in
>> extending the power of CF unambiguously to nested groups.
>>
>> Imagine a group file in which each top level group holds model results
>> from a distinct CMIP5 simulation (CCSM, ECMWF, GISS, etc.). Or where
>> each top level group holds a different satellite-retrieved value of the
>> same field (ERBE OLR, CERES OLR, etc.), or a different channel from the
>> same multi-spectral radiometer. It may be helpful to know the relation
>> of groups to other groups, so that users and tools can learn which are
>> (or aren't) intercomparable or aggregable. Properties of ensembles
>> stored as groups that analysis tools (such as NCO) would benefit from
>> discovering automatically include: Which groups contain the other
>> ensemble realizations? Which groups hold other channels of a
>> multi-spectral instrument? Knowing this information would help users and
>> analysis tools infer how best to create ensemble statistics, and could
>> significantly reduce the overall number of files confronting users.
>>
>> Finally, groups allow containerization of information which can be
>> useful in avoiding repetition. Some would like to define metadata-only
>> groups that could then be logically attached to apply to some or all
>> other groups in a file. Is it desirable for CF to define a standard way
>> to indicate this?
>>
>> As the previous examples illustrate, there are at least two levels to a
>> discussion about "group-aware" CF. The first is scope, i.e., how
>> attribute meanings are inherited in hierarchies. The second is the more
>> pragmatic issue of what new CF attributes would allow us to exploit
>> group hierarchies in a systematic way. We proposed an answer to the
>> scope issue to kickstart the discussion. We illustrated how a new
>> attribute (call it "ensemble" for now) might be useful. At this stage we
>> wish to learn whether CF users/developers are interested in pursuing
>> "group-aware" CF extensions at all before we develop more
>> details/wording for specific conventions. Perhaps there are others
>> working on similar issues, or perhaps the CF maintainers prefer to
>> receive specific wording of proposals rather than more diffuse
>> "invitations to discuss" like this. If you have an opinion, then please
>> let us know.
>>
>> Until the CF (or some other) Convention tackles the issues of scoping
>> and Group Attributes, such annotations will be ad hoc. Our goal is to
>> increase interoperability, and we are eager to hear responses from the
>> CF community on the direction of "group-aware" extensions to CF.
>>
>> On behalf of the NASA ESDS HDF5 WG,
>> Charlie Zender, Ted Habermann, and Peter Leonard



-- 
Dr. Richard P. Signell   (508) 457-2229
USGS, 384 Woods Hole Rd.
Woods Hole, MA 02543-1598
Received on Mon Sep 16 2013 - 11:52:50 BST
