[CF-metadata] some concerns about the "ensemble axis" proposal from Steve Hankin on 2007-03-08 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Steve Hankin <Steven.C.Hankin>
Date: Thu, 08 Mar 2007 10:37:35 -0800

Hi Jennifer et al.,

Thanks so much for introducing this fresh use case, Jennifer. In your
application -- ensemble outputs from a single model -- there is no need
to be playing with the multi-model elephant. So your use case bypasses
four of the six concerns that were raised in the message that started
this thread. And it simplifies one of the remaining. So for all of
those in this email thread who were nearing exhaustion (and that
includes me) *this may indicate a consensus* :-) -- an agreement on
large parts of the ensemble axis proposal, though not necessarily some
specific details that apply to multi-model ensembles. Will wait to see
what concerns Balaji may raise.

As above, two concerns that remain to be addressed are
3. The character of the ensemble axis
and
5. The barriers to interoperability that arise from instability and
snow-balling complexity in the CF spec

Regarding 3, the character of the ensemble axis, Jennifer's use case is
satisfied by using a single string variable (well actually a 2D char
array, but we can assume that added complexity is implicit). Although
this does muddy the waters by introducing a gridded axis concept in
which the notion of monotonic progression has been left behind, it is a
relatively straightforward extension. I'd argue that such an axis
should carry something like a mandatory "axis" attribute, ('axis =
"ensemble"' or 'axis_type="enumerated"' or whatever), that would make it
unambiguous for both humans and machines to understand the
interpretation of the file. Some of the most fragile aspects of CF
are in the obscure (and growing) set of inferences that applications
must make to determine how the parts hang together.
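
To make that concrete, here is a minimal sketch in plain Python of a client that reads an explicit marker instead of inferring the axis type. It assumes the labels arrive as a 2D char array, one fixed-width row per member. The dict layout and the helper names decode_labels and is_enumerated_axis are hypothetical, and the attribute values are only the placeholders floated above, not anything agreed in CF.

```python
def decode_labels(char_rows):
    """Turn a 2D char array (one fixed-width row per member) into strings."""
    labels = []
    for row in char_rows:
        # Rows are padded with NULs or blanks up to the string dimension.
        text = b"".join(row).decode("ascii")
        labels.append(text.rstrip("\x00 "))
    return labels


def is_enumerated_axis(attrs):
    """True if the variable explicitly declares itself an enumerated axis.

    Both attribute spellings are placeholders taken from the discussion.
    """
    return (attrs.get("axis") == "ensemble"
            or attrs.get("axis_type") == "enumerated")


# Hypothetical in-memory stand-in for a label variable read from a file.
ens_name = {
    "attrs": {"axis": "ensemble"},
    "data": [
        [b"c", b"t", b"r", b"l", b"\x00", b"\x00"],
        [b"p", b"e", b"r", b"t", b"1", b"\x00"],
    ],
}

members = []
if is_enumerated_axis(ens_name["attrs"]):
    members = decode_labels(ens_name["data"])
print(members)
```

With the marker present, the client never has to guess from dimension names or data types whether the axis is a list of labels.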

It was in the discussion of how to extend Jennifer's straightforward
concept (an ensemble axis) into a fusion of institution, source,
experiment_id, etc. that the sense of cramming multi-model concepts into
CF went into high gear. There is a long thread of emails proposing and
debating aspects of this. How about if this specific topic were placed
into the wiki framework -- "How to handle the metadata of a multi-model
ensemble in CF" -- and we see if we can come up with a clean consensus
solution? Maybe part of the solution will prove not to belong in the files
themselves, but in some external metadata associating the ensemble members.

Regarding 5 -- the barriers to interoperability from instability: we
have discussions that go back several years on the subject of "CF
conformance levels". These concepts may provide an answer for us here.
We are just starting to achieve new levels of interoperability as
applications like the ESRI ArcGIS products learn how to read (a simple
subset of) CF 4-dimensional files. My concern is that we don't torpedo
that progress by creating a moving target that clients cannot hope to
keep up with. One approach to this is that we regard the ensemble and
forecast axes being proposed in CF through a conformance matrix in the
style in which "GKS level 2b" was interpreted. For example, a global
attribute, "conventions_featurelist":
    'conventions = cf-2.0'
    'conventions_featurelist = "6d"'
could indicate whether support for the extended axis features was
required by an application in order to interpret this file.
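
A client-side check along these lines might look like the sketch below, in plain Python. It is only an illustration under assumptions: the global attributes are modelled as a dict, the list is taken to be space- or comma-separated tokens, and the helper names parse_featurelist and can_interpret are hypothetical.

```python
def parse_featurelist(global_attrs):
    """Return the set of feature tokens a file declares (empty if none)."""
    raw = global_attrs.get("conventions_featurelist", "")
    # Assumed syntax: tokens separated by blanks and/or commas.
    return {token.lower() for token in raw.replace(",", " ").split()}


def can_interpret(global_attrs, supported_features):
    """A client can read the file if it supports every declared feature."""
    return parse_featurelist(global_attrs) <= set(supported_features)


file_attrs = {"conventions": "cf-2.0", "conventions_featurelist": "6d"}

high_end_ok = can_interpret(file_attrs, {"6d"})
low_end_ok = can_interpret(file_attrs, set())
print(high_end_ok, low_end_ok)
```

A file that declares no feature list yields an empty set, so under this sketch every client would treat it as readable.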

How does this help us? Well the on-the-ground reality of handling
ensemble and forecast datasets is that in most cases they will be
created by aggregation servers. It is a relatively straightforward
matter for an aggregation server to create both the 6d version of the
dataset *and* the collection (e.g. via THREDDS) of corresponding 4d
(probably native model output) datasets. Each view would be identified
by its conventions_featurelist attribute. It is a way to have our cake
and eat it, too -- provide high level functionality for high end
applications and retain intelligibility for low end applications.
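
As a rough sketch of that idea in plain Python, the function below publishes one aggregated view and one plain view per member, each tagged with its own feature list. The function name build_views, the dict layout, and the "4d" token are assumptions for illustration. A real aggregation server would work on catalog entries, not in-memory lists.

```python
def build_views(member_names, data_by_member):
    """Return one aggregated view plus one plain view per ensemble member."""
    aggregated = {
        "attrs": {"conventions_featurelist": "6d"},
        "members": list(member_names),
        "data": list(data_by_member),
    }
    per_member = [
        {
            # "4d" is a hypothetical token for the plain model-output view.
            "attrs": {"conventions_featurelist": "4d"},
            "member": name,
            "data": block,
        }
        for name, block in zip(member_names, data_by_member)
    ]
    return aggregated, per_member


# Placeholder values standing in for each member's 4d field.
aggregated, per_member = build_views(["ctrl", "pert1"], [[1.0, 2.0], [3.0, 4.0]])
print(aggregated["attrs"], [view["member"] for view in per_member])
```

A low-end client would browse only the per-member views, while a high-end client could open the aggregated one.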

    enuf said. (exhaustion factor ...) - Steve

==========================

Jennifer Adams wrote:
> This discussion is getting juicy!
>
> I am the GrADS and GDS developer working on an interface for
> 5-dimensional data sets. Ensembles are one example of how the 5th
> dimension might be used, but there are others (e.g. EOFs), so we are
> trying to make it as general as possible while still being practical
> and usable. GrADS is written in C and handles data in a variety of
> formats. Data file aggregation over time, and now over the "e"
> dimension, is possible but not required.
>
> Currently, we are not building an interface for multi-model ensembles
> on different grids. The elephant in Steve's living room will not
> be allowed to play in our yard. Fast and easy interpolation between data
> sets on different grids was omitted from GrADS by design and that is
> not likely to change with the addition of a new grid dimension. If
> users want to lump data sets on different grids together, they must
> handle the interpolation explicitly in a way that is best suited to
> their needs and in a way that they know will best preserve the
> information in the data they wish to extract.
>
> Ensembles that are on the same grid will be handled by GrADS. For
> metadata, we are taking a minimalist approach -- the ensemble axis is
> linear, and members have a unique name (<16 characters) and are
> numbered from 1 to n. We don't require that all members have the same
> start time or length, so those pieces of metadata are also required.
> This information is generally provided in a data descriptor file, an
> external metadata source written by the user after poring over the
> output from ncdump or wgrib or similar routine.
>
> If I am handed a single netcdf file with multi-model-different-grid
> ensembles in it from ECMWF or GFDL, I'm going to write a set of
> descriptor files, each one describing the subset of variables on a
> common 5D grid. I'll have one descriptor file per grid, all pointing
> at the same data file. Now I'm set to do my analysis in GrADS,
> beginning with careful interpolation between the different grids.
>
> When I put my 5D data sets behind a GDS and serve them to the world of
> OPeNDAP clients (including GrADS), it becomes a special case: a 5D
> netcdf file that doesn't require a descriptor file, a file that has
> all the metadata GrADS needs packaged in just the right way. For the
> time being, my approach works because I'm writing the code for the
> client and the server, I'm not worrying about any other client trying
> to read my 5D GDS data set, and I'm not trying to be CF-compliant.
> Here's what it's going to look like:
>
> dimensions:
>     lon = 9 ;
>     lat = 9 ;
>     lev = 9 ;
>     time = 9 ;
>     ens = 9 ;
>     string16 = 16 ;
> variables:
>     float lon(lon) ;
>         lon:units = "degrees_east" ;
>     float lat(lat) ;
>         lat:units = "degrees_north" ;
>     float lev(lev) ;
>         lev:units = "level" ;
>     float time(time) ;
>         time:units = "days since 0001-01-01 00:00:00" ;
>     float ens(ens) ;
>         ens:grads_dim = "e" ;
>     char ens_name(ens, string16) ;
>         ens_name:long_name = "ensemble name" ;
>     int ens_length(ens) ;
>         ens_length:long_name = "ensemble length" ;
>     int ens_tinit(ens) ;
>         ens_tinit:long_name = "ensemble initial time index" ;
>     float var(ens, time, lev, lat, lon) ;
>         var:long_name = "test variable" ;
>
> When more metadata is required to bring my GDS data set into CF
> compliance, or to make it readable by other open source clients, I'll
> add it. As long as GrADS users have the means to keep up with the data
> sets being generated by CFS, IPCC, TIGGE, or whatever, then I'm not
> concerned.
>
> It took me a long time to write out this email -- I lost most of an
> afternoon trying to phrase everything properly. I have been reading
> this thread with interest, but I just can't keep up this kind of
> lengthy correspondence on a regular basis. Please keep me in mind as
> one of the silent listeners who still cares about the outcome.
>
> Jennifer
>
> --
> Jennifer M. Adams
> IGES/COLA
> 4041 Powder Mill Road, Suite 302
> Calverton, MD 20705
> jma at cola.iges.org <mailto:jma at cola.iges.org>

-- 
--
Steve Hankin, NOAA/PMEL -- Steven.C.Hankin at noaa.gov
7600 Sand Point Way NE, Seattle, WA 98115-0070
ph. (206) 526-6080, FAX (206) 526-6744

Received on Thu Mar 08 2007 - 11:37:35 GMT
