[CF-metadata] CF and multi-forecast system ensemble data from Kettleborough, Jamie on 2006-11-02 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Kettleborough, Jamie <jamie.kettleborough>
Date: Thu, 02 Nov 2006 16:19:57 +0000

Hello Steve,

I appreciate your concerns about stability and the complexity required of
CF-interpreting code, but I think there is a danger in this case that
if we don't accommodate ensembles in the right way then we are limiting
CF's applicability to forecasts (operational or climate).

Yes, I agree that ensemble is not a coordinate in the same sense as time or
space: there is no obvious unique label or metric. But I think it
makes a lot of sense to treat it as a coordinate/dimension, because it
facilitates using the data to make probabilistic forecasts.
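To illustrate why a dimension helps here, a minimal sketch in plain Python (no netCDF library assumed): if realization is the leading dimension, the probability of exceeding a threshold at each grid point is just a count along that one axis. The function name and the temperatures are made up for illustration only.

```python
# Hypothetical sketch: members[r][i] holds realization r at grid point i,
# i.e. "realization" is the leading dimension of the data.
from typing import List


def exceedance_probability(members: List[List[float]], threshold: float) -> List[float]:
    """Fraction of realizations exceeding `threshold` at each grid point."""
    n_members = len(members)
    # zip(*members) walks the grid points, yielding all member values at each one.
    return [sum(v > threshold for v in point) / n_members for point in zip(*members)]


if __name__ == "__main__":
    # Four realizations of 2 m temperature (K) at three grid points (illustrative values).
    t2m = [
        [271.0, 275.5, 280.2],
        [272.4, 274.1, 281.0],
        [270.8, 276.3, 279.5],
        [273.6, 273.0, 282.7],
    ]
    print(exceedance_probability(t2m, 273.15))  # [0.25, 0.75, 1.0]
```

With a grouping layout the same product would first require gathering the members from separate containers and checking that they line up.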

Could we live with the grouping approach? Literally, yes, we could, but I
don't think it's the best way forward. If you simply want to
compare output from different models then the grouping approach is
probably OK, but I think we are moving beyond that kind of analysis (and
operational forecasting is probably ahead of climate from this point of
view) to an analysis where _the ensemble is the forecast entity_,
not the individual model.
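The difference between the two layouts can be sketched in plain Python, as an illustration under stated assumptions rather than any CF or netCDF API: a group is just a keyed collection whose members may differ in shape, whereas a dimension requires every member to have an identical shape so that it can be stacked along a new leading axis. The function names below are hypothetical.

```python
# Hypothetical sketch: shapes stand in for whole model outputs.
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def as_groups(members: Dict[str, Shape]) -> Dict[str, Shape]:
    """Grouping: an unordered collection keyed by member name.
    Members may have different shapes (e.g. different model grids)."""
    return dict(members)


def as_dimension(members: Dict[str, Shape]) -> Tuple[List[str], Shape]:
    """Dimension: members become indices along a leading 'realization' axis.
    Only possible if every member has an identical shape."""
    shapes = set(members.values())
    if len(shapes) != 1:
        raise ValueError(f"cannot stack members with differing shapes: {sorted(shapes)}")
    (member_shape,) = shapes
    names = sorted(members)
    return names, (len(names),) + member_shape


if __name__ == "__main__":
    same_grid = {"member_b": (12, 19, 73, 96), "member_a": (12, 19, 73, 96)}
    print(as_dimension(same_grid))  # (['member_a', 'member_b'], (2, 12, 19, 73, 96))

    mixed = {"model_x": (12, 19, 73, 96), "model_y": (12, 38, 145, 192)}
    print(sorted(as_groups(mixed)))  # ['model_x', 'model_y']
    try:
        as_dimension(mixed)
    except ValueError as err:
        print("dimension layout rejected:", err)
```

The uniformity check is the crux: it is what lets a tool treat the members as one forecast entity, and it is also what a grouping layout gives up in exchange for tolerating heterogeneous members.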

So I think if you want your analysis tool to be useful in the
forecasting context, then development effort is needed here however
we represent ensembles on disk (and from what Jennifer has said, this
work is already underway in GrADS). (Of course there are many
applications where forecasting isn't the be-all and end-all, and you can say
your analysis tool is made for that set of applications, not
forecasting.) As Bryan pointed out, CF is not intrinsically 4D; it is
just that it currently treats the space and time dimensions as special.
Shouldn't CF-aware applications be able to deal with arbitrary dimensionality?
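A tool that treats dimensions generically need not special-case 4D. The hypothetical helper below (plain Python, with nested lists standing in for arrays) fixes any named dimensions and returns whatever remains, so the same call works whether or not a realization dimension is present. The names `select` and `_take` are illustrative only.

```python
# Hypothetical helper: nested Python lists stand in for an n-D array whose
# nesting order matches `dims`. Nothing here is a CF or netCDF API.
from typing import Any, Dict, List, Sequence, Tuple


def _take(dims: List[str], data: Any, fixed: Dict[str, int]) -> Any:
    """Recursively index `data`, dropping every dimension named in `fixed`."""
    if not dims:
        return data
    head, rest = dims[0], dims[1:]
    if head in fixed:
        # Fixed dimension: pick one slice and drop the axis.
        return _take(rest, data[fixed[head]], fixed)
    # Free dimension: keep the axis and recurse into each element.
    return [_take(rest, item, fixed) for item in data]


def select(dims: Sequence[str], data: Any, fixed: Dict[str, int]) -> Tuple[List[str], Any]:
    """Fix the named dimensions at the given indices; return (remaining_dims, subset)."""
    unknown = set(fixed) - set(dims)
    if unknown:
        raise KeyError(f"no such dimension(s): {sorted(unknown)}")
    remaining = [d for d in dims if d not in fixed]
    return remaining, _take(list(dims), data, fixed)


if __name__ == "__main__":
    dims = ("realization", "time", "lat")
    data = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # 2 x 2 x 2
    print(select(dims, data, {"realization": 1}))  # (['time', 'lat'], [[5, 6], [7, 8]])
    print(select(dims, data, {"time": 0}))         # (['realization', 'lat'], [[1, 2], [5, 6]])
```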

I also suspect that for large ensembles (100+ members?) the grouping approach
could be inefficient I/O-wise. But I haven't had a chance to play with
netCDF 4 yet, so I could be wrong.

I think the current state of this discussion is:

1) Should we use grouping or dimensions for ensembles? (I think dimensions
are what we currently have.)

2) How do we label ensemble metadata?
a) standard_names, or
b) external_dictionary

I think either could work, though external_dictionary seems to give
more scope for coping with the different levels of metadata you might
choose to associate with ensembles. It also preserves the general
applicability of vague names like 'source' as global attributes, etc.
(Sorry, this statement assumes knowledge of the rest of this
thread.) (Are there other implications of accepting
external_dictionary?) Single-model files would need to include the
metadata coordinates as singleton dimensions, but I think that is OK. I
_think_ this also makes aggregation along realization more like
aggregation along time or level.

3) Is there any common ground for ensemble metadata (across
forecast etc. timescales) that we can standardise, i.e. return to Paco's
list?

4) Would CF act as the standards body for these, or should we be
lobbying for someone else to do it?
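The point in 2) that singleton metadata coordinates make aggregation along realization resemble aggregation along time or level can be sketched as follows. This is a minimal, hypothetical illustration in plain Python: the dict layout standing in for a file and the `source_model` coordinate name are assumptions for the example, not part of CF or any netCDF library.

```python
# Hypothetical sketch: each single-model "file" is a plain dict with a
# singleton "realization" dimension and per-member metadata coordinates.
# The dict layout and the "source_model" name are illustrative assumptions.
from typing import Any, Dict, List


def aggregate(files: List[Dict[str, Any]], dim: str = "realization") -> Dict[str, Any]:
    """Concatenate files along `dim`, which must be the leading dimension and
    have length 1 in every input, just as one would join files along time."""
    if not files:
        raise ValueError("nothing to aggregate")
    dims = files[0]["dims"]
    if dims[0] != dim:
        raise ValueError(f"expected {dim!r} as the leading dimension")
    out: Dict[str, Any] = {"dims": dims, "data": [], "coords": {}}
    for f in files:
        if f["dims"] != dims:
            raise ValueError("all files must share the same dimensions")
        if len(f["data"]) != 1:
            raise ValueError(f"{dim!r} must be a singleton in each input file")
        # The singleton slice becomes one entry along the aggregated axis.
        out["data"].extend(f["data"])
        # Per-member metadata coordinates (assumed to lie along `dim`)
        # are concatenated the same way.
        for name, values in f["coords"].items():
            out["coords"].setdefault(name, []).extend(values)
    return out


if __name__ == "__main__":
    a = {"dims": ("realization", "time"), "data": [[280.1, 281.0]],
         "coords": {"realization": [1], "source_model": ["model_a"]}}
    b = {"dims": ("realization", "time"), "data": [[279.4, 280.6]],
         "coords": {"realization": [2], "source_model": ["model_b"]}}
    ens = aggregate([a, b])
    print(ens["data"])    # [[280.1, 281.0], [279.4, 280.6]]
    print(ens["coords"])  # {'realization': [1, 2], 'source_model': ['model_a', 'model_b']}
```

Because each input already carries the realization axis (with length 1), joining members needs no special logic beyond what a time aggregation would do.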

(Have I missed anything in my eagerness to try and keep this thread
moving?)

Jamie

On Wed, 2006-11-01 at 09:26 -0800, Steve Hankin wrote:
>
>
> Bryan Lawrence wrote:
> > On Tue, 2006-10-31 at 09:32 -0800, Steve Hankin wrote:
> >
> >
> > > Like others I've watched this email thread grow like Topsy, wondering
> > > if I would find the time to read it ... But the discussion topic seems
> > > big enough to warrant a fair amount of rough and tumble.
> > >
> >
> > Absolutely. Bring it on :-) :-)
> >
> > With regard to the stability of CF and long term evolution, that's
> > what's behind my wanting to separate maintaining descriptions of the
> > quantities measured/predicted from the descriptions of how/why it was
> > done. It's also why I want just one way of doing it, not one that is
> > specially optimized for numerical models per se, and not of wider
> > applicability.
> >
> >
> >
> > > OK. Enuf preamble. In a nutshell, the new proposed structures seem
> > > to capture the semantics of the various collections of model outputs
> > > -- ensembles and forecast collections -- through the addition of new
> > > dimensions. In the most extreme case this dimension list might become
> > > (realization,forecast_reference_time,forecast_period,lev,lat,lon).
> > >
> > > I'd pose two questions:
> > > 1. Will this approach break existing CF applications? If yes, is
> > > that a red flag to consider other options?
> > >
> >
> > I don't believe there is any restriction on number of dimensions in CF,
> > so while it may not be pretty, it seems ok to me.
> >
> > > 2. Is this the same approach that we would take if we already had
> > > netCDF 4? If no, is that a red flag that we should give more
> > > thought to the long-term stability of the standard?
> > >
> >
> > Let's not think about netCDF4 immediately :-) One of the other things we
> > discussed was trying to divorce the content standard from the
> > implementation standard ...
> >
> >
> Bryan,
>
> Let me start by saying with full sincerity that I am not arguing for
> any particular conclusion. I see major trade-offs to both options --
> adding new dimensions and juggling external metadata. But ...
>
> With very little discussion (just above -- granting there are elements
> added below) you have waved away the type of questions that I'd argue
> we need to ask ourselves continuously. Standards are always about
> making compromises. It is right to begin with the approach you are
> insisting on -- divorce the abstract understanding of the problem from
> the messy technology. But the next step has to be to ask what the
> undesirable impacts of your choices might be. And then to make
> changes to your thinking and accept compromises. Without stability CF
> is not a standard at all.
>
> Regarding no. 1 -- the choice to utilize a 5-D or 6-D encoding -- the
> consequence will be that the great majority of existing applications
> will be unable to read the new files at all without significant
> modifications. Inside of those new files will be 4D subsets that
> represent the current "CF nut" -- objects that the applications can
> currently read. So we have a large net loss of interoperability until
> additional investments are made in software across our community.
>
> Regarding no. 2 -- the CF community is not more than 2 years from
> where discussions of netCDF 4 will occupy a significant part of our
> time and energy (off the cuff estimate). So to propose concrete
> changes today without fully considering how you would handle them in
> netCDF 4, you are inviting two major sets of changes in as many years.
> That's a very low standard of stability.
>
> So what are the compromises to be looked at? How would your data
> modeling of ensembles and forecasts look different if you think in
> terms of dimensions (an array of identical objects) versus
> "groups" (unordered lists which may have heterogeneous contents)?
> The former is a more perfect fit to the concepts -- the latter is a
> more general structure that fully embraces the needed concepts.
>
> The question is how great are the negative impacts in stability and
> interoperability that you are willing to accept in order to work with
> the more perfect data model? This is a balancing question where the
> "abstract data modeler" view represents one of the polarized
> positions. The entire substance of your arguments so far seems to
> come from this viewpoint. We acknowledge that CF discussions have not
> found a balance between provider and user viewpoints in the past.
> How do we improve upon that?
>
> - Steve
>
> > But your substantive point is fair enough, is there a cleaner way to
> > think about this ...?
> >
> > Particularly from the point of view that a set of simulations comprising
> > a forecast IS THE forecast (singular), then I think at least
> > (realization,t,z,y,x) is inescapable. I don't like the other example
> > (realization,ref_time,period,z,y,x), but accept that it may well be
> > the natural output of a set of aggregations. Is it not what Thredds
> > would deliver anyway from an aggregation? Which brings me to my last
> > point:
> >
> > As far as Thredds as a stop-gap solution goes, I think we need to
> > divorce how we interact with the data from how we manage it
> > (particularly for posterity). There is no way that I'm going to rely on
> > ANY *interface* to preserve information content, so what you're really
> > saying is that you want to rely on metadata held externally from the
> > files (whether Thredds or not). To some extent that's inescapable (which
> > was one of my points in an earlier email), but we want to stick to the
> > requirement that CF can differentiate data (if not fully describe all
> > the ancillary information) ... and for forecast ensembles (and station
> > data) there is a need for some extra information over and above the
> > index value in a "special" dimension. It's that information we need to
> > get into the CF content standard.
> >
> > Also, from an operational perspective, ok, so maybe I can use Thredds or
> > any interface to get some data, but then I've got it. What then? The
> > content standard has to tell me what's in it, so we're back to CF and
> > possibly pointers to external metadata.
> >
> > (Wrt netcdf4: I see groups as being more useful for aggregating things
> > that don't share common dimensionality)
> >
> > Cheers
> > Bryan
> >
> >
>
> --
>
> Steve Hankin, NOAA/PMEL -- Steven.C.Hankin at noaa.gov
> 7600 Sand Point Way NE, Seattle, WA 98115-0070
> ph. (206) 526-6080, FAX (206) 526-6744
> _______________________________________________
> CF-metadata mailing list
> CF-metadata at cgd.ucar.edu
> http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata
Received on Thu Nov 02 2006 - 09:19:57 GMT

This archive was generated by hypermail 2.3.0 : Tue Sep 13 2022 - 23:02:40 BST