[CF-metadata] CF and multi-forecast system ensemble data

From: Steve Hankin <Steven.C.Hankin>
Date: Thu, 02 Nov 2006 09:08:04 -0800

Hi Jamie,

I agree with every technical point you have made here. On technical
grounds, alone, I would draw all of the same conclusions. And that
makes mine a hard position to advocate for. "Stability", "broad
interoperability" and "burdens passed along to client developers" are
all relatively vague concepts and difficult to quantify and weigh in
choices where technical advantages speak so clearly.

For the discussion in question, I will drop my objections (unless others
want further discussion). However, we should follow through with the
agreements and plans made at GO-ESSP and in the CF White Paper in the
steps that will follow. Namely, that the new technical content should
be regarded as provisional until it has been well tested both in the
creation of files and in the creation of applications that process those
files. Lessons learned in that process should be given serious weight
in considering alterations while the new technical content remains
provisional.

I think the CF community still faces a challenge to figure out how to
balance its responsibilities as stewards against its excitement over
technical advances. I doubt that our email/Web process will create a
numerical balance of voices in this regard. We need to find ways to
empower the "no" sayers. (It is not a fun position to take to be a "no"
sayer. It is the wet blanket role in the interactions.) Any suggestions?

    - Steve

===========================

Kettleborough, Jamie wrote:
> Hello Steve,
>
> I appreciate your concerns on stability and the required complexity for
> CF interpreting code - but I think there is a danger in this case that
> if we don't accommodate ensembles in the right way then we are limiting
> CF's applicability to forecasts (operational or climate).
>
> Yes I agree ensemble is not a coordinate in the same sense as time or
> space - there is no obvious unique label or metric. But I think it
> makes a lot of sense to treat it as a coordinate/dimension - it
> facilitates using the data to make probabilistic forecasts.
>
> Could we live with the grouping approach - literally we could - but I
> don't think it's the best way forward. I think if you simply want to
> compare output from different models then the grouping approach is
> probably OK, but I think we are moving beyond that kind of analysis (and
> operational forecasting is probably ahead of climate from this point of
> view) to an analysis where _the ensemble is the forecast entity_,
> not the individual model.
>
> So I think if you want your analysis tool to be useful in the
> forecasting context then there is development effort needed here however
> we represent ensembles on disk (and from what Jennifer has said this
> work is already underway in GrADS). (Of course there are many
> applications where forecast isn't the be all and end all and you can say
> your analysis tool is made for this set of applications, not
> forecasting). As Bryan pointed out, CF is not intrinsically 4D - it is
> just that it currently makes the space and time dims special. Shouldn't
> CF-aware applications be able to deal with arbitrary dimensionality?
>
> I also suspect for large ensembles (100+ members?) the grouping approach
> could be inefficient I/O wise. But I haven't had chance to play with
> NetCDF4 yet so I could be wrong.
>
> I think the current situation with this is
>
> 1) should we use grouping or dimensions for ensembles (I think dimension
> is what we currently have)
>
> 2) How do we label ensemble meta data?
> a) standard_names, or
> b) external_dictionary
>
> I think either could work - though external_dictionary seems to give
> more scope for coping with different levels of meta data you might
> choose to associate with ensembles. And it preserves the general
> applicability of the vague names like 'source' as global attributes etc.
> (sorry this statement assumes context knowledge of the rest of this
> thread). (are there other implications of accepting
> external_dictionary?) Single model files would need to include the meta
> data coordinates as singleton dimensions - but I think that is OK. I
> _think_ this also makes aggregation along realization more like
> aggregation along time or level.
>
> 3) Is there any common ground for ensembles meta data (across
> forecast etc. timescales) that we can standardise? i.e. return to Paco's
> list.
>
> 4) Would CF act as the standard body for these - or should we be
> lobbying for someone else to do it?
>
> (Have I missed anything in my eagerness to try and keep this thread
> moving?)
>
> Jamie
>
> On Wed, 2006-11-01 at 09:26 -0800, Steve Hankin wrote:
>
>> Bryan Lawrence wrote:
>>
>>> On Tue, 2006-10-31 at 09:32 -0800, Steve Hankin wrote:
>>>
>>>
>>>
>>>> Like others I've watched this email thread grow like Topsy, wondering
>>>> if I would find the time to read it ... But the discussion topic seems
>>>> big enough to warrant a fair amount of rough and tumble.
>>>>
>>>>
>>> Absolutely. Bring it on :-) :-)
>>>
>>> With regard to the stability of CF and long term evolution, that's
>>> what's behind my wanting to separate maintaining descriptions of the
>>> quantities measured/predicted from the descriptions of how/why it was
>>> done. It's also why I want just one way of doing it, not one that is
>>> specially optimized for numerical models per se, and not of wider
>>> applicability.
>>>
>>>
>>>
>>>
>>>> OK. Enuf preamble. In a nutshell, the new proposed structures seem
>>>> to capture the semantics of the various collections of model outputs
>>>> -- ensembles and forecast collections -- through the addition of new
>>>> dimensions. In the most extreme case this dimension list might become
>>>> (realization,forecast_reference_time,forecast_period,lev,lat,lon).
>>>>
>>>> I'd pose two questions:
>>>> 1. Will this approach break existing CF applications? If yes, is
>>>> that a red flag to consider other options?
>>>>
>>>>
>>> I don't believe there is any restriction on number of dimensions in CF,
>>> so while it may not be pretty, it seems ok to me.
>>>
>>>
>>>> 2. Is this the same approach that we would take if we already had
>>>> netCDF 4? If no, is that a red flag that we should give more
>>>> thought to the long-term stability of the standard?
>>>>
>>>>
>>> Let's not think about netCDF4 immediately :-) One of the other things we
>>> discussed was trying to divorce the content standard from the
>>> implementation standard ...
>>>
>>>
>>>
>> Bryan,
>>
>> Let me start by saying with full sincerity that I am not arguing for
>> any particular conclusion. I see major trade-offs to both options --
>> adding new dimensions and juggling external metadata. But ...
>>
>> With very little discussion (just above -- granting there are elements
>> added below) you have waved away the type of questions that I'd argue
>> we need to ask ourselves continuously. Standards are always about
>> making compromises. It is right to begin with the approach you are
>> insisting on -- divorce the abstract understanding of the problem from
>> the messy technology. But the next step has to be to ask what the
>> undesirable impacts of your choices might be. And then to make
>> changes to your thinking and accept compromises. Without stability CF
>> is not a standard at all.
>>
>> Regarding no. 1 -- the choice to utilize a 5 or 6 D encoding -- the
>> consequence will be that the great majority of existing applications
>> will be unable to read the new files at all without significant
>> modifications. Inside of those new files will be 4D subsets that
>> represent the current "CF nut" -- objects that the applications can
>> currently read. So we have a large net loss of interoperability until
>> additional investments are made in software across our community.
>>
>> Regarding no. 2 -- the CF community is not more than 2 years from
>> where discussions of netCDF 4 will occupy a significant part of our
>> time and energy (off the cuff estimate). So to propose concrete
>> changes today without fully considering how you would handle them in
>> netCDF 4, you are inviting two major sets of changes in as many years.
>> That's a very low standard of stability.
>>
>> So what are the compromises to be looked at? How would your data
>> modeling of ensembles and forecasts look different if you think in
>> terms of dimensions (an array of identical objects) versus
>> "groups" (unordered lists which may have heterogeneous contents)?
>> The former is a more perfect fit to the concepts -- the latter is a
>> more general structure that fully embraces the needed concepts.
>>
>> The question is how great are the negative impacts in stability and
>> interoperability that you are willing to accept in order to work with
>> the more perfect data model? This is a balancing question where the
>> "abstract data modeler" view represents one of the polarized
>> positions. The entire substance of your arguments so far seems to
>> come from this viewpoint. We acknowledge that CF discussions have not
>> found a balance between provider and user viewpoints in the past.
>> How do we improve upon that?
>>
>> - Steve
>>
>>
>>> But your substantive point is fair enough, is there a cleaner way to
>>> think about this ...?
>>>
>>> Particularly from the point of view that a set of simulations comprising
>>> a forecast IS THE forecast (singular), then I think at least
>>> (realization,t,z,y,x) is inescapable. I don't like the other example
>>> (realization,ref_time,period,z,y,x), but accept that it may well be
>>> the natural output of a set of aggregations. Is it not what Thredds
>>> would deliver anyway from an aggregation? Which brings me to my last
>>> point:
>>>
>>> As far as Thredds as a stop-gap solution goes. I think we need to
>>> divorce how we interact with the data from how we manage it
>>> (particularly for posterity). There is no way that I'm going to rely on
>>> ANY *interface* to preserve information content, so what you're really
>>> saying is that you want to rely on metadata held externally from the
>>> files (whether Thredds or not). To some extent that's inescapable (which
>>> was one of my points in an earlier email), but we want to stick to the
>>> requirement that CF can differentiate data (if not fully describe all
>>> the ancillary information) ... and for forecast ensembles (and station
>>> data) there is a need for some extra information over and above the
>>> index value in a "special" dimension. It's that information we need to
>>> get into the CF content standard.
>>>
>>> Also, from an operational perspective, ok, so maybe I can use Thredds or
>>> any interface to get some data, but then I've got it. What then? The
>>> content standard has to tell me what's in it, so we're back to CF and
>>> possibly pointers to external metadata.
>>>
>>> (Wrt netcdf4: I see groups as being more useful for aggregating things
>>> that don't share common dimensionality)
>>>
>>> Cheers
>>> Bryan
>>>
>>>
>>>
>> --
>> --
>>
>> Steve Hankin, NOAA/PMEL -- Steven.C.Hankin at noaa.gov
>> 7600 Sand Point Way NE, Seattle, WA 98115-0070
>> ph. (206) 526-6080, FAX (206) 526-6744
>> _______________________________________________
>> CF-metadata mailing list
>> CF-metadata at cgd.ucar.edu
>> http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata
>>

-- 
--
Steve Hankin, NOAA/PMEL -- Steven.C.Hankin at noaa.gov
7600 Sand Point Way NE, Seattle, WA 98115-0070
ph. (206) 526-6080, FAX (206) 526-6744
Received on Thu Nov 02 2006 - 10:08:04 GMT
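
The trade-off discussed in this thread, a realization dimension (an array
of identical objects) versus groups (unordered lists of 4D objects), can be
sketched in a few lines. This is only an illustration, not anything
proposed in the thread: it uses plain NumPy arrays rather than netCDF
files, and the sizes, the threshold, the member labels and all variable
names are assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_real, n_time, n_lev, n_lat, n_lon = 4, 3, 2, 5, 6

# "Dimension" layout: one array with a leading realization axis,
# i.e. (realization, time, lev, lat, lon).
by_dimension = rng.normal(280.0, 5.0,
                          size=(n_real, n_time, n_lev, n_lat, n_lon))

# "Group" layout: one 4D (time, lev, lat, lon) array per member, keyed by
# an arbitrary (hypothetical) member label. Each entry on its own is the
# kind of 4D object a 4D-only client can already handle.
by_group = {f"member_{i}": by_dimension[i] for i in range(n_real)}

threshold = 285.0  # assumed threshold for an exceedance probability

# With a realization axis, the probabilistic product is one reduction.
prob_dim = (by_dimension > threshold).mean(axis=0)

# With groups, the client must first check that the members are
# conformable and stack them before it can do the same reduction.
shapes = {a.shape for a in by_group.values()}
if len(shapes) != 1:
    raise ValueError("members do not share a common shape")
stacked = np.stack([by_group[k] for k in sorted(by_group)], axis=0)
prob_grp = (stacked > threshold).mean(axis=0)

print(prob_dim.shape)
print(np.allclose(prob_dim, prob_grp))
```

Both layouts carry the same information here. The difference is where the
work falls: the dimension layout asks every client to cope with a 5D
array, while the group layout keeps each member readable as 4D but leaves
the conformability check and the stacking to whoever wants to treat the
ensemble as the forecast entity.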
