[CF-metadata] CF and multi-forecast system ensemble data from Kettleborough, Jamie on 2006-10-30 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Kettleborough, Jamie <jamie.kettleborough>
Date: Mon, 30 Oct 2006 14:41:32 +0000

Thanks Bryan,

I'm not sure I've fully got my head round what you are suggesting. As
I'm sure you're aware (even if it means dredging through the mega post -
sorry again) the whole issue of aggregation was not really the point of
Paco's original post. It was something I introduced to try and
understand how CMOR/IPCC AR4 data might be brought into the same ensemble
framework. I think this has to be thought about in this context though
- as ensembles are, by their nature, built up from individual model
runs.

Are you suggesting 'external_dictionary' attributes should only be
applied to pseudo-coordinates like ensemble/realization?

Under your suggestion could you encode the IPCC AR4 data something like

float temperature(time,realization,lat,lon) ;
   temperature:coordinates = 'experiment_id source institution' ;

char experiment_id(realization,len100) ;
   // may not be best URI... but..
   experiment_id:external_vocabulary = 'http://www-pcmdi.llnl.gov/ipcc/about_ipcc.php' ;

char source(realization,len100) ;
   source:external_vocabulary = 'http://www-pcmdi.llnl.gov/ipcc/about_ipcc.php' ;

char institution(realization,len100) ;
   institution:external_vocabulary = 'http://www-pcmdi.llnl.gov/ipcc/about_ipcc.php' ;

experiment_id=['720 ppm stabilization experiment (SRES A1B)',
                '550 ppm stabilization experiment (SRES B1)', ... ]

source=['HadGEM1 (2004): atmosphere: N96L38; ocean: (1.0-0.3 x 1.0); sea ice: multiple thickness categories, EVP ; land: MOSES2',
        'HadGEM1 (2004): atmosphere: N96L38; ocean: (1.0-0.3 x 1.0); sea ice: multiple thickness categories, EVP ; land: MOSES2', .... ]

institution=[etc...]

(I'm trying to cross check I understand what you are suggesting - sorry
if I'm way off track).
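
To make the layout above a bit more concrete, here is a minimal sketch in Python of how software might use it. It is not CF tooling: the plain dicts stand in for a netCDF file, the label values are placeholders rather than real AR4 strings, and the helper name describe_member is made up for illustration. The attribute name external_vocabulary is the one used in the example above.

```python
# Hypothetical in-memory stand-in for the layout above (plain dicts, no
# netCDF library). The label values are placeholders, not real AR4 strings.
VOCAB_URI = "http://www-pcmdi.llnl.gov/ipcc/about_ipcc.php"

dataset = {
    "temperature": {
        "dims": ("time", "realization", "lat", "lon"),
        "attrs": {"coordinates": "experiment_id source institution"},
    },
    "experiment_id": {
        "dims": ("realization", "len100"),
        "attrs": {"external_vocabulary": VOCAB_URI},
        "data": ["experiment label 0", "experiment label 1"],
    },
    "source": {
        "dims": ("realization", "len100"),
        "attrs": {"external_vocabulary": VOCAB_URI},
        "data": ["source label 0", "source label 1"],
    },
    "institution": {
        "dims": ("realization", "len100"),
        "attrs": {"external_vocabulary": VOCAB_URI},
        "data": ["institution label 0", "institution label 1"],
    },
}


def describe_member(ds, var_name, index, join_dim="realization"):
    """Collect the label values (and their vocabulary) for one member."""
    described = {}
    for name in ds[var_name]["attrs"].get("coordinates", "").split():
        label = ds.get(name)
        # Ignore names that are absent or not indexed by the join dimension.
        if label is None or join_dim not in label["dims"]:
            continue
        described[name] = {
            "value": label["data"][index],
            "vocabulary": label["attrs"].get("external_vocabulary"),
        }
    return described


member = describe_member(dataset, "temperature", 1)
print(member["experiment_id"]["value"])
```

The point of the sketch is only that generic software can find the per-member labels through the coordinates attribute and the shared realization dimension, while the meaning of each label is left to whoever governs the vocabulary URI.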

And Paco could do something similar for ENSEMBLES? Or of course we
could agree the common ground for ENSEMBLES, IPCC, (and/or MO-QUMP,
cp.net) and publish a new dictionary or sets of dictionaries? (I guess
the GRIB2 support for ensembles gives some indication of the relevant
metadata WMO would expect to see?)

For aggregation you would not try and capture any of the information in
the global attributes - you would only, in effect, allow aggregation
along pre-existing dimensions (even if they are scalar dimensions).
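
A rough Python sketch of that aggregation rule, under some simplifying assumptions of mine: each file is a plain dict, every variable that has the realization dimension has it as its leading dimension, and the function name aggregate and the sample values are invented for illustration. Global attributes are deliberately never read.

```python
def aggregate(files, join_dim="realization"):
    """Concatenate variables along join_dim; global attributes are ignored."""
    first = files[0]["variables"]
    merged = {
        name: {"dims": var["dims"], "data": list(var["data"])}
        for name, var in first.items()
    }
    for f in files[1:]:
        variables = f["variables"]
        if set(variables) != set(merged):
            raise ValueError("files do not define the same variables")
        for name, var in variables.items():
            target = merged[name]
            if var["dims"] != target["dims"]:
                raise ValueError(f"dimension mismatch for {name!r}")
            if join_dim in var["dims"]:
                # Pre-existing (possibly size-1) dimension: just extend it.
                target["data"].extend(var["data"])
            elif var["data"] != target["data"]:
                # No join dimension, so the values must agree across files.
                raise ValueError(f"{name!r} differs but has no {join_dim!r} dimension")
    return {"global_attrs": {}, "variables": merged}


def make_run(label):
    """Build a placeholder single-realization file (illustrative only)."""
    return {
        "global_attrs": {"institution": "ignored " + label},
        "variables": {
            "experiment_id": {"dims": ("realization", "len100"), "data": [label]},
            "lat": {"dims": ("lat",), "data": [-45.0, 45.0]},
        },
    }


combined = aggregate([make_run("run A"), make_run("run B")])
print(combined["variables"]["experiment_id"]["data"])
```

Anything that distinguishes the runs has to live in a variable on the realization dimension already; a variable without that dimension that differs between files is treated as an error rather than silently merged.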

Thanks,

Jamie



On Fri, 2006-10-27 at 10:22 +0100, Bryan Lawrence wrote:
> Hi Folks
>
> It seems that there are two threads to the massive email from
> Jamie, Paco, Jonathan that we've seen (and which is not included
> below :-).
>
> The first is how to expedite aggregation, which is the point John has
> picked up on, and the second is how to deal with the special case of
> ensemble metadata. In this email I'm going to try and boil down what I
> think the issues are.
>
> The suggestion proposed, as far as I can see, reduces to:
>
> a) we should create some standard names which exactly correspond to
> some recommended global attributes, and model integrations should use
> global attributes where they contain exactly one realization to indicate
> these.
>
> b) upon aggregation, one should use the realization dimension (called
> joinDim by John and realization by the triumvirate) as both a dimension
> on the aggregated variable and a key to the metadata about the
> realizations which is extracted from global attributes and populates some
> new variables which are essentially metadata variables. (This automatic
> extraction would be based on a correspondence between standard names and
> global attribute names.)
>
> At this point, if we drop the word model from point a) and the constraint
> on realization, then what we have is a rule for how to carry global
> metadata into aggregated netCDF files.
>
> Note that CF doesn't *require* any global attributes (2.6.2), so no code
> can rely on them being present, nor does it *limit* global attributes to
> only those defined. (Which is good, because I think those defined are
> not precise enough for me, nor for Paco, judging by the discussion).
>
> I would have thought it makes sense that in producing any aggregation,
> that any of the original metadata which distinguishes the original files
> (and their contents) should appear in metadata associated with the join
> dimension. From a coding point of view the problem will arise in the
> join when the individual files don't all have the appropriate global
> attributes (trust me, it'll happen). Anyway from a CF point of view this
> produces CF issue A: What is the best method of providing metadata on an
> aggregation dimension and indicating that's what has been done?
>
> (Issue B: might be to make sure our solution works when we aggregate two
> files that have already been aggregated).
>
> I'm sure John has thought more than most (especially me) about this, so
> I'll shut up about aggregation for now ...
>
> At this point there is no necessity for any modifications of standard
> names, the discussion thus far is simply about how to use what
> attributes we have, but:
>
> Issue C: should we have a standard name modifier associated with a
> variable on the aggregation dimension to show that it was produced by
> aggregating file attributes?
>
> They also got into what I would call the specifics of variable metadata
> needed to distinguish ensemble member identity and characteristics.
> These may or may not need standard names, but let's start with the
> principle. Are these not simply special cases of variable metadata? With
> the possible exception of ensemble_weight, all the others are intended
> to be either character strings or url links to information about
> variable metadata. In principle we could want exactly the same
> facilities for any variables (e.g. a bunch of station data in one file,
> we might want to have a lot more characteristics of each station linked
> by the station dimension in exactly the same way as the realization
> dimension could be used).
>
> So, I think what we have here is a discussion about what are the
> variable attributes (metadata) required to distinguish ensemble members
> (or model integrations)? This could go way beyond what we have seen in
> the email. Jamie has referred to our work on NumSim
> (http://proj.badc.rl.ac.uk/ndg/wiki/NumSim), and there is also the work
> on Numerical Model Metadata
> (http://www.cgam.nerc.ac.uk/pmwiki/NMM/index.php/). Similarly, the
> metadata one *might* want to attach to any observation goes way beyond
> what one can mandate in the CF file (e.g. SensorML etc).
>
> So there is a continuum of information one could have about a variable
> (indexed by station or realization or whatever), ranging from simply
> identity (e.g. item number in the sequence) to its source, to the
> entire content of one of these external schemas.
>
> CF has a number of ways of actually adding such information, we can use
> labels (6.1), Ancillary Data etc ... and so without going into the
> specifics of the triumvirate discussion, the issue really is:
>
> Issue-D: how far should CF go into providing standard names for
> describing variable metadata?
>
> Quite clearly there is utility in new standard names, but in doing so
> are we not straying into the governance territory of others - (e.g.
> SensorML). I've argued quite strongly that we shouldn't do any more of
> this, because every time we do, we add to the CF maintenance problem
> (e.g. I think we already have problems with gazetteers).
>
> Now I know (because I talked to Jamie on the phone) that the triumvirate
> could argue that in the case of model metadata, it would be helpful for
> CF to mandate this stuff at least for the numerical model community.
>
> Now we can do that ... but ...
>
> I didn't want to make a proposal in this email, but I think we should
> have a standard mechanism of indicating that a specific character-string
> variable set comes from an external vocabulary indicated by a URI,
> something like:
>
> variables:
>   float temperature(realization,...) ;
>     temperature:coordinates = "realization ..." ;
>     temperature:ancillary_variables = "metadata" ;
>   char metadata(realization,len80) ;
>     metadata:external_dictionary = "http://someExternalGovernanceBody" ;
> (i.e. a new modifier).
>
> Then, IPCC, or WMO or whoever, could come up with the dictionaries they
> want, and the CF file would still be self-describing, CF software could
> still manipulate it, but communities could write software that made use
> of these extra characteristics, without CF having to govern everything
> (and add all the components of these external schema to CF, piece by
> piece, with detailed arguments for each one to go in the standard name
> table).
>
> Obviously my mechanism could be extended for any characteristic of a
> variable that we didn't want to put in a standard name, so the real
> policy issue is Issue-D, and it comes back to the scope of CF, which
> needs some decision.
>
> Thus far we have held a line on standard names, but I feel the
> Paco, Jamie, Jonathan proposal takes us into new territory. We will take
> on having to discuss and approve a whole new class of standard names
> (and all the work that goes with that). It's fine if it's a community
> decision to go there, but we need our eyes open :-), and some realistic
> appreciation of how we can actually do it.
>
> My opinion: better to push it off onto the communities who care, and
> keep CF focused.
>
> Cheers
> Bryan
Received on Mon Oct 30 2006 - 07:41:32 GMT