[CF-metadata] CF and multi-forecast system ensemble data from John Caron on 2006-10-27 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: John Caron <caron>
Date: Thu, 26 Oct 2006 18:17:32 -0600

Hello all:

I'm trying to understand this conversation with respect to the aggregation capabilities we are building into the THREDDS Data Server, so I have a few specific questions that may be a bit of a side conversation:

Specifically, are we talking about how a series of files should be written, or one big file? Assuming the former, is there a preferred way to write a single file? All output from one (ensemble) run per file? A single hour per file? The same hour for all ensembles in one file?

In the aggregation context, we tend to like to "join on the outer dimension", that is if you had variables like

  float temp(time,lat,lon)

in each file, and then joined over multiple files, you would get

  float temp(joinDim,time,lat,lon)

where the joinDim would be ensembles or whatever.
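
As a rough illustration of what "joining on the outer dimension" does to array shapes, here is a minimal pure-Python sketch. The helper names (shape_of, join_outer, make_temp) and the tiny grid sizes are made up for this example; they are not part of the THREDDS Data Server or of any NetCDF library.

```python
def shape_of(nested):
    """Return the shape of a regular, non-empty nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


def join_outer(per_file_arrays):
    """Join same-shaped arrays along a new outermost dimension (joinDim)."""
    shapes = {shape_of(a) for a in per_file_arrays}
    if len(shapes) != 1:
        raise ValueError(f"cannot join arrays of differing shapes: {sorted(shapes)}")
    # The list of per-file arrays is the joined array: its first index is joinDim.
    return list(per_file_arrays)


def make_temp(ntime, nlat, nlon, fill):
    """Build a stand-in for 'float temp(time,lat,lon)' filled with one value."""
    return [[[fill] * nlon for _ in range(nlat)] for _ in range(ntime)]


# Three hypothetical files, each holding temp(time=2, lat=3, lon=4).
files = [make_temp(2, 3, 4, fill=float(i)) for i in range(3)]
joined = join_outer(files)
joined_shape = shape_of(joined)  # ordered as (joinDim, time, lat, lon)
print(joined_shape)
```

The only requirement in this sketch is that every file carries the variable with identical inner dimensions; the new index simply counts files.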

We could add a feature where you could mark global attributes to be made into variables, so that a global attribute like

      : source = 'model 2 output'

would become

  char source(joinDim, strlen);

which could be useful for coordinate variables and other information.
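
A minimal sketch of how such a promotion might work, again in plain Python. The function names (promote_attribute, read_row), the space padding, and the default strlen of 80 are assumptions for illustration only; the feature itself is just a proposal here.

```python
def promote_attribute(per_file_attrs, name, strlen=80, pad=" "):
    """Turn one global attribute per file into a (joinDim, strlen) char array.

    per_file_attrs: list of dicts, one dict of global attributes per file.
    Returns a list of rows; each row is a list of `strlen` single characters.
    """
    rows = []
    for index, attrs in enumerate(per_file_attrs):
        if name not in attrs:
            raise KeyError(f"file {index} has no global attribute {name!r}")
        value = attrs[name]
        if len(value) > strlen:
            raise ValueError(f"{name!r} in file {index} exceeds strlen={strlen}")
        # Pad on the right so every row has exactly strlen characters.
        rows.append(list(value.ljust(strlen, pad)))
    return rows


def read_row(row, pad=" "):
    """Recover the string from one row of the char array, dropping padding."""
    return "".join(row).rstrip(pad)


# Hypothetical global attributes read from two files.
attrs = [{"source": "model 1 output"}, {"source": "model 2 output"}]
source = promote_attribute(attrs, "source", strlen=20)
print(len(source), len(source[0]))  # joinDim and strlen
print(read_row(source[1]))
```

Raising an error when a file lacks the attribute is one possible policy; a real implementation might prefer to fill the row with a missing-value string instead.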

We have recently expanded our GridDatatype (aka GeoGrid) to allow 2 time dimensions (runtime and forecast time) and an ensemble dimension, with specialized kinds of aggregation to deal with them. We are working on the TIGGE datasets along these lines, except we have Grib2 files instead of NetCDF. Example files and use cases would be welcome.

John

Bryan Lawrence wrote:
> Hi Folks
>
> This email has no new information, just me attempting to get the email thread
> between Jonathan, Jamie and Paco into something I can digest. I'm sending
> it to the list as a public service. Feel free to ignore it if your mail
> client allows you to digest Jamie's attachments without reflux ...
>
> Please, please, can we have these discussions in public !!!
>
> I'll respond to the actual content tomorrow. It may require thinking, it
> certainly requires reading :-)
>
> Bryan
>
> ----------------------------------------------------------------------------
>
> Jamie's Summary email (26th of October)
>
> Hello,
>
> first - apologies, I took this thread off list, but it has thrown up a
> lot of issues and I think we need to bring it back to a more public
> forum - but this means bombarding everyone with a set of mails all in
> one go. I'll try and summarise, but will include all mails for context.
> I hope this is OK, and doesn't make this too laborious to follow.
>
> We have not settled all the issues - but here is my attempt at
> summarising, what I think the current consensus is - I'm sure Jonathan
> or Paco will correct me if I've misrepresented them.
>
> 1) Ensemble forecasts are made over a range of time scales (short-range,
> medium-range, seasonal, decadal, climate) and as far as possible we
> should make sure any additions to the CF standard are applicable to all
> time scales. It's probably true that not all communities are represented
> on this list... which we need to do something about to make any
> suggestion useful across the board. (I'm happy to run any suggestions
> we come up with past the various ensemble groups we have at the Met
> Office)
>
> 2) A useful addition to the CF standard to help support ensemble
> forecasts is to assign standard names to some global attributes. This
> then enables you to make NetCDF files with multi model variables in them
> and carrying across the global attribute information from any single
> model files.
>
> e.g. files mod1.nc, mod2.nc ... mod30.nc look like
>
> netcdf mod1.nc { ...
>
> variables:
> float temp(time,lat,lon);
> ...
> // global attributes
> : source = 'model 1 output'
> }
>
> netcdf mod2.nc { ...
>
> variables:
> float temp(time,lat,lon);
> ...
> //global attributes
> : source = 'model 2 output'
>
> }
>
> etc.
>
> This then becomes a file with multi model variables
>
> netcdf ens.nc {
>
> dimensions:
> ...
> realization = 30;
> len80 = 80 ;
>
> variables:
> float temp(time,realization,lat,lon);
>
> char source(realization,len80) ;
> source:standard_name='source' ;
>
> data:
> source=['model 1 output', 'model 2 output',...]
>
> }
>
> 2) New standard names 'source', 'institution', (leaving the other global
> attributes mentioned in CF 2.6.2 until they are needed by someone -
> though for compatibility with CMOR we may need to make 'title' a
> standard name from the outset.)
>
> 3) for model data use the global attribute 'source' and any variables
> with the standard_name 'source' to indicate the model used (just as CMOR
> does) e.g. HadCM3. This can be extended to support perturbed physics
> members if needed. This covers Paco's original suggestions of
> forecast_system_version_number and forecast_method_number.
>
> 4) for model data use the variable with the standard_name 'institution'
> to indicate the institution that developed the forecast system (this may
> be different from the original model developers - which is in source,
> either implied or explicit). It may also be different from the place
> where the data was created or is archived. This could be represented by
> the global attribute 'institution' in a multi-model file (though I
> think this is still to be fully resolved).
>
> 5) There is still information in Paco's original posting that isn't
> covered. The explicitly scientific includes things like 'experiment_id'
> and some indicator of 'initial_conditions'. There are also bits of more
> administration type information such as 'archive_time'. Though we
> haven't talked about this admin type info as much.
>
> 6) 'experiment_id' could be made a global attribute and standard_name.
> It is used to indicate something like the intent of the experiment. In
> a climate change context this would be something like the forcing
> scenario (SRES A1B). (this forms part of the 'title' in CMOR - but it's
> useful to separate this off).
>
> 7) 'initial_condition' (or something like) could be made a global
> attribute and a standard_name. There is potential confusion here -
> which I hadn't picked up on until now - in that CMOR uses 'realization' to
> indicate initial condition, but CF uses 'realization' in a more general
> sense (which could be initial condition, model ensemble, forcing
> ensemble, or grand ensemble based on any combination of these...). I
> don't think we agreed what the content of this name should be. Either a
> simple numerical indicator, or a reference to the model start dump data
> set.
>
> I hope that is a fair summary. Any comments welcome. E-mail history
> attached.
>
> Jamie
>
> (does this start to feel a bit namespacey - we're saying if you are in
> the model 'namespace' then the meaning of source, etc. is this. Outside
> of the model 'namespace' it means something a bit different?)
>
> ------------------------------------------------------------------------
> Offlist, from Jamie on October 24 (Email 1)
>
> Hello Paco,
>
> sorry to have sat on this a while. This seemed quite a long detailed
> e-mail so have posted to individuals who I think may be interested rather
> than the whole list.
>
> The reason I suggested 'realization' as a standard name rather than
> 'ensemble_member' is that not all forecasts of this type come from model
> ensembles. (e.g. [sorry to blow my own trumpet/tuba] Stott and
> Kettleborough, Nature 2002, 416, 723-726). Yet you can imagine these
> forecasts being processed in very similar ways to model ensembles to
> produce say probability density functions of forecast variables.
>
> I think you are suggesting new meta data tags which would be used on top
> of the 'realization', 'realization weight' standard_name and
> standard_name qualifiers? So you want to do something like:
>
> dimensions:
> lat = 18 ;
> lon = 36 ;
> time = 10 ;
> realization = 100 ;
> strlen = 80 ;
> variables:
> float temp(realization,time,lat,lon) ; // each sample is the result from one realization
> temp:long_name = "Temperature at 1.5m" ;
> temp:standard_name = "air_temperature" ;
> temp:ancillary_variables = "weights" ;
> float weights(realization) ; // the weight applied to each realization
> weights:long_name = "likelihood weights for 1.5m air temperature" ;
> weights:standard_name = "air_temperature realization_weights" ;
> char exptid(realization,strlen) ;
> exptid:standard_name = "experiment_identifier" ;
> char oc(realization,strlen) ;
> oc:standard_name = "originating_centre" ; // etc.
>
> (of course the user does not have to use the suggested weights - they
> can use their own based on exptid, oc, and your other suggested tags)
>
> I think there is a question of whether your tags are the most
> appropriate ones (I'm not saying they aren't - I just think it's worth
> raising) - for instance I could create a single NetCDF data set from the
> IPCC AR4 archive based on all the models e.g. monthly mean surface
> temperature from all the models, for all future forcing scenarios (has
> this been done?). In that case I think you could populate your meta
> data arrays with parts of the CMOR single file global headers (as
> mentioned elsewhere)
>
> Specifically:
> * experiment_identifier represents the CMOR experiment_id that goes into
> the single model title attribute.
> * originating_centre represents the CMOR institution
> * forecast_system_version_number represents something like the CMOR source
> string
> * forecast_method_number - I don't think this has an equivalent in
> CMOR/IPCC, but it could in a perturbed physics forecast (e.g
> climateprediction.net)
> * ensemble_member_number represents CMOR realization
>
> This seems to suggest we should rethink the names to reflect the
> parallel use? For instance should we have some sort of
> policy/convention for converting global attributes from single model
> files into standard_names for their realisation/ensemble counterparts?
> e.g. title could then replace your 'experiment_identifier' and have the
> standard_name 'realization_title', similarly for source ->
> 'realization_source' etc.? (though this starts to look funny for CMOR's
> realization which represent initial conditions).
>
> I think you would also want to adopt some conventions as to the content
> - though these could be 'project' specific (e.g. operational forecasts,
> IPCC AR4). I think you really want to enforce consistency within a
> project though - you suggest you may have different centres using
> different names for the experiment_identifier (or realization_title)
> even though these really are the same experiment - this is a bad thing.
> Arguably in the case of initial condition ensembles you might want to
> introduce a label to the URL of the dataset that is the initial
> conditions (if these are publishable). e.g.
> realization_initial_condition could be a string with contents
> identifying the actual initial condition data sets. (though this sort
> of introduces a level of indirection into CF which until now hasn't
> really been done).
>
> Of course another way to side step all this is to have your own
> 'operational' convention for the content of a variable with
> standard_name='realization'.
>
> e.g.
>
> dim:
> ensemble=10
> strlen=80
> var:
> ensemble(ensemble,strlen):
> ensemble:standard_name="realization" ;
>
> ensemble=['id:SRES A1B, model:HadCM3, source:Hadley Centre',
> 'id:SRES B2, model:HadCM3, source:Hadley Centre',
> 'id:SRES A1B, model:HadGEM1, source:Hadley Centre',
> ...]
>
> Though I'm not sure I really like this method - it seems to make more
> sense to standardise (at least on a project by project basis) real
> attributes, rather than 'sub attributes'.
>
> I don't think I have covered everything from your posting, nor have I
> said too much that is new - but I've run out of steam for now (sorry).
> Dealing with perturbed physics ensembles I think gets really tricky -
> especially to do in a close to self describing way.
>
> Jamie
>
> ps at discovery level we had a proposal for dealing with this kind of
> model data - it was always hard to understand the best way to represent
> model ensembles. http://proj.badc.rl.ac.uk/ndg/wiki/NumSim if you want
> the gory details.
>
> ----------------------------------------------------------------------------
> Jonathan 24th of October (Email 2)
>
>
> Dear Jamie
>
> Thanks for your email to Paco - good points. I think if we standardise names
> for "discovery" metadata this ought to be done with consideration by people
> who have thought about this. Alternatively Paco could standardise the content
> of a single standard name such as realization_name. Why not send your email to
> the CF list? Although it is detailed, there are others who might comment -
> Bryan, for instance.
>
> Best wishes
>
> Jonathan
>
> ----------------------------------------------------------------------------
> Paco 24th of October (Email 3)
>
> Dear Jamie,
>
> Thanks a lot for your detailed message.
>
> I agree with you that ensemble_member is not the solution to cope with
> forecasts formulated with different methods: dynamical models or
> empirical/statistical systems. That is precisely the reason I proposed
> the new variables, to cater for all these options.
>
> As you explain, realization and realization_weight offer a good start
> point. The new variables could be used, as you suggest, on top of them.
>
> The link to the CMOR global attributes is very interesting, as it points
> out the need to bring together the climate change and weather and
> climate forecasting communities. I understand from your message that
> these attributes would have to become variables in a multi-forecast
> system NetCDF file.
>
> Concerning your proposed standard names:
> - "experiment_identifier represents the CMOR experiment_id that goes
> into the single model title attribute": Let's take experiment_id
> - "originating_centre represents the CMOR institution": This is a bit
> trickier. Operational systems (not only data generated in a project)
> need to distinguish between the data generated by an institution but
> distributed by another. An operational multi-model such as TIGGE would
> be an example, although a clearer one is offered by the operational
> European seasonal multi-model: while the Met Office is the originating
> centre of the GloSea/HadCM3 ensemble forecasts, ECMWF is responsible for
> its dissemination as part of the multi-model system. In this case,
> originating_centre=Met Office, while original_distributor=ECMWF.
> - "forecast_system_version_number represents something like the CMOR
> source string": I don't understand this one. Could you explain a bit
> more in detail? Furthermore, it seems that this variable shouldn't be a
> number, but a string detailing the version.
> - "forecast_method_number - I don't think this has an equivalent in
> CMOR/IPCC, but it could in a perturbed physics forecast (e.g
> climateprediction.net)": This is precisely the context in which we
> devised this variable. The Met Office Decadal Prediction System
> (DePreSys) is currently being developed to formulate seasonal and
> interannual forecasts using the perturbed-physics approach and requires
> this information. The same applies to multi-physics systems.
> - "ensemble_member_number represents CMOR realization": Fine by me.
>
> Although the idea of converting some of the already existing global
> attributes for single model files into variables for multi-forecast
> systems is great and will simplify life for the users, I'm afraid I find
> some of the translations you propose difficult to use in an operational
> forecasting context. For instance, "realization_title" would not have
> any meaning for a forecaster, as many forecasts are carried out as
> different sets of parallel experiments for which a title does not exist
> as the changes might be due to the removal of a satellite channel or the
> addition of certain flight data. The conversion of names would have to
> cater for the different needs of different modelling communities, and
> still have meaningful names.
>
> > I think you would also want to adopt some conventions as to the content
> > - though these could be 'project' specific (e.g. operational forecasts,
> > IPCC AR4). I think you really want to enforce consistency within a
> > project though - you suggest you may have different centres using
> > different names for the experiment_identifier (or realization_title)
> > even though these really are the same experiment - this is a bad thing.
> > Arguably in the case of initial condition ensembles you might want to
> > introduce a label to the URL of the dataset that is the initial
> > conditions (if these are publishable). e.g.
> > realization_initial_condition could be a string with contents
> > identifying the actual initial condition data sets. (though this sort
> > of introduces a level of indirection into CF which until now hasn't
> > really been done).
> The use of project specific conventions does not seem a good idea to me.
> Conventions for writing NetCDF files of operational forecasts (which
> should follow a quite stringent set of rules to allow operational use)
> do not exist. It might be a good time to start creating them. I don't
> think there is a reason for having different sets of conventions for
> IPCC or operational multi-forecast system files, as you illustrated with
> the use of the CMOR names.
>
> As for the use of different names for the experiment_identifier by
> different originating_centres included in the same multi-forecast system
> file, this is due to the fact that each centre is not creating the
> forecasts with the same experiment: the way to produce the initial
> conditions is different, one centre might cycle the forecast model more
> frequently than another, etc. Again, operational constraints.
>
> Concerning the use of a URL which points at the initial conditions used,
> although desirable, it is rather difficult to implement. Imagine the
> difficulty to identify the initial conditions in an operational
> medium-range multi-model forecast system producing forecasts every 12
> hours from initial conditions created from millions of observations. In
> my opinion, using realization to identify the number of the ensemble
> member would be already helpful.
>
> It's obviously difficult to sort out the problem of coding ensembles of
> simulations from multiple systems (that's maybe why it hasn't been done
> before). Unless a general solution, instead of patches to satisfy the
> requirements by some projects, is available, I'm afraid the operational
> forecast community will hardly be on board.
>
> Best regards,
> Paco
>
> ----------------------------------------------------------------------------
> Jonathan (24th of October, Email 4)
>
> Dear Paco
>
> Are the definitions of these global attributes close enough to a couple of
> yours that we could (a) define them a bit more closely and (b) reuse them
> as standard names:
>
> institution
> Specifies where the original data was produced.
> source
> The method of production of the original data. If it was model-generated,
> source should name the model and its version, as specifically as could be useful.
> If it is observational, source should characterize it
> (e.g., "surface observation" or "radiosonde").
>
> Cheers
>
> Jonathan
>
> ----------------------------------------------------------------------------
> Paco (24 October, Email 5)
>
> Hi Jonathan,
>
> As I mentioned in my previous message to Jamie, the use of a variable
> similar to "institution" in a multi-forecast system prediction may not
> be enough to discriminate the different situations that could happen.
> Let's imagine a forecast produced with the dynamical model x at its own
> institution and at institution y. Those two forecasts, even if valid at
> the same verification time, might not be the same, so that we need
> another variable to indicate where the data are distributed from.
> Therefore, the definition of "institution" below might not be precise
> enough. This situation is already happening and will be more frequent in
> the future as prediction models developed by one institution are run at
> a different one.
>
> As for "source", it could cater for the variables I called
> "forecast_system_version_number" and for "forecast_method_number", if
> both have to be included in the same string. I suppose that the model
> description needs to be quite detailed, which will require long strings
> for the source variable.
>
> Cheers,
> Paco
> ----------------------------------------------------------------------------
> Jonathan (25 October, Email 6)
>
> Dear Paco
>
>
>>the use of a variable
>>similar to "institution" in a multi-forecast system prediction may not
>>be enough to discriminate the different situations that could happen.
>>Let's imagine a forecast produced with the dynamical model x at its own
>>institution and at institution y. Those two forecasts, even if valid at
>>the same verification time, might not be the same, so that we need
>>another variable to indicate where the data are distributed from.
>>Therefore, the definition of "institution" below might not be precise
>>enough. This situation is already happening and will be more frequent in
>>the future as prediction models developed by one institution are run at
>>a different one.
>
>
> It seems to me that "institution" is the correct description of the place
> which produces the data, precisely like the global attribute. The indication
> that it was done with model x doesn't seem to me to be part of the description
> of the institution, but part of the description of the forecast system used.
>
>
>>As for "source", it could cater for the variables I called
>>"forecast_system_version_number" and for "forecast_method_number", if
>>both have to be included in the same string. I suppose that the model
>>description needs to be quite detailed, which will require long strings
>>for the source variable.
>
>
> As suggested by the definition of the source attribute, this could name the
> model e.g. GloSea (your forecast_system_version_number, but strings give
> self-describing info, unlike numbers) and its variant e.g. in a perturbed
> parameter ensemble, or if the model has subversions (your
> forecast_method_number). It could also include the name of the centre which
> produced the model e.g. "Met Office GloSea" or "NCAR CCSM 3.0". I would guess
> that in general it is sufficient to inspect a long string. Possibly if you had
> to group the members of a large perturbed physics ensemble together, separate
> from other models, it would be useful to have a separate identifier for them
> e.g. model_perturbation_id.
>
> That leaves realization, for members of an ensemble using the same model but
> different input data, and experiment_id, for describing the intent of the
> experiment. This would have been a good place to store scenario information
> for IPCC experiments, for example. What do you think, Jamie?
>
> I would propose that we allow realization and experiment_id as global
> attributes too, for symmetry. Do source and institution need to be defined
> as standard names so they can be used for coordinates? They are already
> permitted as attributes of data variables.
>
> Cheers
>
> Jonathan
> ----------------------------------------------------------------------------
> Jamie (25 October, Email 7)
>
> Hello Paco, Jonathan,
>
> there is a lot here isn't there? I guess I made a bad judgement taking
> this off the list. Is the best way to get this back on the list to send
> a summary mail - or just forward all the mails in this discussion to the
> list?
>
> What do we need to take this forward?
>
> 1) Agree a standard way to take single model file global attributes into
> multi-model 'pseudo-coordinate' variables. (I think this then becomes
> more than discovery meta data - I think it becomes usage meta data - you
> may want to slice your ensemble on initial conditions, or scenario or
> whatever). Jonathan suggests simply making the relevant global
> attributes standard_names as well. I think this is a good solution.
>
> 2) Agree whether title, source, institution are the appropriate global
> attributes to convert to pseudo-coordinate variables. I think Paco is
> concerned that these terms are not what operational forecasters expect
> to see, and so using them may isolate this community from CF. I think
> we all agree that we should try and accommodate ensemble forecasts on
> all timescales in the same way.
>
> 3) Agree appropriate content for these attributes/pseudo coordinates for
> model data.
>
> I think for 2/3 we have
>
> source - the model and version that produced this data (e.g. HadCM3).
> This may include info on perturbed physics (which may be a long list of
> perturbed parameter values?? e.g. 'HadCM3 but DTICE=0; RHCRIT=
> (0.9,0.9...)', or it may be a reference to a doc defining that perturbed
> model version)
>
> institute - the originating centre ('Met Office') [and it would be
> possible to have an institute pseudo-coordinate to represent originating
> centre, and a global attribute institute to represent Paco's
> dissemination/archive centre]
>
> title - I'm less sure on this one, and we may have some backwards
> compatibility problems with CMOR. In CMOR title is used to represent
> the experiment_id (or forcing scenario) and other things. Yet we think
> this might be worthy of splitting out into another attribute. Paco is
> also uncomfortable with title.
>
> 4) Understand what is left over for describing model ensembles
> (experiment_id, realization*/initial_condition). Suggest these are
> included as additional global attributes and pseudo-coordinates, and
> agree their content. I think Paco had a whole list here (archive time,
> etc) - some of which we haven't really talked about.
>
> Does that sound reasonable? If so I am happy to try and summarise to
> the list.
>
> Jamie
>
> (*I think this use of realization in this context is different from the
> current use of realization in CF - that we have just added - so we'll
> need a different name/label.)
>
> (When suggesting having pointers to data sets for initial conditions I
> meant simply the assimilation/model start conditions - Paco is right it
> would be a nightmare to keep track of all the obs that go into the
> assimilation. That said not all assimilation/model start conditions may
> be publicly available so this may not work? I have reservations about
> the usefulness of simply a numerical id to indicate the initial
> condition ensemble number. I'd rather try and give it some more
> meaningful content).
>
> ----------------------------------------------------------------------------
> Jonathan 25th October, Email 8
>
> Dear Jamie
>
> Thank you. That is a good summary. I don't like title either, by the way.
> You could post it to the email list, I think, perhaps with the earlier
> ones appended for reference!
>
> Cheers
>
> Jonathan
>
> ----------------------------------------------------------------------------
> Paco 25th of October, Email 9
>
> Dear Jonathan and Jamie,
>
> I agree that allowing the relevant global attributes to be
> standard_names as well might help a lot.
>
> As for the use of title, source and institution, see below.
>
>
>>>the use of a variable
>>>similar to "institution" in a multi-forecast system prediction may not
>>>be enough to discriminate the different situations that could happen.
>>>Let's imagine a forecast produced with the dynamical model x at its own
>>>institution and at institution y. Those two forecasts, even if valid at
>>>the same verification time, might not be the same, so that we need
>>>another variable to indicate where the data are distributed from.
>>>Therefore, the definition of "institution" below might not be precise
>>>enough. This situation is already happening and will be more frequent in
>>>the future as prediction models developed by one institution are run at
>>>a different one.
>>
>>It seems to me that "institution" is the correct description of the place
>>which produces the data, precisely like the global attribute. The indication
>>that it was done with model x doesn't seem to me to be part of the description
>>of the institution, but part of the description of the forecast system used.
>
>
> The problem I see with this use of "institution" or "institute" is how
> to distinguish the forecasts run by centre x at institution y. While the
> model (system and version) could be specified in "source", the
> institution x is not the same as institution y. This sort of situation
> happens in the context of the multi-model system developed by APCC
> (http://www2.apcc21.net/index.php). KMA (Korea) produces seasonal
> forecasts with the NCAR (USA) model at APCC where the data are archived.
> In this case, while "institution" could take the value KMA and the
> information about the version of the NCAR model could go into "source",
> we still need to indicate that the data have been produced,
> postprocessed and stored by APCC, which is the centre responsible. A
> variable such as "forecast_producer" to indicate the centre where the
> model was run could clarify the situation. In a global seasonal
> multi-model context, the same NetCDF file might contain other forecasts
> valid at the same verification time produced by other centres at, for
> instance, ECMWF, which would prevent the variable "forecast_producer" from
> being a global attribute. This is the sort of complex case that the working
> group on seasonal-to-interannual prediction (WGSIP) is trying to
> consider and why we initially proposed the variable "original_distributor".
>
>
>>>As for "source", it could cater for the variables I called
>>>"forecast_system_version_number" and for "forecast_method_number", if
>>>both have to be included in the same string. I suppose that the model
>>>description needs to be quite detailed, which will require long strings
>>>for the source variable.
>>
>>As suggested by the definition of the source attribute, this could name the
>>model e.g. GloSea (your forecast_system_version_number, but strings give
>>self-describing info, unlike numbers) and its variant e.g. in a perturbed
>>parameter ensemble, or if the model has subversions (your
>>forecast_method_number). It could also include the name of the centre which
>>produced the model e.g. "Met Office GloSea" or "NCAR CCSM 3.0". I would guess
>>that in general it is sufficient to inspect a long string. Possibly if you had
>>to group the members of a large perturbed physics ensemble together, separate
>>from other models, it would be useful to have a separate identifier for them
>>e.g. model_perturbation_id.
>
>
> This use of "source" and the definition given by Jamie might do the job,
> except for the inclusion of the centre. There is an increasing number of
> cases where a model developed by a centre is run by another one. I
> believe this information should be kept separate (see comments above
> concerning "institution").
>
> I don't see the need to have a separate identifier for the model
> perturbation. That would be something similar to the variable
> "forecast_system_method" that we agreed to include in source. As I
> understand it, source might also contain URL pointers.
>
>
>>That leaves realization, for members of an ensemble using the same model but
>>different input data, and experiment_id, for describing the intent of the
>>experiment. This would have been a good place to store scenario information
>>for IPCC experiments, for example. What do you think, Jamie?
>>
>>I would propose that we allow realization and experiment_id as global
>>attributes too, for symmetry. Do source and institution need to be defined
>>as standard names so they can be used for coordinates? They are already
>>permitted as attributes of data variables.
>
>
> Does any of you have a guess of what the dimensions of "realization" and
> "experiment_id" would be for a multi-model ensemble forecast built up
> with two models with 5 and 11 ensemble members?
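One possible answer, offered only as a sketch and not something agreed in this thread: flatten both models into a single member dimension of length 5 + 11 = 16, and carry "source" and "realization" as per-member labels along it. The model labels and the layout below are assumptions for illustration.

```python
# Hypothetical model labels; the member counts (5 and 11) come from the question above.
members_per_model = {"model_a": 5, "model_b": 11}

source = []       # one entry per member, naming the model it came from
realization = []  # member number within its own model, starting at 1

for model, count in members_per_model.items():
    for member in range(1, count + 1):
        source.append(model)
        realization.append(member)

# Both labels share one flattened member dimension.
print(len(source), len(realization))
```

With this layout neither label alone identifies a member; the pair (source, realization) does.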
>
> If "realization" is not to be used to identify ensemble members of an
> initial-condition ensemble forecast, then I suggest we reconsider the
> variable "ensemble_member_number".
>
> I'm still reluctant to give a use to title in a forecasting context.
> Honestly, I don't see what concept it could correspond to.
>
> To end this long message, I am sorry that there are not more
> members of the medium-range/seasonal operational forecast community
> following the discussion. I wouldn't like to miss any of the relevant
> issues.
>
> Cheers,
> Paco
> ----------------------------------------------------------------------------
> Paco 25th of October, Email 10
>
> Yes, I'm happy with this way forward (sending everything to the list)
> and with the closer link between the needs of coding data of different
> time scales and operational uses.
>
> Thanks a lot for the comments.
>
> Cheers,
> Paco
> ----------------------------------------------------------------------------
> Jonathan 25th of October, Email 11
>
> Dear Paco
>
>
>>The problem I see with this use of "institution" or "institute" is how
>>to distinguish the forecasts run by centre x at institution y. While the
>>model (system and version) could be specified in "source", the
>>institution x is not the same as institution y. This sort of situation
>>happens in the context of the multi-model system developed by APCC
>>(http://www2.apcc21.net/index.php). KMA (Korea) produces seasonal
>>forecasts with the NCAR (USA) model at APCC where the data are archived.
>>In this case, while "institution" could take the value KMA and the
>>information about the version of the NCAR model could go into "source",
>>we still need to indicate that the data have been produced,
>>postprocessed and stored by APCC, which is the centre responsible.
>
>
> Ah. So there are *three* centres involved - not just x and y, but z as well.
> But you say "KMA produces" and "data have been produced ... by APCC". So I
> am confused. Which of them has produced the data? These three roles are not
> clear to me. I agree that NCAR belongs in the source.
>
> Between global attribute and coordinate variable, there is also the
> possibility of a data variable attribute. These attributes are already allowed
> on data variables. It depends whether you "feel" they are coordinates or
> attributes.
>
>
>>To end this long message, I am sorry that there are not more
>>members of the medium-range/seasonal operational forecast community
>>following the discussion. I wouldn't like to miss any of the relevant
>>issues.
>
>
> Yes. We should get back into public view!
>
> Cheers
>
> Jonathan
> ----------------------------------------------------------------------------
> Paco 25th of October, Email 12
>
> Dear Jonathan,
>
>
>>>The problem I see with this use of "institution" or "institute" is how
>>>to distinguish the forecasts run by centre x at institution y. While the
>>>model (system and version) could be specified in "source", the
>>>institution x is not the same as institution y. This sort of situation
>>>happens in the context of the multi-model system developed by APCC
>>>(http://www2.apcc21.net/index.php). KMA (Korea) produces seasonal
>>>forecasts with the NCAR (USA) model at APCC where the data are archived.
>>>In this case, while "institution" could take the value KMA and the
>>>information about the version of the NCAR model could go into "source",
>>>we still need to indicate that the data have been produced,
>>>postprocessed and stored by APCC, which is the centre responsible.
>>
>>Ah. So there are *three* centres involved - not just x and y, but z as well.
>>But you say "KMA produces" and "data have been produced ... by APCC". So I
>>am confused. Which of them has produced the data? These three roles are not
>>clear to me. I agree that NCAR belongs in the source.
>
>
> Sorry for not being clear enough. There may be 3 centres involved: the
> centre developing the forecast system, the centre running the forecast
> and the centre where the forecast is run. In my example, NCAR is the
> model developer (it goes into source), KMA is the institution that
> develops the forecast system (it modifies the model to convert it into a
> climate forecast model) and APCC is the institution where the forecasts
> are run and archived. Another example can be provided from the EUROSIP
> operational multi-model ensemble, wherein GloSea is the forecast system
> (source), Met Office is the forecast system developer (institution) and
> ECMWF is where the model is run (forecast_producer).
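As a minimal sketch of the three roles described above, the two examples can be written as plain records. The names source, institution and forecast_producer come from this thread; the Python class name and the exact strings are illustrative assumptions, not agreed CF conventions.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ForecastProvenance:
    """Hypothetical record of the three roles discussed in the thread."""
    source: str             # the model or forecast system
    institution: str        # centre that develops the forecast system
    forecast_producer: str  # centre where the forecasts are run and archived


# APCC example: NCAR model, adapted by KMA, run and archived at APCC.
apcc_example = ForecastProvenance(
    source="NCAR model", institution="KMA", forecast_producer="APCC"
)

# EUROSIP example: GloSea, developed by the Met Office, run at ECMWF.
eurosip_example = ForecastProvenance(
    source="GloSea", institution="Met Office", forecast_producer="ECMWF"
)

for record in (apcc_example, eurosip_example):
    print(asdict(record))
```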
>
>
>>Between global attribute and coordinate variable, there is also the
>>possibility of a data variable attribute. These attributes are already allowed
>>on data variables. It depends whether you "feel" they are coordinates or
>>attributes.
>
>
> Sorry, I missed the difference. I looked at the CF conventions
> documentation and couldn't find the data variable attribute definition.
> However, it is desirable that the variables we've been discussing are
> not global for this sort of file.
>
> Cheers,
> Paco
> ----------------------------------------------------------------------------
> Jonathan 25th of October, Email 13 of 13
>
> Just to clarify, and then silence until Jamie has summarised:
>
>
>>I looked at the CF conventions
>>documentation and couldn't find the data variable attribute definition.
>
>
> In 2.6.2
>
> The NUG defines title and history to be global attributes.
> We wish to allow the newly defined attributes, i.e., institution, source,
> references, and comment, to be either global or assigned to individual
> variables. When an attribute appears both globally and as a variable attribute,
> the variable's version has precedence.
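The precedence rule quoted above can be sketched as a simple lookup. This is only an illustration using plain dictionaries; the function name and the attribute values are assumptions, and no particular netCDF library is implied.

```python
def resolve_attribute(name, variable_attrs, global_attrs):
    """Return the variable's value if present, else the global one, else None."""
    if name in variable_attrs:
        return variable_attrs[name]
    return global_attrs.get(name)


# Hypothetical attribute sets for one file and one of its data variables.
global_attrs = {"institution": "Met Office", "source": "GloSea"}
temp_attrs = {"source": "NCAR CCSM 3.0"}  # per-variable override

# "source" is set on the variable, so the variable's version wins.
print(resolve_attribute("source", temp_attrs, global_attrs))
# "institution" is only global, so the global value is used.
print(resolve_attribute("institution", temp_attrs, global_attrs))
```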
>
> Jonathan
> ----------------------------------------------------------
> _______________________________________________
> CF-metadata mailing list
> CF-metadata at cgd.ucar.edu
> http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata
Received on Thu Oct 26 2006 - 18:17:32 BST
