
[CF-metadata] CF and multi-forecast system ensemble data

From: Bryan Lawrence <b.n.lawrence>
Date: Thu, 26 Oct 2006 16:53:53 +0100

Hi Folks

This email has no new information, just me attempting to get the email thread
between Jonathan, Jamie and Paco into something I can digest. I'm sending
it to the list as a public service. Feel free to ignore it if your mail
client allows you to digest Jamie's attachments without reflux ...

Please, please, can we have these discussions in public !!!

I'll respond to the actual content tomorrow. It may require thinking, it
certainly requires reading :-)

Bryan

----------------------------------------------------------------------------

Jamie's Summary email (26th of October)

Hello,

first - apologies, I took this thread off list, but it has thrown up a
lot of issues and I think we need to bring it back to a more public
forum - but this means bombarding everyone with a set of mails all in
one go. I'll try and summarise, but will include all mails for context.
I hope this is OK, and doesn't make this too laborious to follow.

We have not settled all the issues - but here is my attempt at
summarising what I think the current consensus is - I'm sure Jonathan
or Paco will correct me if I've misrepresented them.

1) Ensemble forecasts are made over a range of time scales (short-range,
medium-range, seasonal, decadal, climate) and as far as possible we
should make sure any additions to the CF standard are applicable to all
time scales.  It's probably true that not all communities are represented
on this list... which we need to do something about to make any
suggestion useful across the board. (I'm happy to run any suggestions
we come up with past the various ensemble groups we have at the Met
Office)

2) A useful addition to the CF standard to help support ensemble
forecasts is to assign standard names to some global attributes. This
then enables you to make NetCDF files with multi model variables in them
and carry across the global attribute information from any single
model files.

e.g. files mod1.nc, mod2.nc ... mod30.nc look like

netcdf mod1.nc { ...

variables:
   float temp(time,lat,lon);
...
// global attributes
      : source = 'model 1 output'
}

netcdf mod2.nc { ...

variables:
   float temp(time,lat,lon);
...
//global attributes
     : source = 'model 2 output'

}

 etc.

This then becomes a file with multi model variables

netcdf ens.nc {

dimensions:
...
   realization = 30;
   len80 = 80 ;

variables:
   float temp(time,realization,lat,lon);

   char source(realization,len80) ;
     source:standard_name='source' ;

data:
   source=['model 1 output', 'model 2 output',...]

}
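
As a rough illustration of the step from mod1.nc ... mod30.nc to ens.nc,
the Python sketch below gathers one global attribute from each
single-model file into fixed-width strings, in the spirit of
char source(realization,len80). Plain dictionaries stand in for the
files' global attributes, and collect_attribute and STRLEN are
hypothetical names, not part of CF or of any NetCDF library.

```python
STRLEN = 80  # assumed to match the len80 dimension in the ens.nc example


def collect_attribute(global_attrs, name, strlen=STRLEN):
    """Return one fixed-width string per realization for attribute `name`.

    `global_attrs` is a list of dicts, one per single-model file, standing
    in for each file's global attributes.
    """
    rows = []
    for attrs in global_attrs:
        value = attrs.get(name, "")
        if len(value) > strlen:
            raise ValueError(f"{name!r} value longer than {strlen} characters")
        rows.append(value.ljust(strlen))  # pad so every row has equal width
    return rows


# Global attributes as they might be read from mod1.nc and mod2.nc.
files = [{"source": "model 1 output"}, {"source": "model 2 output"}]
source = collect_attribute(files, "source")
print([s.rstrip() for s in source])
```

A real implementation would read the attributes from the files and write
the result as a character variable; only the bookkeeping is shown here.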

2) New standard names 'source', 'institution', (leaving the other global
attributes mentioned in CF 2.6.2 until they are needed by someone -
though for compatibility with CMOR we may need to make 'title' a
standard name from the outset.)

3) for model data use the global attribute 'source' and any variables
with the standard_name 'source' to indicate the model used (just as CMOR
does) e.g. HadCM3. This can be extended to support perturbed physics
members if needed.  This covers Paco's original suggestions of
forecast_system_version_number and forecast_method_number.

4) for model data use the variable with the standard_name 'institution'
to indicate the institution that developed the forecast system (this may
be different from the original model developers - which is in source,
either implied or explicit).  It may also be different from the place
where the data was created or is archived. This could be represented by
the global attribute 'institution' in a multi-model file. (though I
think this is still to be fully resolved)

5) There is still information in Paco's original posting that isn't
covered. The explicitly scientific includes things like 'experiment_id'
and some indicator of 'initial_conditions'. There are also bits of more
administration type information such as 'archive_time'. Though we
haven't talked about this admin type info as much.

6) 'experiment_id' could be made a global attribute and standard_name.
It is used to indicate something like the intent of the experiment. In
a climate change context this would be something like the forcing
scenario (SRES A1B).  (this forms part of the 'title' in CMOR - but it's
useful to separate this off).

7) 'initial_condition' (or something like it) could be made a global
attribute and a standard_name.  There is potential confusion here -
which I hadn't picked up on until now - in that CMOR uses 'realization' to
indicate initial condition, but CF uses 'realization' in a more general
sense (which could be initial condition, model ensemble, forcing
ensemble, or grand ensemble based on any combination of these...). I
don't think we agreed what the content of this name should be. Either a
simple numerical indicator, or a reference to the model start dump data
set.

I hope that is a fair summary. Any comments welcome. E-mail history
attached.

Jamie

(does this start to feel a bit namespacey - we're saying if you are in
the model 'namespace' then the meaning of source, etc. is this. Outside
of the model 'namespace' it means something a bit different?)

------------------------------------------------------------------------
Offlist, from Jamie on October 24 (Email 1)

Hello Paco,

sorry to have sat on this a while.  This seemed quite a long detailed
e-mail so have posted to individuals who I think may be interested rather
than the whole list.

The reason I suggested 'realization' as a standard name rather than
'ensemble_member' is that not all forecasts of this type come from model
ensembles. (e.g. [sorry to blow my own trumpet/tuba] Stott and
Kettleborough, Nature 2002, 416, 723-726). Yet you can imagine these
forecasts being processed in very similar ways to model ensembles to
produce say probability density functions of forecast variables.

I think you are suggesting new meta data tags which would be used on top
of the 'realization', 'realization weight' standard_name and
standard_name qualifiers? So you want to do something like:

dimensions:
  lat = 18 ;
  lon = 36 ;
  time = 10 ;
  realization = 100 ;
  strlen = 80 ;
variables:
  float temp(realization,time,lat,lon) ; // each sample is the result from one realization
    temp:long_name = "Temperature at 1.5m" ;
    temp:standard_name = "air_temperature" ;
    temp:ancillary_variables = "weights" ;
  float weights(realization) ; // the weight applied to each realization
    weights:long_name = "likelihood weights for 1.5m air temperature" ;
    weights:standard_name = "air_temperature realization_weights" ;
  char exptid(realization,strlen) ;
     exptid:standard_name = "experiment_identifier" ;
  char oc(realization,strlen) ;
     oc:standard_name = "originating_centre" ; // etc.

(of course the user does not have to use the suggested weights - they
can use their own based on exptid, oc, and your other suggested tags)
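
A minimal sketch of how such weights might be applied, assuming a single
grid point and time so that temp reduces to one value per realization.
The numbers are made up for illustration, and weighted_mean is a
hypothetical helper rather than anything defined by CF.

```python
def weighted_mean(values, weights):
    """Weighted mean over the realization dimension."""
    if len(values) != len(weights):
        raise ValueError("need exactly one weight per realization")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


temp = [287.0, 288.0, 290.0]   # made-up air temperatures in K, one per realization
weights = [0.5, 0.25, 0.25]    # made-up likelihood weights, one per realization
mean = weighted_mean(temp, weights)
print(mean)
```

A user who prefers their own weighting only needs to pass a different
weights list of the same length.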

I think there is a question of whether your tags are the most
appropriate ones (I'm not saying they aren't - I just think it's worth
raising) - for instance I could create a single NetCDF data set from the
IPCC AR4 archive based on all the models e.g. monthly mean surface
temperature from all the models, for all future forcing scenarios (has
this been done?). In that case I think you could populate your meta
data arrays with parts of the CMOR single file global headers (as
mentioned elsewhere)

Specifically:
* experiment_identifier represents the CMOR experiment_id that goes into
the single model title attribute.
* originating_centre represents the CMOR institution
* forecast_system_version_number represents something like the CMOR source
string
* forecast_method_number - I don't think this has an equivalent in
CMOR/IPCC, but it could in a perturbed physics forecast (e.g
climateprediction.net)
* ensemble_member_number represents CMOR realization

This seems to suggest we should rethink the names to reflect the
parallel use? For instance should we have some sort of
policy/convention for converting global attributes from single model
files into standard_names for their realisation/ensemble counterparts?
e.g. title could then replace your 'experiment_identifier' and have the
standard_name 'realization_title', similarly for source ->
'realization_source' etc.? (though this starts to look funny for CMOR's
realization which represent initial conditions).

I think you would also want to adopt some conventions as to the content
- though these could be 'project' specific (e.g. operational forecasts,
IPCC AR4). I think you really want to enforce consistency within a
project though - you suggest you may have different centres using
different names for the experiment_identifier (or realization_title)
even though these really are the same experiment - this is a bad thing.
Arguably in the case of initial condition ensembles you might want to
introduce a label to the URL of the dataset that is the initial
conditions (if these are publishable). e.g.
realization_initial_condition could be a string with contents
identifying the actual initial condition data sets. (though this sort
of introduces a level of indirection into CF which until now hasn't
really been done).

Of course another way to side step all this is to have your own
'operational' convention for the content of a variable with
standard_name='realization'.

e.g.

dim:
  ensemble=10
  strlen=80
var:
  ensemble(ensemble,strlen):
    ensemble:standard_name="realization" ;

ensemble=['id:SRES A1B, model:HadCM3, source:Hadley Centre',
          'id:SRES B2, model:HadCM3, source:Hadley Centre',
          'id:SRES A1B, model:HadGEM1, source:Hadley Centre',
 ...]
 
Though I'm not sure I really like this method - it seems to make more
sense to standardise (at least on a project by project basis) real
attributes, rather than 'sub attributes'.
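
To make the cost of the 'sub attribute' approach concrete, here is a
small Python sketch that unpacks labels of the form shown above. The
"key:value, key:value" layout is only the ad hoc convention from this
example, and parse_realization_label is a hypothetical helper.

```python
def parse_realization_label(label):
    """Split 'key:value, key:value' into a dict of stripped strings."""
    fields = {}
    for part in label.split(","):
        key, sep, value = part.partition(":")
        if not sep:
            raise ValueError(f"no ':' separator in {part!r}")
        fields[key.strip()] = value.strip()
    return fields


labels = [
    "id:SRES A1B, model:HadCM3, source:Hadley Centre",
    "id:SRES B2, model:HadCM3, source:Hadley Centre",
]
parsed = [parse_realization_label(label) for label in labels]
print(parsed[0]["model"])
```

Any value that itself contains a comma would break this parser, which is
one practical argument for standardising real attributes instead.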
 
I don't think I have covered everything from your posting, nor have I
said too much that is new - but I've run out of steam for now (sorry).
Dealing with perturbed physics ensembles I think gets really tricky -
especially to do in a close to self describing way.
  
Jamie

ps at discovery level we had a proposal for dealing with this kind of
model data - it was always hard to understand the best way to represent
model ensembles. http://proj.badc.rl.ac.uk/ndg/wiki/NumSim if you want
the gory details.

----------------------------------------------------------------------------
Jonathan 24th of October (Email 2)


Dear Jamie

Thanks for your email to Paco - good points. I think if we standardise names
for "discovery" metadata this ought to be done with consideration by people
who have thought about this. Alternatively Paco could standardise the content
of a single standard name such as realization_name. Why not send your email to
the CF list? Although it is detailed, there are others who might comment -
Bryan, for instance.

Best wishes

Jonathan

----------------------------------------------------------------------------
Paco 24th of October (Email 3)

Dear Jamie,

Thanks a lot for your detailed message.

I agree with you that ensemble_member is not the solution to cope with
forecasts formulated with different methods: dynamical models or
empirical/statistical systems. That is precisely the reason I proposed
the new variables, to cater for all these options.

As you explain, realization and realization_weight offer a good start
point. The new variables could be used, as you suggest, on top of them.

The link to the CMOR global attributes is very interesting, as it points
to the need to bring together the climate change and weather and
climate forecasting communities. I understand from your message that
these attributes would have to become variables in a multi-forecast
system NetCDF file.

Concerning your proposed standard names:
- "experiment_identifier represents the CMOR experiment_id that goes
into the single model title attribute": Let's take experiment_id
- "originating_centre represents the CMOR institution": This is a bit
trickier. Operational systems (not only data generated in a project)
need to distinguish between the data generated by an institution but
distributed by another. An operational multi-model such as TIGGE would
be an example, although a clearer one is offered by the operational
European seasonal multi-model: while the Met Office is the originating
centre of the GloSea/HadCM3 ensemble forecasts, ECMWF is responsible for
its dissemination as part of the multi-model system. In this case,
originating_centre=Met Office, while original_distributor=ECMWF.
- "forecast_system_version_number represents something like the CMOR
source string": I don't understand this one. Could you explain a bit
more in detail? Furthermore, it seems that this variable shouldn't be a
number, but a string detailing the version.
- "forecast_method_number - I don't think this has an equivalent in
CMOR/IPCC, but it could in a perturbed physics forecast (e.g
climateprediction.net)": This is precisely the context in which we
devised this variable. The Met Office Decadal Prediction System
(DePreSys) is currently being developed to formulate seasonal and
interannual forecasts using the perturbed-physics approach and requires
this information. The same applies to multi-physics systems.
- "ensemble_member_number represents CMOR realization": Fine by me.

Although the idea of converting some of the already existing global
attributes for single model files into variables for multi-forecast
systems is great and will simplify life for the users, I'm afraid I find
some of the translations you propose difficult to use in an operational
forecasting context. For instance, "realization_title" would not have
any meaning for a forecaster, as many forecasts are carried out as
different sets of parallel experiments for which a title does not exist
as the changes might be due to the removal of a satellite channel or the
addition of certain flight data. The conversion of names would have to
cater for the different needs of different modelling communities, and
still have meaningful names.

> I think you would also want to adopt some conventions as to the content
> - though these could be 'project' specific (e.g. operational forecasts,
> IPCC AR4). I think you really want to enforce consistency within a
> project though - you suggest you may have different centres using
> different names for the experiment_identifier (or realization_title)
> even though these really are the same experiment - this is a bad thing.
> Arguably in the case of initial condition ensembles you might want to
> introduce a label to the URL of the dataset that is the initial
> conditions (if these are publishable). e.g.
> realization_initial_condition could be a string with contents
> identifying the actual initial condition data sets. (though this sort
> of introduces a level of indirection into CF which until now hasn't
> really been done).
The use of project specific conventions does not seem a good idea to me.
Conventions for writing NetCDF files of operational forecasts (which
should follow a quite stringent set of rules to allow operational use)
do not exist. It might be a good time to start creating them. I don't
think there is a reason for having different sets of conventions for
IPCC or operational multi-forecast system files, as you illustrated with
the use of the CMOR names.

As for the use of different names for the experiment_identifier by
different originating_centres included in the same multi-forecast system
file, this is due to the fact that each centre is not creating the
forecasts with the same experiment: the way to produce the initial
conditions is different, one centre might cycle the forecast model more
frequently than another, etc. Again, operational constraints.

Concerning the use of a URL which points at the initial conditions used,
although desirable, it is rather difficult to implement. Imagine the
difficulty of identifying the initial conditions in an operational
medium-range multi-model forecast system producing forecasts every 12
hours from initial conditions created from millions of observations. In
my opinion, using realization to identify the number of the ensemble
member would be already helpful.

It's obviously difficult to sort out the problem of coding ensembles of
simulations from multiple systems (that's maybe why it hasn't been done
before). Unless a general solution, instead of patches to satisfy the
requirements by some projects, is available, I'm afraid the operational
forecast community will hardly be on board.

Best regards,
Paco

----------------------------------------------------------------------------
Jonathan (24th of October, Email 4)

Dear Paco

Are the definitions of these global attributes close enough to a couple of
yours that we could (a) define them a bit more closely and (b) reuse them
as standard names:

institution
    Specifies where the original data was produced.
source
    The method of production of the original data. If it was model-generated,
source should name the model and its version, as specifically as could be useful.
If it is observational, source should characterize it
(e.g., "surface observation" or "radiosonde").

Cheers

Jonathan

----------------------------------------------------------------------------
Paco (24 October, Email 5)

Hi Jonathan,

As I mentioned in my previous message to Jamie, the use of a variable
similar to "institution" in a multi-forecast system prediction may not
be enough to discriminate the different situations that could happen.
Let's imagine a forecast produced with the dynamical model x at its own
institution and at institution y. Those two forecasts, even if valid at
the same verification time, might not be the same, so that we need
another variable to indicate where the data are distributed from.
Therefore, the definition of "institution" below might not be precise
enough. This situation is already happening and will be more frequent in
the future as prediction models developed by one institution are run at
a different one.

As for "source", it could cater for the variables I called
"forecast_system_version_number" and for "forecast_method_number", if
both have to be included in the same string. I suppose that the model
description needs to be quite detailed, which will require long strings
for the source variable.

Cheers,
Paco
----------------------------------------------------------------------------
Jonathan (25 October, Email 6)

Dear Paco

> the use of a variable
> similar to "institution" in a multi-forecast system prediction may not
> be enough to discriminate the different situations that could happen.
> Let's imagine a forecast produced with the dynamical model x at its own
> institution and at institution y. Those two forecasts, even if valid at
> the same verification time, might not be the same, so that we need
> another variable to indicate where the data are distributed from.
> Therefore, the definition of "institution" below might not be precise
> enough. This situation is already happening and will be more frequent in
> the future as prediction models developed by one institution are run at
> a different one.

It seems to me that "institution" is the correct description of the place
which produces the data, precisely like the global attribute. The indication
that it was done with model x doesn't seem to me to be part of the description
of the institution, but part of the description of the forecast system used.

> As for "source", it could cater for the variables I called
> "forecast_system_version_number" and for "forecast_method_number", if
> both have to be included in the same string. I suppose that the model
> description needs to be quite detailed, which will require long strings
> for the source variable.

As suggested by the definition of the source attribute, this could name the
model e.g. GloSea (your forecast_system_version_number, but strings give
self-describing info, unlike numbers) and its variant e.g. in a perturbed
parameter ensemble, or if the model has subversions (your
forecast_method_number). It could also include the name of the centre which
produced the model e.g. "Met Office GloSea" or "NCAR CCSM 3.0". I would guess
that in general it is sufficient to inspect a long string. Possibly if you had
to group the members of a large perturbed physics ensemble together, separate
from other models, it would be useful to have a separate identifier for them
e.g. model_perturbation_id.

That leaves realization, for members of an ensemble using the same model but
different input data, and experiment_id, for describing the intent of the
experiment. This would have been a good place to store scenario information
for IPCC experiments, for example. What do you think, Jamie?

I would propose that we allow realization and experiment_id as global
attributes too, for symmetry. Do source and institution need to be defined
as standard names so they can be used for coordinates? They are already
permitted as attributes of data variables.

Cheers

Jonathan
----------------------------------------------------------------------------
Jamie (25 October, Email 7)

Hello Paco, Jonathan,

there is a lot here isn't there? I guess I made a bad judgement taking
this off the list. Is the best way to get this back on the list to send
a summary mail - or just forward all the mails in this discussion to the
list?

What do we need to take this forward?

1) Agree a standard way to take single model file global attributes into
multi-model 'pseudo-coordinate' variables. (I think this then becomes
more than discovery meta data - I think it becomes usage meta data - you
may want to slice your ensemble on initial conditions, or scenario or
whatever). Jonathan suggests simply making the relevant global
attributes standard_names as well. I think this is a good solution.

2) Agree whether title, source, institution are the appropriate global
attributes to convert to pseudo-coordinate variables. I think Paco is
concerned that these terms are not what operational forecasters expect
to see, and so using them may isolate this community from CF. I think
we all agree that we should try and accommodate ensemble forecasts on
all timescales in the same way.

3) Agree appropriate content for these attributes/pseudo coordinates for
model data.

I think for 2/3 we have

source - the model and version that produced this data (e.g. HadCM3).
This may include info on perturbed physics (which may be a long list of
perturbed parameter values?? e.g. 'HadCM3 but DTICE=0; RHCRIT=
(0.9,0.9...)', or it may be a reference to a doc defining that perturbed
model version)

institute - the originating centre ('Met Office') [and it would be
possible to have an institute pseudo-coordinate to represent originating
centre, and a global attribute institute to represent Paco's
dissemination/archive centre]

title - I'm less sure on this one, and we may have some backwards
compatibility problems with CMOR. In CMOR title is used to represent
the experiment_id (or forcing scenario) and other things. Yet we think
this might be worthy of splitting out into another attribute. Paco is
also uncomfortable with title.

4) Understand what is left over for describing model ensembles
(experiment_id, realization*/initial_condition). Suggest these are
included as additional global attributes and pseudo-coordinates, and
agree their content. I think Paco had a whole list here (archive time,
etc) - some of which we haven't really talked about.
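
As an illustration of the 'usage meta data' point in 1), the sketch
below slices an ensemble by the values of two pseudo-coordinates. Lists
of strings stand in for character variables along the realization
dimension, the values are made up, and select_realizations is a
hypothetical helper.

```python
def select_realizations(pseudo_coord, wanted):
    """Indices along the realization dimension whose label equals `wanted`."""
    return [i for i, value in enumerate(pseudo_coord) if value == wanted]


# One entry per realization; both lists share the realization dimension.
source = ["HadCM3", "HadCM3", "HadGEM1"]
experiment_id = ["SRES A1B", "SRES B2", "SRES A1B"]

a1b = select_realizations(experiment_id, "SRES A1B")
hadcm3 = select_realizations(source, "HadCM3")
hadcm3_a1b = sorted(set(a1b) & set(hadcm3))  # members matching both labels
print(a1b, hadcm3_a1b)
```

Exact string matching like this only works if the content of each
pseudo-coordinate is agreed, which is what 3) asks for.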

Does that sound reasonable? If so I am happy to try and summarise to
the list.

Jamie

(*I think this use of realization in this context is different from the
current use of realization in CF - that we have just added - so we'll
need a different name/label.)

(When suggesting having pointers to data sets for initial conditions I
meant simply the assimilation/model start conditions - Paco is right it
would be a nightmare to keep track of all the obs that go into the
assimilation.  That said not all assimilation/model start conditions may
be publicly available so this may not work?  I have reservations about
the usefulness of simply a numerical id to indicate the initial
condition ensemble number. I'd rather try and give it some more
meaningful content).

----------------------------------------------------------------------------
Jonathan 25th October, Email 8

Dear Jamie

Thank you. That is a good summary. I don't like title either, by the way.
You could post it to the email list, I think, perhaps with the earlier
ones appended for reference!

Cheers

Jonathan

----------------------------------------------------------------------------
Paco 25th of October, Email 9

Dear Jonathan and Jamie,

I agree that allowing the relevant global attributes to be
standard_names as well might help a lot.

As for the use of title, source and institution, see below.

>> the use of a variable
>> similar to "institution" in a multi-forecast system prediction may not
>> be enough to discriminate the different situations that could happen.
>> Let's imagine a forecast produced with the dynamical model x at its own
>> institution and at institution y. Those two forecasts, even if valid at
>> the same verification time, might not be the same, so that we need
>> another variable to indicate where the data are distributed from.
>> Therefore, the definition of "institution" below might not be precise
>> enough. This situation is already happening and will be more frequent in
>> the future as prediction models developed by one institution are run at
>> a different one.
>
> It seems to me that "institution" is the correct description of the place
> which produces the data, precisely like the global attribute. The indication
> that it was done with model x doesn't seem to me to be part of the description
> of the institution, but part of the description of the forecast system used.

The problem I see with this use of "institution" or "institute" is how
to distinguish the forecasts run by centre x at institution y. While the
model (system and version) could be specified in "source", the
institution x is not the same as institution y. This sort of situation
happens in the context of the multi-model system developed by APCC
(http://www2.apcc21.net/index.php). KMA (Korea) produces seasonal
forecasts with the NCAR (USA) model at APCC where the data are archived.
In this case, while "institution" could take the value KMA and the
information about the version of the NCAR model could go into "source",
we still need to indicate that the data have been produced,
postprocessed and stored by APCC, which is the centre responsible. A
variable such as "forecast_producer" to indicate the centre where the
model was run could clarify the situation. In a global seasonal
multi-model context, the same NetCDF file might contain other forecasts
valid at the same verification time produced by other centres at, for
instance, ECMWF, which would prevent the variable "forecast_producer" from
being a global attribute. This is the sort of complex case that the working
group on seasonal-to-interannual prediction (WGSIP) is trying to
consider and why we initially proposed the variable "original_distributor".

>> As for "source", it could cater for the variables I called
>> "forecast_system_version_number" and for "forecast_method_number", if
>> both have to be included in the same string. I suppose that the model
>> description needs to be quite detailed, which will require long strings
>> for the source variable.
>
> As suggested by the definition of the source attribute, this could name the
> model e.g. GloSea (your forecast_system_version_number, but strings give
> self-describing info, unlike numbers) and its variant e.g. in a perturbed
> parameter ensemble, or if the model has subversions (your
> forecast_method_number). It could also include the name of the centre which
> produced the model e.g. "Met Office GloSea" or "NCAR CCSM 3.0". I would guess
> that in general it is sufficient to inspect a long string. Possibly if you had
> to group the members of a large perturbed physics ensemble together, separate
> from other models, it would be useful to have a separate identifier for them
> e.g. model_perturbation_id.

This use of "source" and the definition given by Jamie might do the job,
except for the inclusion of the centre. There is an increasing number of
cases where a model developed by a centre is run by another one. I
believe this information should be kept separate (see comments above
concerning "institution").

I don't see the need to have a separate identifier for the model
perturbation. That would be something similar to the variable
"forecast_system_method" that we agreed to include in source. As I
understand it, source might also contain URL pointers.

> That leaves realization, for members of an ensemble using the same model but
> different input data, and experiment_id, for describing the intent of the
> experiment. This would have been a good place to store scenario information
> for IPCC experiments, for example. What do you think, Jamie?
>
> I would propose that we allow realization and experiment_id as global
> attributes too, for symmetry. Do source and institution need to be defined
> as standard names so they can be used for coordinates? They are already
> permitted as attributes of data variables.

Do any of you have a guess as to what the dimensions of "realization" and
"experiment_id" would be for a multi-model ensemble forecast built up
with two models with 5 and 11 ensemble members?
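
One possible layout, offered only as a sketch and not as something
agreed in this thread, is a single flattened realization dimension of
length 16 with per-realization labels. The model names and the
flatten_members helper below are hypothetical.

```python
def flatten_members(members_per_source):
    """Build per-realization labels for a flattened multi-model ensemble.

    `members_per_source` is a list of (source name, member count) pairs.
    Returns two parallel lists, one entry per realization.
    """
    source, member = [], []
    for name, count in members_per_source:
        for number in range(1, count + 1):
            source.append(name)      # which system this realization came from
            member.append(number)    # member number within that system
    return source, member


source, member = flatten_members([("model A", 5), ("model B", 11)])
print(len(source), member[4], member[5])
```

Under this assumption "realization" and any per-realization label such
as "experiment_id" would each have length 16, with the 5/11 split
recoverable from the source labels.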

If "realization" is not to be used to identify ensemble members of an
initial condition ensemble forecasts, then I suggest we reconsider the
variable "ensemble_member_number".

I'm still reluctant to give a use to title in a forecasting context.
Honestly, I don't see what concept it could correspond to.

To end up with this long message, I feel sorry that there are not more
members of the medium-range/seasonal operational forecast community
following the discussion. I wouldn't like to miss any of the relevant
issues.

Cheers,
Paco
----------------------------------------------------------------------------
Paco 25th of October, Email 10

Yes, I'm happy with this way forward (sending everything to the list)
and with the closer link between the needs of coding data of different
time scales and operational uses.

Thanks a lot for the comments.

Cheers,
Paco
----------------------------------------------------------------------------
Jonathan 25th of October, Email 11

Dear Paco

> The problem I see with this use of "institution" or "institute" is how
> to distinguish the forecasts run by centre x at institution y. While the
> model (system and version) could be specified in "source", the
> institution x is not the same as institution y. This sort of situation
> happens in the context of the multi-model system developed by APCC
> (http://www2.apcc21.net/index.php). KMA (Korea) produces seasonal
> forecasts with the NCAR (USA) model at APCC where the data are archived.
> In this case, while "institution" could take the value KMA and the
> information about the version of the NCAR model could go into "source",
> we still need to indicate that the data have been produced,
> postprocessed and stored by APCC, which is the centre responsible.

Ah. So there are *three* centres involved - not just x and y, but z as well.
But you say "KMA produces" and "data have been produced ... by APCC". So I
am confused. Which of them has produced the data? These three roles are not
clear to me. I agree that NCAR belongs in the source.

Between global attribute and coordinate variable, there is also the
possibility of data variable attribute. These attributes are already allowed
on data variables. It depends whether you "feel" they are coordinates or
attributes.

> To end up with this long message, I feel sorry that there are not more
> members of the medium-range/seasonal operational forecast community
> following the discussion. I wouldn't like to miss any of the relevant
> issues.

Yes. We should get back into public view!

Cheers

Jonathan
----------------------------------------------------------------------------
Paco 25th of October, Email 12

Dear Jonathan,

>> The problem I see with this use of "institution" or "institute" is how
>> to distinguish the forecasts run by centre x at institution y. While the
>> model (system and version) could be specified in "source", the
>> institution x is not the same as institution y. This sort of situation
>> happens in the context of the multi-model system developed by APCC
>> (http://www2.apcc21.net/index.php). KMA (Korea) produces seasonal
>> forecasts with the NCAR (USA) model at APCC where the data are archived.
>> In this case, while "institution" could take the value KMA and the
>> information about the version of the NCAR model could go into "source",
>> we still need to indicate that the data have been produced,
>> postprocessed and stored by APCC, which is the centre responsible.
>
> Ah. So there are *three* centres involved - not just x and y, but z as well.
> But you say "KMA produces" and "data have been produced ... by APCC". So I
> am confused. Which of them has produced the data? These three roles are not
> clear to me. I agree that NCAR belongs in the source.

Sorry for not being clear enough. There may be 3 centres involved: the
centre developing the forecast system, the centre running the forecast
and the centre where the forecast is run. In my example, NCAR is the
model developer (it goes into source), KMA is the institution that
develops the forecast system (it modifies the model to convert it into a
climate forecast model) and APCC is the institution where the forecasts
are run and archived. Another example can be provided from the EUROSIP
operational multi-model ensemble, wherein GloSea is the forecast system
(source), Met Office is the forecast system developer (institution) and
ECMWF is where the model is run (forecast_producer).
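
The three roles can be written down as a small record, sketched here in
Python. The field names follow the ones used in this discussion
(forecast_producer is only a proposal), "NCAR model" is a placeholder
because no specific version was named, and ForecastProvenance is a
hypothetical class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForecastProvenance:
    source: str             # forecast system or model
    institution: str        # centre that developed the forecast system
    forecast_producer: str  # centre where the forecasts are run


apcc_example = ForecastProvenance(
    source="NCAR model", institution="KMA", forecast_producer="APCC"
)
eurosip_example = ForecastProvenance(
    source="GloSea", institution="Met Office", forecast_producer="ECMWF"
)

# Institution and producer differ in both examples, so a single
# "institution" label cannot carry both pieces of information.
for example in (apcc_example, eurosip_example):
    print(example.institution, "->", example.forecast_producer)
```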

> Between global attribute and coordinate variable, there is also the
> possibility of data variable attribute. These attributes are already allowed
> on data variables. It depends whether you "feel" they are coordinates or
> attributes.

Sorry, I miss the differences. I looked at the CF conventions
documentation and couldn't find the data variable attribute definition.
However, it is desirable that the variables we've been discussing are
not global for this sort of files.

Cheers,
Paco
----------------------------------------------------------------------------
Jonathan 25th of October, Email 13 of 13

Just to clarify, and then silence until Jamie has summarised:

> I looked at the CF conventions
> documentation and couldn't find the data variable attribute definition.

In 2.6.2

The NUG defines title and history to be global attributes.
We wish to allow the newly defined attributes, i.e., institution, source,
 references, and comment, to be either global or assigned to individual
variables. When an attribute appears both globally and as a variable attribute,
the variable's version has precedence.
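
The precedence rule quoted above can be sketched in a few lines of
Python. Dictionaries stand in for the attribute sets, the values are
illustrative, and resolve_attribute is a hypothetical helper rather than
part of any NetCDF library.

```python
def resolve_attribute(name, variable_attrs, global_attrs):
    """Return the variable's value for `name` if present, else the global one."""
    if name in variable_attrs:
        return variable_attrs[name]
    return global_attrs.get(name)  # None when the attribute is absent everywhere


global_attrs = {"institution": "ECMWF", "source": "multi-model file"}
temp_attrs = {"institution": "Met Office"}

print(resolve_attribute("institution", temp_attrs, global_attrs))
print(resolve_attribute("source", temp_attrs, global_attrs))
```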

Jonathan
----------------------------------------------------------
Received on Thu Oct 26 2006 - 09:53:53 BST
