Hi all,
This addresses the issue of how to associate an ensemble size with a
variable. It also suggests an alternate way of proceeding that is more
general and will allow us to record, for example, which models were
included in a multi-model mean.
First to consider Jim's suggestion:
I agree with Jim that you might want to indicate which member (or
members) of an ensemble is represented by the variable, so you might
want to include a coordinate variable "realization". You could then
also define an *attribute* of that coordinate, "ensemble_size", to
record the size. That approach is permitted by our conventions, but it
is not currently standardized.
Now Mark's suggestion:
Mark's alternative approach, making "ensemble_size" a coordinate
variable (presumably in addition to possibly including "realization"),
would also relate it to the variable of interest. This would be a bit
unconventional, though, since a variable is normally considered to be a
*function* of its (independent) coordinates. I don't think
T(x,realization,ensemble_size) is a proper function, since T depends on
x and realization, but should be independent of ensemble size in most cases.
Jonathan's suggestion:
I think Jonathan suggested including ensemble_size in a cell_methods
attribute. For example
dimensions:
    lon=72
    lat=96
    e_size=5
variables:
    float precip(lon,lat)
        precip: cell_methods="realization: point (sample_size: e_size)"
where because "realization" is a standard name, it does not need to be
explicitly declared with a "coordinates" attribute. Jonathan originally
used "dimension" rather than "sample_size", but I prefer
"sample_size". If this approach were followed, then CF would need to
be modified so that "sample_size" (along with "interval") was designated
to be one of the options for providing "standardized" extra information
in the cell_methods attribute. Note that the variable "pointed to" by
original_domain would not necessarily be a coordinate variable; it need
not be monotonic and it could be a character variable (i.e., a list).
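As a rough illustration of how a reader of such a file might recover the ensemble size from Jonathan's form, here is a minimal Python sketch. It uses a plain dictionary in place of a real netCDF file, and the regular expression and the function name ensemble_size_from are hypothetical. "sample_size" is only the keyword proposed in this thread, not something CF defines today.

```python
import re

# Dimensions and attribute as in the CDL example above.
dimensions = {"lon": 72, "lat": 96, "e_size": 5}
cell_methods = "realization: point (sample_size: e_size)"

# Hypothetical pattern: "<name>: <method> (<keyword>: <value>)".
# "sample_size" is the keyword proposed in this thread, not current CF.
PATTERN = re.compile(r"(\w+):\s*(\w+)\s*\((\w+):\s*(\w+)\)")


def ensemble_size_from(cell_methods, dimensions):
    """Return the size of the dimension named by sample_size, or None."""
    for name, method, keyword, value in PATTERN.findall(cell_methods):
        if name == "realization" and keyword == "sample_size":
            return dimensions[value]
    return None


size = ensemble_size_from(cell_methods, dimensions)
print(size)
```

The point of the sketch is only that the size is read from the dimension named in the attribute, so no separate "ensemble_size" variable is involved.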
Alternative "new approach":
An approach that is a slight variant on Jonathan's and would allow even
more information to be provided concerning the ensemble is illustrated
by the following example:
dimensions:
    lon=72
    lat=96
    members=5
variables:
    float precip(lon,lat)
        precip: cell_methods="member: point (sample_pool: members)"
    int member
        member: standard_name="realization"
    int members(members)
        members: standard_name="realization"
data:
    member = 3
    members = 1, 3, 5, 6, 10
This would tell you precip was from the realization labeled 3 of a
5-member ensemble (with labels 1, 3, 5, 6, and 10). If this approach
were adopted, then CF would need to be modified so that "sample_pool"
(along with "interval") was designated to be one of the options for
providing "standardized" extra information in the cell_methods attribute.
Under Jonathan's approach and also the "new approach", there wouldn't be
a need to define the standard_name "ensemble_size" because that would be
provided by the dimension size (5 in the above).
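To show how the pieces of the "new approach" fit together, here is a minimal Python sketch that resolves the proposed "sample_pool" entry to the variable it names and reads off the selected member and the ensemble size. The dictionary stands in for the file (it is not a netCDF API), and the function name describe_sample and the simple string splitting are assumptions made for illustration only.

```python
# Plain-dict stand-in for the file in the example above (not a netCDF API).
dataset = {
    "precip": {"cell_methods": "member: point (sample_pool: members)"},
    "member": {"standard_name": "realization", "data": 3},
    "members": {"standard_name": "realization", "data": [1, 3, 5, 6, 10]},
}


def describe_sample(dataset, var_name):
    """Return (selected label, pool labels, pool size) for var_name."""
    text = dataset[var_name]["cell_methods"]
    # Split "member: point (sample_pool: members)" into its two halves.
    head, _, extra = text.partition("(")
    coord_name = head.split(":")[0].strip()
    keyword, _, pool_name = extra.rstrip(") ").partition(":")
    if keyword.strip() != "sample_pool":
        raise ValueError("no sample_pool entry in cell_methods")
    pool = dataset[pool_name.strip()]["data"]
    selected = dataset[coord_name]["data"]
    if selected not in pool:
        raise ValueError("selected member is not in the pool")
    return selected, pool, len(pool)


selected, pool, size = describe_sample(dataset, "precip")
print(f"realization {selected} of a {size}-member ensemble {pool}")
```

The ensemble size here is simply the length of the pool, which is why no "ensemble_size" standard name is needed.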
Note that the new approach could also be used to record a multi-model
ensemble mean (I'm not absolutely sure this example complies with the
current convention, but I think it would if the option to designate the
"sample_pool" were added to CF):
dimensions:
    lon=72
    lat=96
    models=5
    max_len=10
variables:
    float precip(lon,lat)
        precip: cell_methods="realization: mean (sample_pool: models)"
    char models(models, max_len)
data:
    models = "CanESM2", "CESM1", "CNRM-CM5", "HadGEM2", "MIROC-ESM"
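Since the pool here is a character variable rather than a numeric coordinate, a short Python sketch may help show what char models(models, max_len) holds and how the list of names and the ensemble size come back out. The helper names pack and unpack are hypothetical, and NUL padding is an assumption (some writers pad with spaces instead).

```python
MAX_LEN = 10  # matches max_len in the example above
names = ["CanESM2", "CESM1", "CNRM-CM5", "HadGEM2", "MIROC-ESM"]


def pack(names, width):
    """Pad each name with NULs to a fixed width, like a char(models, max_len) array."""
    for name in names:
        if len(name) > width:
            raise ValueError(f"{name!r} is longer than {width} characters")
    return [name.encode("ascii").ljust(width, b"\x00") for name in names]


def unpack(rows):
    """Strip the padding and return the model names as strings."""
    return [row.rstrip(b"\x00").decode("ascii") for row in rows]


rows = pack(names, MAX_LEN)
models = unpack(rows)
print(len(models), models)
```

Each row is exactly max_len bytes wide, and the number of rows gives the number of models that went into the mean.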
Note also that the flexibility of this new approach could be useful for
dimensions other than realization when, for example, the samples for a
spatial mean come from scattered stations. If one were computing a
spatial mean from 5 stations, for example, this could be recorded as
follows:
dimensions:
    stations=5
    max_len=16
variables:
    float precmean
        precmean: cell_methods="area: mean (sample_pool: stations)"
    char stations(stations,max_len)
        stations: coordinates="lat lon"
    float lat(stations)
        lat: standard_name="latitude"
    float lon(stations)
        lon: standard_name="longitude"
data:
    stations = "Oakland", "San Francisco", "Livermore", "San Jose",
               "Palo Alto"
    lat = 37.62, 37.77, ...
    lon = -122.27, -122.42, ....
I would find it very nice to be able to specify the models contributing
to a multi-model mean using the above approach. Anyone else think so?
It would also satisfy Mark's use case of wanting to record the size of
the ensemble.
Best regards,
Karl
Received on Thu Jul 23 2015 - 18:42:03 BST