[CF-metadata] original_ensemble_size from Karl Taylor on 2015-07-24 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Karl Taylor <taylor13>
Date: Thu, 23 Jul 2015 17:42:03 -0700

Hi all,

This addresses the issue of how to associate an ensemble size with a
variable. It also suggests an alternate way of proceeding that is more
general and will allow us to record, for example, which models were
included in a multi-model mean.

First, Jim's suggestion:
I agree with Jim that you might want to indicate which member (or
members) of an ensemble the variable represents, so you might want to
include a "realization" coordinate variable. You could then also define
an *attribute* "ensemble_size" of that coordinate to record the size;
that approach is permitted by our conventions, but it is not currently
standardized.

Now Mark's suggestion:
Mark's alternative approach, making "ensemble_size" a coordinate
variable (presumably in addition to possibly including "realization"),
would also relate it to the variable of interest. However, this would be
a bit unconventional, since a variable is normally considered to be a
*function* of its (independent) coordinates. I don't think
T(x,realization,ensemble_size) is a proper function: T depends on x and
realization, but in most cases it should be independent of ensemble size.

Jonathan's suggestion:
I think Jonathan suggested including ensemble_size in a cell_methods
attribute. For example:

dimensions:
     lon = 72 ;
     lat = 96 ;
     e_size = 5 ;

variables:
     float precip(lon, lat) ;
         precip:cell_methods = "realization: point (sample_size: e_size)" ;

Here, because "realization" is a standard name, it does not need to be
explicitly declared with a "coordinates" attribute. Jonathan originally
used "dimension" rather than "sample_size", but I prefer "sample_size".
If this approach were followed, CF would need to be modified so that
"sample_size" (along with "interval") was designated as one of the
options for providing "standardized" extra information in the
cell_methods attribute. Note that a variable "pointed to" from
cell_methods in this way would not necessarily be a coordinate variable;
it need not be monotonic, and it could be a character variable (i.e., a
list).
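To make Jonathan's variant concrete, here is a minimal Python sketch of how software might recover the ensemble size from the cell_methods string. It assumes the proposed (not yet standardized) `sample_size` keyword, represents the file's dimensions as a plain dict instead of using a netCDF library, and the function names are hypothetical.

```python
import re

# One cell_methods entry: "name: method", optionally followed by a
# parenthesized "key: value" pair. Only single-token values are handled,
# so a standard "(interval: 1 hour)" comment would be mis-parsed here.
_ENTRY = re.compile(r"(\w+):\s*(\w+)(?:\s*\((\w+):\s*(\w+)\))?")


def parse_cell_methods(cell_methods):
    """Split a cell_methods string into (name, method, key, value) tuples."""
    return [m.groups() for m in _ENTRY.finditer(cell_methods)]


def ensemble_size(cell_methods, dimensions):
    """Return the length of the dimension named by the proposed
    sample_size keyword, or None if no entry uses it."""
    for name, method, key, value in parse_cell_methods(cell_methods):
        if key == "sample_size":
            return dimensions[value]
    return None


# Hypothetical in-memory stand-in for the file's dimensions.
dims = {"lon": 72, "lat": 96, "e_size": 5}
print(ensemble_size("realization: point (sample_size: e_size)", dims))  # 5
```

Because the size comes from the dimension itself, nothing beyond the dimension length has to be stored to answer "how big was the ensemble?".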

Alternative "new approach"

A slight variant of Jonathan's approach, which would allow even more
information about the ensemble to be recorded, is illustrated by the
following example:

dimensions:
     lon = 72 ;
     lat = 96 ;
     members = 5 ;

variables:
     float precip(lon, lat) ;
         precip:cell_methods = "member: point (sample_pool: members)" ;
     int member ;
         member:standard_name = "realization" ;
     int members(members) ;
         members:standard_name = "realization" ;

data:
     member = 3 ;
     members = 1, 3, 5, 6, 10 ;

This would tell you that precip was from the realization labeled 3 of a
5-member ensemble (with labels 1, 3, 5, 6, and 10). If this approach were
adopted, CF would need to be modified so that "sample_pool" (along with
"interval") was designated as one of the options for providing
"standardized" extra information in the cell_methods attribute.

Under both Jonathan's approach and the "new approach", there would be no
need to define a standard_name "ensemble_size", because the size would
be given by the dimension length (5 in the examples above).
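Under the new approach, software could recover both the member label and the ensemble size by following the sample_pool pointer. The sketch below is a minimal illustration, assuming the proposed `sample_pool` syntax (not part of CF) and using a plain dict as a hypothetical stand-in for an open netCDF file; `describe_member` is an illustrative name, not an existing API.

```python
import re

# Proposed syntax from this thread: "name: method (sample_pool: var)".
# "sample_pool" is not a CF keyword; this pattern is an assumption.
_POOL = re.compile(r"(\w+):\s*\w+\s*\(sample_pool:\s*(\w+)\)")


def describe_member(variables, data_var):
    """Return (label, pool, size) for a variable whose cell_methods names a
    scalar realization variable and a sample_pool variable.

    `variables` is a hypothetical dict mapping variable names to dicts with
    "attrs" and "values" keys, standing in for an open netCDF file.
    """
    cell_methods = variables[data_var]["attrs"]["cell_methods"]
    match = _POOL.search(cell_methods)
    if match is None:
        raise ValueError("no sample_pool entry in cell_methods")
    member_name, pool_name = match.groups()
    label = variables[member_name]["values"]
    pool = variables[pool_name]["values"]
    if label not in pool:
        raise ValueError(f"member {label} is not in the sample pool {pool}")
    # The ensemble size is simply the length of the pool variable.
    return label, pool, len(pool)


variables = {
    "precip": {"attrs": {"cell_methods": "member: point (sample_pool: members)"},
               "values": None},
    "member": {"attrs": {"standard_name": "realization"}, "values": 3},
    "members": {"attrs": {"standard_name": "realization"},
                "values": [1, 3, 5, 6, 10]},
}
print(describe_member(variables, "precip"))  # (3, [1, 3, 5, 6, 10], 5)
```

A side benefit of pointing at a variable rather than a bare dimension is that a reader can check the member label against the pool, as the `label not in pool` test does.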

Note that the new approach could also be used to record a multi-model
ensemble mean (I'm not absolutely sure this example complies with the
current conventions, but I think it would if the option to designate a
"sample_pool" were added to CF):

dimensions:
     lon = 72 ;
     lat = 96 ;
     models = 5 ;
     max_len = 10 ;

variables:
     float precip(lon, lat) ;
         precip:cell_methods = "realization: mean (sample_pool: models)" ;
     char models(models, max_len) ;

data:
     models = "CanESM2", "CESM1", "CNRM-CM5", "HadGEM2", "MIROC-ESM" ;
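Because `char models(models, max_len)` stores each name as a fixed-width row, a reader has to strip the padding to recover the list of contributing models. The following is a minimal stdlib-only sketch, assuming NUL (or blank) padding and UTF-8 text; the helper names are hypothetical. (With the netCDF4-python library, `netCDF4.chartostring` performs the decoding step.)

```python
def decode_char_rows(rows):
    """Turn fixed-width character rows, as stored in a variable like
    models(models, max_len), into Python strings. Trailing NUL bytes and
    spaces used as padding are stripped."""
    return [bytes(row).rstrip(b"\x00 ").decode("utf-8") for row in rows]


def pad(names, max_len):
    """Inverse helper: write names into fixed-width, NUL-padded rows."""
    rows = []
    for name in names:
        encoded = name.encode("utf-8")
        if len(encoded) > max_len:
            raise ValueError(f"{name!r} is longer than max_len={max_len}")
        rows.append(list(encoded.ljust(max_len, b"\x00")))
    return rows


models = ["CanESM2", "CESM1", "CNRM-CM5", "HadGEM2", "MIROC-ESM"]
rows = pad(models, max_len=10)
print(len(rows), len(rows[0]))           # 5 10
print(decode_char_rows(rows) == models)  # True
```

The number of contributing models is then just the length of the `models` dimension, which is the same "size from the dimension" argument made above for realizations.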

Note also that the flexibility of this new approach could be useful for
dimensions other than realization, for example when a spatial mean is
computed from a set of scattered stations. A spatial mean computed from
5 stations could be recorded as follows:

dimensions:
     stations = 5 ;
     max_len = 16 ;

variables:
     float precmean ;
         precmean:cell_methods = "area: mean (sample_pool: stations)" ;
     char stations(stations, max_len) ;
         stations:coordinates = "lat lon" ;
     float lat(stations) ;
         lat:standard_name = "latitude" ;
     float lon(stations) ;
         lon:standard_name = "longitude" ;

data:
     stations = "Oakland", "San Francisco", "Livermore", "San Jose", "Palo Alto" ;
     lat = 37.62, 37.77, ... ;
     lon = -122.27, -122.42, ... ;
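Since the sample pool variable carries its own "coordinates" attribute, software could go from the cell_methods entry all the way to the station positions. This is a minimal sketch under the same assumptions as the proposal (a `sample_pool` keyword that CF does not yet define), with a hypothetical dict standing in for the file; only the two stations whose coordinates are given above are used, since the rest are elided.

```python
import re

# Assumed syntax from the proposal: "name: method (sample_pool: var)".
_POOL = re.compile(r"\(sample_pool:\s*(\w+)\)")


def pool_locations(variables, data_var):
    """Follow cell_methods -> sample_pool variable -> its coordinates
    attribute, and pair each pool entry with its auxiliary coordinates.

    `variables` is a hypothetical dict of {"attrs": ..., "values": ...}
    entries standing in for an open netCDF file.
    """
    match = _POOL.search(variables[data_var]["attrs"]["cell_methods"])
    if match is None:
        return []
    pool = variables[match.group(1)]
    coord_names = pool["attrs"].get("coordinates", "").split()
    columns = [variables[name]["values"] for name in coord_names]
    if any(len(col) != len(pool["values"]) for col in columns):
        raise ValueError("coordinate lengths do not match the sample pool")
    # Each tuple holds one pool entry followed by its coordinate values,
    # in the order listed in the "coordinates" attribute.
    return list(zip(pool["values"], *columns))


variables = {
    "precmean": {"attrs": {"cell_methods": "area: mean (sample_pool: stations)"},
                 "values": None},
    "stations": {"attrs": {"coordinates": "lat lon"},
                 "values": ["Oakland", "San Francisco"]},
    "lat": {"attrs": {"standard_name": "latitude"}, "values": [37.62, 37.77]},
    "lon": {"attrs": {"standard_name": "longitude"}, "values": [-122.27, -122.42]},
}
print(pool_locations(variables, "precmean"))
# [('Oakland', 37.62, -122.27), ('San Francisco', 37.77, -122.42)]
```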

I would find it very useful to be able to specify the models contributing
to a multi-model mean using the above approach. Anyone else think so?
It would also satisfy Mark's use case of recording the size of the
ensemble.

Best regards,
Karl

Received on Thu Jul 23 2015 - 18:42:03 BST
