Dear all
I'd like to make a proposal about an issue we discussed two years ago and
didn't really settle. I think it needs a resolution because what we currently
have is ambiguous, and this will create problems of interpretation for data
that is being produced now. (To get round this for the IPCC AR4 database, Karl
Taylor supplemented CF with explicit definitions of how the calculations
should be done.) For history, see
http://www.cgd.ucar.edu/pipermail/cf-metadata/2005/000536.html and subsequent
postings. My proposal here isn't quite what I suggested then. It has to be
said that this is a complicated business. However, the complexity comes from
real practical cases, so cannot be avoided! But naturally it would be most
welcome if simpler ways can be proposed to deal with it.
Requirement: To indicate the portion of a cell over which a statistic has been
calculated, in situations where there is a need to distinguish between
statistics calculated for the same quantity over different portions of the
cell, or where the quantity might be considered to be undefined over some of
the cell.
Examples: The statistic is usually a mean or a sum. The situation most often
arises with cells in the horizontal, where the "portions" are different types
of surface that don't have defined geographical boundaries. Some examples:
(a) sea_ice_thickness averaged over the area of sea ice, the area of sea, or
the entire area of the cell including land. These can all be written as A/B,
where A is the total volume of sea ice in the cell, and B is the area of sea
ice, the area of sea or the area of the cell. Over the ice-free parts of the
sea, one could say that the sea-ice thickness is zero; over the land, it is
not really meaningful.
(b) surface_upward_sensible_heat_flux averaged over different surface types
within the cell e.g. land, sea, land_ice, forest. These might similarly be
written as A/B, A (in W) being the area-integral of the flux applying to the
given surface type, and B (in m2) either the area occupied by that type, or
the cell area. Alternatively, this means the flux (W m-2) is expressed either
per unit area of the particular surface type, or per unit area of the grid
cell. When the values for different types are given per unit area of the cell,
the sum of these values over all types is the mean for the cell as a whole.
Unlike sea-ice thickness, sensible heat flux exists everywhere, but takes
different values in different portions of the cell.
(c) surface_temperature averaged over different surface types within the cell.
This is only likely to be given as an average value for each surface type, i.e.
formally where B is the area of that type; e.g. the temperature is 300 K over
the land portion of the cell. Supposing the land is one-third of the cell,
you could say the land temperature contributes 100 K to the temperature
averaged over the cell (like you might for sensible heat flux) but I've not
come across such a statistic.
(d) surface_snow_mass (kg - this is not currently a standard name) in the
cell, or on the land only within the cell, or on the sea-ice. This is a sum,
rather than a mean.
(e) surface_downward_heat_flux_in_air calculated for sea areas and averaged
over the world. This can be expressed per unit area of the sea or per unit
area of the world. Both are useful in different contexts; the former is more
natural when considering the heat budget of the ocean, the latter when
considering the heat budget of the planet.
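To make the A/B arithmetic in examples (a) and (b) concrete, here is a minimal
Python sketch. All the numbers (cell area, area fractions, ice volume, fluxes)
and all the variable names are invented for illustration; none of them comes
from a real model or from CF.

```python
# Hypothetical grid cell; every number here is made up for illustration.
cell_area = 1.0e10          # m2, entire cell including land
sea_area = 0.6 * cell_area  # m2, sea portion (ice-covered or not)
ice_area = 0.25 * sea_area  # m2, part of the sea covered by ice
ice_volume = 3.0e9          # m3, A: total volume of sea ice in the cell

# Example (a): the same A divided by three different choices of B.
thickness_over_ice = ice_volume / ice_area    # mean over the sea ice
thickness_over_sea = ice_volume / sea_area    # mean over the sea
thickness_over_cell = ice_volume / cell_area  # mean over the whole cell

# Example (b): sensible heat flux for two surface types.
area = {"land": 0.4 * cell_area, "sea": 0.6 * cell_area}  # m2
flux_per_type_area = {"land": 50.0, "sea": 10.0}          # W per m2 of type

# A (in W) is the area-integral of the flux over each type.
integral = {t: flux_per_type_area[t] * area[t] for t in area}

# Per unit area of the cell: divide the same A by the cell area instead.
flux_per_cell_area = {t: integral[t] / cell_area for t in area}

# Given per unit cell area, the values for all types sum to the cell mean.
cell_mean_flux = sum(flux_per_cell_area.values())
```

With these made-up inputs the three thicknesses come out at about 2 m, 0.5 m
and 0.3 m for the same ice volume, which is why the denominator has to be
recorded somewhere in the metadata.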
In the standard_name guidelines, this issue is partly addressed by using
where-phrases. However, this approach is unclear and inadequate. It can't
indicate, for instance, whether the sensible heat flux applying to the land
portion of the box is expressed per unit area of land or per unit area of the
cell. There are only 8 standard names at present with where-phrases, so I
don't think it's too late to think again.
This whole issue concerns the calculation of statistics for subgrid variation,
which should properly be indicated by coordinates and cell_methods, I think.
Taking that approach instead of where-phrases in standard names will also
reduce the number of standard names we need to add.
Proposal:
(1) If no cell_methods is specified, the default interpretation is "point" for
an intensive quantity, meaning a local value in area or an instantaneous value
in time, and "sum" for an extensive quantity, meaning the sum over area or
time in the cell. No change is proposed to this. It is unproblematic, because
point values and integrals do not involve dividing by anything. It is
undefined what value should be given if the quantity does not exist e.g.
sea_ice_thickness where there is no sea ice; the value could be zero or
missing.
(2) Delete the existing standard names with where-phrases, making them aliases
of names without the where-phrases.
(3) A cell_methods entry is generically of the form "NAME: [NAME: ...]
METHOD" (see CF 7.3), where NAMEs are the names of dimensions, scalar
coordinate variables, or standard_names. Horizontal area-means are indicated
by "lat: lon: mean", if "lat" and "lon" are the latitude and longitude
dimensions. I propose to introduce a special NAME of "area" to indicate
horizontal area, so an area-mean can be written "area: mean". Apart from
being more obvious in the usual lat-lon case, I suggest this is necessary in
cases where the horizontal coordinates are not lat and lon. As discussed in
another thread, there is no guaranteed way to identify which are the
horizontal dimensions in such cases, so a generic application might not be
able to identify the cell_methods entry applying to horizontal area.
(4) To indicate that a quantity should be evaluated over a particular portion
of a cell, because it could have different values for different portions of
the cell, one of two conventions may be used:
(i) a string-valued coordinate variable or scalar coordinate variable
indicating the portion of the cell. Variables with standard_names of
land_cover and surface_cover would be suitable. I propose also a new
standard_name of area_type, whose values could be any of the surface_cover
types as well as any distinctions of horizontal area which are not surface
types, such as "cloud". A coordinate variable with size greater than one
allows values of a quantity e.g. specific_humidity, surface_temperature,
surface_upward_sensible_heat_flux to be given for various area types in one
data variable, as is often needed in land surface models especially, since
they deal with many types. The cell_methods entry would be of the form "NAME:
METHOD" as usual, where NAME could be "area", given point (3).
(ii) a cell_methods entry may be given of the form "NAME: METHOD where TYPE"
where TYPE may be one of a small set of types explicitly listed in the CF
standard: I propose land, sea, sea_ice. The phrase "where TYPE" should be
interpreted as exactly equivalent to supplying a scalar or size-one coordinate
variable of area_type with value TYPE. It is proposed as a shorthand for the
commonest cases. Since separate models exist for land, sea and sea_ice, the
output from these models might have many or all variables requiring this
qualification; I think the shorthand would be an encouragement to modellers to
provide it, as well as a convenience to everyone else.
If neither of these conventions is used, the statistical METHOD specified by
cell_methods (mean, maximum, etc.) is assumed to apply to the entire cell.
With either of these conventions, the statistical METHOD applies to the
selected portion of the cell only.
In either case, other coordinate variables may also implicitly restrict the
portion of the cell considered by the statistical METHOD. For example, the
horizontal area of the ocean decreases with increasing depth. An area-mean as
a function of depth in the ocean is therefore formed over different areas at
different depths. If there is a coordinate of sea_ice_thickness which
specifies ranges, as would be done by a model that deals with thickness
categories, data variables with this coordinate would contain values
calculated within those categories, which implies a varying area as a function
of thickness.
(5) If the METHOD is "mean", the cell_methods entry may be further
supplemented by the phrase "over TYPE2", where TYPE2 can be land, sea or all.
This indicates that the mean is calculated by summing over the selected
portion of the cell, and then dividing by the area of TYPE2 in the cell rather
than by the area of the selected portion alone. Here "all" means the entire
area of the cell.
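As a rough illustration of points (3) to (5), the proposed entry syntax
"NAME: [NAME: ...] METHOD [where TYPE] [over TYPE2]" could be parsed as
sketched below in Python. This is only a sketch under stated assumptions: the
names parse_entry and selected_portion are hypothetical, the accepted TYPEs are
just the ones listed in this proposal, and other parts of the CF 7.3 syntax
(such as parenthesised comments) are deliberately not handled.

```python
import re

# Assumption: only the TYPEs named in this proposal are accepted.
WHERE_TYPES = {"land", "sea", "sea_ice"}
OVER_TYPES = {"land", "sea", "all"}

_ENTRY = re.compile(
    r"^\s*((?:\w+:\s*)+)"        # one or more "NAME:" tokens
    r"(\w+)"                     # METHOD
    r"(?:\s+where\s+(\w+))?"     # optional "where TYPE"
    r"(?:\s+over\s+(\w+))?\s*$"  # optional "over TYPE2"
)


def parse_entry(entry):
    """Split one cell_methods entry into its parts (hypothetical helper)."""
    m = _ENTRY.match(entry)
    if m is None:
        raise ValueError(f"not a recognised cell_methods entry: {entry!r}")
    names = [n.strip() for n in m.group(1).split(":") if n.strip()]
    method, where, over = m.group(2), m.group(3), m.group(4)
    if where is not None and where not in WHERE_TYPES:
        raise ValueError(f"unsupported where-type: {where}")
    if over is not None:
        if method != "mean":
            raise ValueError('"over" is only proposed for METHOD "mean"')
        if over not in OVER_TYPES:
            raise ValueError(f"unsupported over-type: {over}")
    return {"names": names, "method": method, "where": where, "over": over}


def selected_portion(parsed, area_type=None):
    """Return the portion of the cell that the METHOD applies to.

    "where TYPE" is treated like a scalar area_type coordinate with value
    TYPE; None is returned when neither is given (the entire cell).
    """
    return parsed["where"] or area_type
```

For instance, parse_entry("area: mean where sea over all") gives the names
["area"], the method "mean", the where-type "sea" and the over-type "all",
whereas "area: sum over all" is rejected because "over" is proposed only for
means.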
Examples:
(a) sea_ice_thickness averaged over the area of sea ice, the area of sea, or
the entire area of the cell including land. The cell_methods entry would be
"area: mean where sea_ice" in the first case, with "over sea" and "over all"
appended in the second and third.
(b) surface_upward_sensible_heat_flux averaged over different surface types
within the cell e.g. land, sea, land_ice, forest. This would have a coordinate
variable of e.g. surface_cover to distinguish the types. The cell_methods
entry would be "area: mean" if averaged over the type, or "area: mean over
all" if expressed per unit area of the grid cell.
(c) surface_temperature averaged over different surface types within the cell.
This would likewise have "area: mean".
(d) surface_snow_mass in the cell, or on the land only within the cell, or on
the sea-ice. The cell_methods entry would be "area: sum" in the first case,
with "where land" and "where sea_ice" appended in the second and third.
(e) surface_downward_heat_flux_in_air calculated for sea areas and averaged
over the world. The cell_methods would have "area: mean where sea" if the
result is expressed per unit area of sea, and "area: mean where sea over all"
if per unit area of the world. The bounds of the horizontal coordinates in
this case indicate that "area" means the whole world, or instead of "area:"
we could have "latitude: longitude:", which is a special syntax (in CF 7.3)
indicating the entire range of latitude and longitude.
(f) surface_snow_thickness averaged over the area of snow on land. No
where-phrase is proposed for this case. It would be done by including a scalar
coordinate variable of e.g. area_type, with the value snow_on_land, and the
cell_methods entry would be "area: mean", which would implicitly apply to the
area of snow_on_land. To indicate that the land-snow thickness should be
averaged over the grid cell area, "over all" would be appended. To indicate
the average thickness of snow over all the land, not just over the
snow-covered parts of the box, the area_type variable would have the value
"land", or it could be omitted and the cell_methods entry given as "area: mean
where land".
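The cell_methods strings in examples (a), (d), (e) and (f) can be assembled
mechanically. The Python sketch below shows one way to do so; the function
cell_methods_entry is a hypothetical helper written for this message and is not
part of CF or of any library.

```python
def cell_methods_entry(method, where=None, over=None, name="area"):
    """Assemble "NAME: METHOD [where TYPE] [over TYPE2]" (hypothetical)."""
    if over is not None and method != "mean":
        raise ValueError('"over" is only proposed for METHOD "mean"')
    parts = [f"{name}:", method]
    if where is not None:
        parts += ["where", where]
    if over is not None:
        parts += ["over", over]
    return " ".join(parts)


# (a) sea_ice_thickness over the sea ice, the sea, and the entire cell.
example_a = [cell_methods_entry("mean", where="sea_ice", over=o)
             for o in (None, "sea", "all")]

# (d) surface_snow_mass in the cell, on the land only, and on the sea ice.
example_d = [cell_methods_entry("sum", where=w)
             for w in (None, "land", "sea_ice")]

# (e) heat flux for sea areas, per unit area of sea and of the world.
example_e = [cell_methods_entry("mean", where="sea"),
             cell_methods_entry("mean", where="sea", over="all")]

# (f) snow thickness with a scalar area_type coordinate (no where-phrase),
# then the shorthand for the mean over all land.
example_f = [cell_methods_entry("mean"),
             cell_methods_entry("mean", over="all"),
             cell_methods_entry("mean", where="land")]
```

In case (f) the first two strings only carry their intended meaning together
with the scalar area_type coordinate of value snow_on_land, which the helper
does not represent.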
Cheers
Jonathan
Received on Sat Dec 30 2006 - 11:26:57 GMT