
[CF-metadata] subgrid variation

From: Jonathan Gregory <j.m.gregory>
Date: Wed, 5 Jan 2005 11:46:26 +0000

Dear All

Happy New Year. This is quite a long message about a subject I have found
rather difficult, but I think it's important. Everyone's views will therefore
be most welcome.

Karl Taylor has pointed out some lack of clarity and inconsistency in the
standard name definitions regarding quantities which apply to only part of a
gridbox. In the standard name guidelines, we said:

"Unless indicated, a quantity is assumed to apply to the whole area of each
horizontal grid box. The qualifier where_type specifies instead that the
quantity applies only to the part of the grid box of the named type. The types
are where_cloud, where_land, where_open_sea (i.e. sea free of sea ice, or
leads), where_sea (i.e. not land), where_sea_ice, where_vegetation."

Of course, further such types could be defined, but at present there are
actually only three standard_names with a where-qualifier. This statement in
the guidelines was intended to mean, for instance, that
* atmosphere_cloud_liquid_water_content is the gridbox average, not the
in-cloud average, which would be denoted by appending _where_cloud,
* sea_ice_area_fraction is the area of sea ice divided by the area of the box,
* sea_ice_thickness is the gridbox average (i.e. sea ice volume in the box
divided by the area of the box), not the average over the part with sea ice,
* surface_snow_thickness is the average over the whole box, not the land
portion or the portion with snow.
However, if we apply the rule consistently, sea_ice_speed should include a
contribution from the ice-free part of the box, while
ocean_mixed_layer_thickness and sea_surface_temperature should have
contributions from any land
area in the box. It is not obvious what these values should be - zero might be
OK in some cases, but in others doesn't make any sense. To avoid such
interpretations we would have to append _where_sea_ice and _where_sea, which
seems clumsy.

One way to get round this would be to define for every quantity a subgrid area
to which it applies by default. In some cases, such as ocean mixed layer depth
and SST, this might be obvious. In others, such as sea ice, runoff, cloud
liquid water or convective precipitation, there isn't an obvious choice, since
different choices suit different purposes. For example, sea ice thickness
averaged over sea ice is good for plotting, but sea ice thickness averaged
over the whole gridbox is easier to integrate to obtain the total sea ice
volume. If choices which aren't obvious have to be defined for each standard
name, I'd be worried that mistakes will arise if people make wrong guesses
about these choices instead of looking them up.
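
To illustrate the two averaging choices for sea ice thickness, here is a
minimal sketch with made-up numbers. The variable names and values are
assumptions chosen for illustration only, not taken from any model or dataset.

```python
# Hypothetical gridbox; every number here is illustrative.
box_area = 1.0e10         # m2, total horizontal area of the gridbox
ice_fraction = 0.25       # sea ice area divided by the area of the box
thickness_in_ice = 2.0    # m, mean thickness over the ice-covered part only

# Gridbox-mean thickness: the ice-free part contributes zero.
thickness_gridbox = thickness_in_ice * ice_fraction

# Both averages give the same total volume, but the gridbox mean needs
# only the box area, whereas the in-ice mean also needs the ice fraction.
volume_from_gridbox_mean = thickness_gridbox * box_area
volume_from_in_ice_mean = thickness_in_ice * ice_fraction * box_area
```

The in-ice value (2.0 m) is the one a plot would normally show, while the
gridbox value (0.5 m) is the one that integrates directly to a volume.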

Since this distinction relates to subgrid variation, cell_methods should be
relevant. In 7.3 of the standard, we say that when cell_methods is not
specified, the default interpretation is "point" for intensive quantities.
Spatially this means the value at the gridpoint, not the mean over the
gridbox. This is also an appropriate interpretation for data variables defined
at scattered points in lat and lon rather than on a grid. I think the default
point interpretation does not raise a problem. If the value really claims to
apply to a point, it isn't so problematic whether zero or missing data is used
to indicate a "lack" of the quantity, as the value doesn't imply anything
about spatial variation. For instance, consider a time-series of sea ice
thickness and speed measured at a fixed point in the ocean. When there is no
ice, these quantities might be given either as zero or as missing data. I
expect that reporting practices vary.

The point interpretation may be reasonable for some gridded quantities, such
as atmospheric pressure, but for many quantities, such as precipitation, a
mean over some area is likely to be what data-writers and users have in mind.
When there is acknowledged subgrid variation in the surface type or the
process, making the quantity undefinable in some part of the gridbox, such as
for sea ice velocity and sea surface temperature over land, the point
interpretation cannot truly be applicable. I think the implication is that
including cell_methods should be strongly recommended. Without harming COARDS
compatibility, we could make cell_methods mandatory if a standard_name is
defined. Would that be OK?

The cell_methods attribute should indicate the area over which the mean has
been calculated. The current syntax is "lat: lon: mean" or "lon: lat: mean"
(they are equivalent) to indicate a mean over a lat-lon gridbox. I propose
that this should be taken as implying the entire gridbox area, and that we
should define new syntax for indicating means over parts of gridboxes. By
analogy with the treatment of climatological time-means, I propose that
spatial cell_methods entries should be able to have "over TYPE" appended,
where TYPE is one of the "where" types in standard names. Thus, for sea ice
thickness, "lat: lon: mean" means the average over the entire gridbox, with
zero for areas of no sea ice and of land, "lat: lon: mean over sea" the mean
over the sea area only with zero for no sea ice, and "lat: lon: mean over
sea_ice" the mean over the area with some sea ice. Where gridboxes contain
both land and sea, a point interpretation is impossible, so ocean mixed layer
thickness and sea surface temperature should have "lat: lon: mean over
sea". In-cloud liquid water content would have "lat: lon: mean over cloud"
while "lat: lon: mean" would indicate a mean with zero in cloud-free areas.
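
As a rough sketch of how an application might read the proposed "over TYPE"
qualifier, the following parses simple spatial cell_methods strings. The
function name parse_cell_methods and the regular expression are hypothetical
and not part of any existing CF software. It handles only the simple forms
quoted above and does not attempt the climatological "within"/"over"
qualifiers.

```python
import re

# Hypothetical pattern for entries such as "lat: lon: mean over sea_ice".
_ENTRY = re.compile(
    r"((?:\w+:\s*)+)"         # one or more "name:" dimension prefixes
    r"(\w+)"                  # the method, e.g. "mean"
    r"(?:\s+over\s+(\w+))?"   # optional "over TYPE" qualifier
)

def parse_cell_methods(text):
    """Return a list of (dimensions, method, over_type) tuples.

    over_type is None when no "over TYPE" qualifier is present, which
    under the proposal means the entire gridbox area.
    """
    entries = []
    for match in _ENTRY.finditer(text):
        dims = [d.strip() for d in match.group(1).split(":") if d.strip()]
        entries.append((dims, match.group(2), match.group(3)))
    return entries

whole_box = parse_cell_methods("lat: lon: mean")
ice_only = parse_cell_methods("lat: lon: mean over sea_ice")
```

Here whole_box carries no type, so the mean is over the entire gridbox, while
ice_only records "sea_ice" as the area over which the mean was taken.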

If this change is adopted, some existing data ought to have cell_methods
included to describe it properly, including some of the quantities being
collected by PCMDI for IPCC. However, Karl and others have been very careful to
define in their instructions exactly what is wanted for that exercise, and
this can be relied on in the absence of metadata. I would say that this
proposed change is a recognition that we have not been sufficiently precise,
rather than a change in interpretation of existing metadata.

Analogous issues arise for time-variation. When a time-mean is calculated from
a time-series of sea ice thickness, should the times when there was no ice
contribute zero to the mean thickness, or should they be excluded from the
mean? Either procedure might be used and they need to be distinguished. The
"over TYPE" qualifiers should therefore be made available for temporal
cell_methods entries as well, preceding any "over" qualifiers indicating
climatological time-processing.
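
The two possible time-means can be contrasted with a small sketch. The series
below is invented for illustration, and using None to mark ice-free times is
an assumption of this example rather than any reporting convention.

```python
# Hypothetical time series of sea ice thickness (m) at one location;
# None marks times when no ice was present.
series = [1.5, 0.5, None, None]

# Choice 1: ice-free times contribute zero to the time-mean.
mean_with_zeros = sum(0.0 if h is None else h for h in series) / len(series)

# Choice 2: ice-free times are excluded from the time-mean.
icy_values = [h for h in series if h is not None]
mean_over_ice = sum(icy_values) / len(icy_values)
```

The same data give 0.5 m under the first choice and 1.0 m under the second,
which is why the metadata needs to say which was used.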

The "where" construction in standard names is still needed when a quantity is
applicable to various parts of the gridbox, but with different values. For
instance, in a mixed gridbox we might have values of both
surface_upward_latent_heat_flux_where_land and
surface_upward_latent_heat_flux_where_sea
(these standard names aren't defined at present, but would conform to the
guidelines). The cell_methods should then have "lat: lon: mean over sea" or
"lat: lon: mean over land" to indicate the means over the areas to which they
apply, or "lat: lon: mean" if they have been multiplied by the sea and land
area fractions. This demonstrates that the area from which a quantity arises,
and the area over which it is averaged, aren't necessarily the same.
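
A short numerical sketch of this distinction follows. The fractions and flux
values are invented for illustration, and the variable names are assumptions
of this example.

```python
# Hypothetical mixed gridbox; all numbers are illustrative.
land_fraction = 0.25
sea_fraction = 1.0 - land_fraction

# Latent heat fluxes (W m-2) averaged over the part where each arises,
# i.e. "lat: lon: mean over land" and "lat: lon: mean over sea".
flux_land_over_land = 40.0
flux_sea_over_sea = 120.0

# The same two quantities expressed as whole-gridbox means
# ("lat: lon: mean"), obtained by multiplying by the area fractions.
flux_land_over_box = flux_land_over_land * land_fraction
flux_sea_over_box = flux_sea_over_sea * sea_fraction

# Only the whole-gridbox forms can simply be added to give the total.
flux_total_over_box = flux_land_over_box + flux_sea_over_box
```

The land flux arises from land in both forms, but is averaged over land in
one case and over the whole gridbox in the other.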

I would propose a further addition to the cell_methods syntax, that it should
optionally be permissible to have an entry for "area:" indicating the
combination of dimensions that define gridboxes covering the horizontal
area. For lat-lon gridboxes, "area:" would be equivalent to "lat: lon:" or
"lon: lat:" and would be more convenient. However, the main reason to propose
it is the need to recognise horizontal area without having to understand how
the horizontal grid has been constructed. In the general case, the horizontal
dimensions might be projection x and y coordinates, or there might even be
more than two horizontal dimensions of the data variable for some of the more
complex grids being devised. A generic application cannot be expected to
recognise which combination of dimensions means horizontal area.
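
The benefit for a generic application can be sketched as follows. The function
covers_horizontal_area is hypothetical, and the set of names it recognises
without "area:" is an assumption made for this example.

```python
def covers_horizontal_area(dims):
    """Hypothetical check: does a cell_methods entry apply to horizontal area?

    dims is the list of names preceding the method, e.g. ["lat", "lon"].
    """
    names = set(dims)
    # With the proposed "area:" entry, no knowledge of the grid is needed.
    if names == {"area"}:
        return True
    # Without it, this sketch can recognise only the familiar lat-lon case;
    # projection x/y or more complex grids would need grid-specific logic.
    return names == {"lat", "lon"}
```

For example, covers_horizontal_area(["area"]) succeeds whatever the grid,
whereas ["x", "y"] is not recognised by this simple check.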

Summary of proposals:

(1) Retain the default "point" interpretation and clarify what it means.

(2) Make cell_methods mandatory if standard_name is supplied.

(3) Clarify that unqualified gridbox area-mean implies the entire gridbox.

(4) Introduce "over TYPE" qualifiers for spatial and temporal cell_methods.

(5) Introduce "area:" entries for spatial cell_methods.

Cheers

Jonathan
Received on Wed Jan 05 2005 - 04:46:26 GMT
