
[CF-metadata] CF-1.6 Conformance Requirements/Recommendations

From: Jonathan Gregory <j.m.gregory>
Date: Fri, 23 Mar 2012 18:55:35 +0000

Dear Ros

Thanks a lot for working on this. I think you have correctly identified parts
which are stated as requirements and recommendations, but you have to put
yourself in the shoes of the CF-checker, and consider what you can actually
*do* to make the checks. The CF-checker, for instance, does not know what we
mean by element and instance dimensions, in the first statement:

"In the multidimensional array representations, data variables have both an
instance dimension and an element dimension. The dimensions may be given in
any order. If there is a need for either the instance or an element dimension
to be the netCDF unlimited dimension (so that more features or more elements
can be appended), then that dimension must be the outer dimension of the data
variable i.e. the leading dimension in CDL."

To check this statement, I think we have first to refer to the table in 9.1,
which implies a lot of checks on dimensions and coordinates. For instance, if
the featureType att says it's a timeseries, the table says that the data is
logically 1D (i), and there are mandatory coord or aux coord vars with the
logical structure x(i) y(i) t(i,o). There are five possibilities for storing a
collection of timeseries features. These are alternative sets of requirements
for the CF-checker, and one of these sets must be satisfied:

* Single timeseries: The data variable is 1D (the element dimension). It has a
coord var or a 1D aux coord var of time. It has two scalar coord vars of
horizontal position.

* Orthogonal multidimensional representation: The data variable is 2D. One of
its dimensions (the element dimension) has a coordinate variable or a 1D
auxiliary coordinate variable of time. The other one (the instance dimension)
has two coordinate or 1D aux coord variables of horizontal position.

* Incomplete multidimensional rep: The data variable is 2D. It has a 2D aux
coord var of time with the same dimensions as itself. One of the dimensions
has two coordinate or 1D aux coord variables of horizontal position.

* Contiguous ragged array rep: The data variable is 1D. It has a 1D aux coord
var of time with this dimension. There is a variable in the file (the count
variable) with a sample_dimension att that names the dimension of the data
variable. The data variable has two 1D aux coord vars of horizontal position,
whose dimension is the dimension of the count variable.

* Indexed ragged array rep: The data variable is 1D. It has a 1D aux coord var
of time with this dimension. There is a variable in the file (the index
variable) with the same dimension as the data variable and which has an
instance_dimension att. The data variable has two 1D aux coord vars of
horizontal position whose dimension is named by the instance_dimension att
of the index variable.
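
As a rough illustration of the five alternatives above, here is a minimal
Python sketch. The Var class and the classify_timeseries function are
hypothetical names invented for this example, not part of the CF-checker, and
the sketch assumes the time and horizontal coordinate variables have already
been found. It only compares dimension names and looks for the
sample_dimension and instance_dimension attributes.

```python
from dataclasses import dataclass, field


@dataclass
class Var:
    """Hypothetical stand-in for a netCDF variable: dimension names and attributes."""
    dims: tuple
    attrs: dict = field(default_factory=dict)


def classify_timeseries(data, time, horiz, others=()):
    """Return the name of the matching representation, or None if none matches.

    data   -- the data variable
    time   -- its time coordinate or auxiliary coordinate variable
    horiz  -- its two horizontal coordinate variables
    others -- remaining variables in the file (searched for count/index variables)
    """
    hdims = {h.dims for h in horiz}
    if len(horiz) != 2 or len(hdims) != 1:
        return None  # need exactly two horizontal coordinates sharing dimensions
    (hdim,) = hdims

    if len(data.dims) == 1:
        (d,) = data.dims
        if time.dims != (d,):
            return None
        if hdim == ():
            return "single timeseries"  # scalar horizontal coordinates
        if len(hdim) != 1:
            return None
        for v in others:
            # Count variable: names the data dimension, lives on the instance dimension.
            if v.attrs.get("sample_dimension") == d and v.dims == hdim:
                return "contiguous ragged array"
            # Index variable: lives on the data dimension, names the instance dimension.
            if v.attrs.get("instance_dimension") == hdim[0] and v.dims == (d,):
                return "indexed ragged array"
        return None

    if len(data.dims) == 2 and len(hdim) == 1 and hdim[0] in data.dims:
        if time.dims == data.dims:
            return "incomplete multidimensional"
        if len(time.dims) == 1 and time.dims[0] in data.dims and time.dims != hdim:
            return "orthogonal multidimensional"
    return None
```

Whichever case matches also tells you which dimension is the instance
dimension (the one carrying the horizontal coordinates) and which is the
element dimension.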

This is a bottom-up approach, but that's what the checker has to do, isn't it?
When you find that one of these cases matches what you have, it allows you
formally to identify the instance and element dimensions, and the count and
index variables if relevant. Do you see what I mean? It would be necessary to
work through the other cases in a similar way, but it would take a lot of
space to write them all down in the conformance rules in the way I have done
above. Perhaps there would be a way to summarise the principles on which the
checker would work from the table.

We did not say exactly what constitutes a horizontal coordinate. I propose
that if two horizontal coord or aux coord vars are required, they should be
longitude and latitude, or grid_longitude and grid_latitude, or
projection_x_coordinate and projection_y_coordinate, or *any* pair of
coordinates if one of them has axis='X' and the other axis='Y'. It is allowed
to provide more than one of these pairs, but it is not allowed (for instance)
to supply only longitude and grid_latitude, which don't form a pair.

(It is always OK to supply coordinates in addition to those which are
mandatory. These can just be ignored by the checker.)
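
The pairing rule proposed above could be sketched as follows. This is only an
illustration: has_horizontal_pair is a hypothetical helper, and each
coordinate variable is represented simply by a dict of its attributes.

```python
# Pairs of standard_names that are accepted together as horizontal coordinates.
NAMED_PAIRS = [
    ("longitude", "latitude"),
    ("grid_longitude", "grid_latitude"),
    ("projection_x_coordinate", "projection_y_coordinate"),
]


def has_horizontal_pair(coords):
    """coords: list of attribute dicts, one per coordinate or aux coord variable.

    Returns True if at least one acceptable pair is present. Any additional
    coordinates are ignored.
    """
    names = {c.get("standard_name") for c in coords}
    if any(x in names and y in names for x, y in NAMED_PAIRS):
        return True
    # Otherwise accept any two variables marked with axis='X' and axis='Y'.
    axes = {c.get("axis") for c in coords}
    return "X" in axes and "Y" in axes
```

For example, longitude with grid_latitude is rejected, while longitude,
latitude and any number of extra coordinates is accepted.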

Once these identifications have been made, other checks can be applied:

H.2 It is recommended that there be a variable with cf_role of
timeseries_id. If there is such a variable, it must have the instance
dimension. All the values of this variable must be different. (These are for
timeseries. Corresponding but different rules would apply for profiles and
trajectories. Appendix H suggests others: for example, H.5 recommends
station variables with standard_name attributes "platform_name",
"surface_altitude" and "platform_id" when applicable.)
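
A minimal sketch of the timeseries_id checks, assuming the instance dimension
has already been identified. The function name and the representation of a
variable as a (dims, attrs, values) tuple are assumptions made for this
example only.

```python
def check_timeseries_id(variables, instance_dim):
    """variables: dict mapping name -> (dims, attrs, values).

    Returns a list of messages: a warning if no timeseries_id variable
    exists, errors if one exists but breaks the requirements.
    """
    ids = {n: v for n, v in variables.items()
           if v[1].get("cf_role") == "timeseries_id"}
    if not ids:
        return ["WARN: no variable with cf_role='timeseries_id' (recommended)"]
    messages = []
    for name, (dims, attrs, values) in ids.items():
        if instance_dim not in dims:
            messages.append(f"ERROR: {name} does not have the instance dimension")
        if len(set(values)) != len(values):
            messages.append(f"ERROR: {name} contains duplicate values")
    return messages
```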

9.3 A count variable or an index variable must be integer.

9.3 Negative values (except missing data) are not allowed in a count
variable. The sum of the non-missing values must not exceed the dimension of
the data variable.
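
The count variable rules lend themselves to a direct check. In this sketch
(hypothetical function name; missing data is represented by None, which is an
assumption of the example rather than how a netCDF library would present it):

```python
def check_count_variable(values, sample_dim_size):
    """values: list of counts, with None marking missing data.

    sample_dim_size is the size of the data variable's dimension.
    Returns a list of error strings (empty if all checks pass).
    """
    present = [v for v in values if v is not None]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if any(isinstance(v, bool) or not isinstance(v, int) for v in present):
        return ["count variable must be integer"]
    errors = []
    if any(v < 0 for v in present):
        errors.append("negative value in count variable")
    if sum(present) > sample_dim_size:
        errors.append("sum of counts exceeds size of sample dimension")
    return errors
```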

9.3 Negative values (except missing data) are not allowed in an index
variable. None of the values may be greater than or equal to the dimension of
the data variable (because they must be valid indices). All of the non-missing
values must be different.

You say that the featureType att is required. This is a tricky one. We don't
know it's required unless we know we have a discrete sampling geometry that
requires it, but we don't know it's a discrete sampling geometry for sure
unless there is a featureType. I suggest you look for the featureType first,
and only apply the checks for sect 9 if it is present, except that you could
give an error if there is no featureType and there is a count or index var in
the file. That must indicate a ragged rep, which requires featureType.
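
The decision suggested above might look like this in outline. The function
name is hypothetical, and the sketch assumes count and index variables are
recognised by their sample_dimension and instance_dimension attributes, as
defined in CF section 9.3.

```python
def section9_action(global_attrs, variable_attrs):
    """Decide what to do about the section 9 checks.

    global_attrs   -- dict of global attributes
    variable_attrs -- dict mapping variable name -> dict of its attributes
    Returns "check", "error" or "skip".
    """
    if "featureType" in global_attrs:
        return "check"  # apply the discrete sampling geometry checks
    ragged = any(
        "sample_dimension" in attrs or "instance_dimension" in attrs
        for attrs in variable_attrs.values()
    )
    # A count or index variable implies a ragged representation,
    # which requires featureType.
    return "error" if ragged else "skip"
```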

9.6 Where any auxiliary coordinate variable contains a missing value, all
other coordinate, auxiliary coordinate and data values corresponding to that
element should also contain missing values.

9.6 Where the instance variable identified by cf_role contains a missing value
indicator, all other instance variables should also contain missing values
corresponding to that element.
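
Both 9.6 statements come down to the same element-wise comparison, so a
single helper could serve for either. A sketch, assuming the variables have
been read into equal-length lists aligned along the same dimension, with None
standing in for missing data (the function name is hypothetical):

```python
def inconsistent_missing(reference, *others):
    """Return the indices where `reference` is missing (None) but at least
    one of the other aligned variables still holds a value."""
    return [
        i for i, ref in enumerate(reference)
        if ref is None and any(o[i] is not None for o in others)
    ]
```

Since these are "should" statements, a non-empty result would be reported as
a warning rather than an error.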

Best wishes

Jonathan
Received on Fri Mar 23 2012 - 12:55:35 GMT
