[CF-metadata] CF-1.6 Conformance Requirements/Recommendations from Jim Biard on 2012-03-23 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Jim Biard <jim.biard>
Date: Fri, 23 Mar 2012 15:59:01 -0400

Hi.

Jonathan's reply contained the section:

9.6 Where any auxiliary coordinate variable contains a missing value, all
other coordinate, auxiliary coordinate and data values corresponding to that
element should also contain missing values.

I thought I understood that missing values were forbidden for true
coordinate variables. Has this changed? This requirement seems wrong to
me anyway. If the values in the data and auxiliary coordinate variables
come from an external data source, it is entirely possible that you could
have a measurement missing from one without it being missing from the
other. Why force a missing value into the data when you might, in fact,
have a valid value?

Grace and peace,

Jim

On Fri, Mar 23, 2012 at 2:55 PM, Jonathan Gregory <j.m.gregory at reading.ac.uk> wrote:

> Dear Ros
>
> Thanks a lot for working on this. I think you have correctly identified
> parts which are stated as requirements and recommendations, but you have to
> put yourself in the shoes of the CF-checker, and consider what you can
> actually *do* to make the checks. The CF-checker, for instance, does not
> know what we mean by element and instance dimensions, in the first
> statement:
>
> "In the multidimensional array representations, data variables have both an
> instance dimension and an element dimension. The dimensions may be given in
> any order. If there is a need for either the instance or an element
> dimension to be the netCDF unlimited dimension (so that more features or
> more elements can be appended), then that dimension must be the outer
> dimension of the data variable i.e. the leading dimension in CDL."
>
> To check this statement, I think we have first to refer to the table in 9.1,
> which implies a lot of checks on dimensions and coordinates. For instance,
> if the featureType att says it's a timeseries, the table says that the data
> is logically 1D (i), and there are mandatory coord or aux coord vars with
> the logical structure x(i) y(i) t(i,o). There are five possibilities for
> storing a collection of timeseries features. These are alternative sets of
> requirements for the CF-checker, and one of these sets must be satisfied:
>
> * Single timeseries: The data variable is 1D (the element dimension). It has
> a coord var or a 1D aux coord var of time. It has two scalar coord vars of
> horizontal position.
>
> * Orthogonal multidimensional representation: The data variable is 2D. One
> of its dimensions (the element dimension) has a coordinate variable or a 1D
> auxiliary coordinate variable of time. The other one (the instance
> dimension) has two coordinate or 1D aux coord variables of horizontal
> position.
>
> * Incomplete multidimensional rep: The data variable is 2D. It has a 2D aux
> coord var of time with the same dimensions as itself. One of the dimensions
> has two coordinate or 1D aux coord variables of horizontal position.
>
> * Contiguous ragged array rep: The data variable is 1D. It has a 1D aux
> coord var of time with this dimension. There is a variable in the file (the
> count variable) with a sample_dimension att that names the dimension of the
> data variable. The data variable has two 1D aux coord vars of horizontal
> position, whose dimension is the dimension of the count variable.
>
> * Indexed ragged array rep: The data variable is 1D. It has a 1D aux coord
> var of time with this dimension. There is a variable in the file (the index
> variable) with the same dimension as the data variable and which has an
> instance_dimension att. The data variable has two 1D aux coord vars of
> horizontal position whose dimension is named by the instance_dimension att
> of the index variable.
>
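
The five alternatives above could be matched roughly as follows. This is only a
sketch, not the CF-checker's code: the function name, its arguments and the
returned labels are hypothetical, dimensions are passed as plain tuples of
names, and the horizontal position checks are left out for brevity.

```python
def classify_timeseries(data_dims, time_dims, has_count_var=False,
                        has_index_var=False):
    """Guess which timeseries representation a data variable uses.

    data_dims, time_dims: tuples of dimension names for the data variable
    and for its time coordinate (or auxiliary coordinate) variable.
    has_count_var / has_index_var: whether the file holds a count variable
    (sample_dimension att) or an index variable (instance_dimension att)
    that refers to this data variable. All names here are hypothetical.
    Returns a label, or None if no alternative matches.
    """
    if len(data_dims) == 1:
        # All three 1D cases need time on the data variable's dimension.
        if tuple(time_dims) != tuple(data_dims):
            return None
        if has_count_var:
            return "contiguous ragged array"
        if has_index_var:
            return "indexed ragged array"
        return "single timeseries"
    if len(data_dims) == 2:
        # 1D time along one of the two dimensions: that is the element dimension.
        if len(time_dims) == 1 and time_dims[0] in data_dims:
            return "orthogonal multidimensional"
        # 2D time sharing both dimensions with the data variable.
        if sorted(time_dims) == sorted(data_dims):
            return "incomplete multidimensional"
    return None
```

Once a case matches, the element dimension is the one carrying time, and the
other dimension (if any) is the instance dimension.
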
> This is a bottom-up approach, but that's what the checker has to do, isn't
> it? When you find that one of these cases matches what you have, it allows
> you formally to identify the instance and element dimensions, and the count
> and index variables if relevant. Do you see what I mean? It would be
> necessary to work through the other cases in a similar way, but it would
> take a lot of space to write them all down in the conformance rules in the
> way I have done above. Perhaps there would be a way to summarise the
> principles on which the checker would work from the table.
>
> We did not say exactly what constitutes a horizontal coordinate. I propose
> that if two horizontal coord or aux coord vars are required, they should be
> longitude and latitude, or grid_longitude and grid_latitude, or
> projection_x_coordinate and projection_y_coordinate, or *any* pair of
> coordinates if one of them has axis='X' and the other axis='Y'. It is
> allowed to provide more than one of these pairs, but it is not allowed (for
> instance) to supply only longitude and grid_latitude, which don't form a
> pair.
>
> (It is always OK to supply coordinates in addition to those which are
> mandatory. These can just be ignored by the checker.)
>
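
A minimal sketch of the proposed pairing rule, assuming each coordinate is
described by a plain dict with optional "standard_name" and "axis" keys (that
representation and the function name are hypothetical, not part of any
checker):

```python
# Pairs of standard names that count as horizontal position in the proposal.
STANDARD_NAME_PAIRS = [
    ("longitude", "latitude"),
    ("grid_longitude", "grid_latitude"),
    ("projection_x_coordinate", "projection_y_coordinate"),
]


def has_horizontal_pair(coords):
    """Return True if coords contains at least one complete horizontal pair.

    coords: list of dicts with optional 'standard_name' and 'axis' keys.
    Extra coordinates are simply ignored.
    """
    names = {c.get("standard_name") for c in coords}
    axes = {c.get("axis") for c in coords}
    # Any one of the named pairs is enough.
    if any(x in names and y in names for x, y in STANDARD_NAME_PAIRS):
        return True
    # Otherwise accept any two coordinates marked axis='X' and axis='Y'.
    return "X" in axes and "Y" in axes
```

With this rule, longitude together with grid_latitude is rejected unless the
two variables also carry axis='X' and axis='Y'.
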
> Once these identifications have been made, other checks can be applied:
>
> H.2 It is recommended that there be a variable with cf_role of
> timeseries_id. If there is such a variable, it must have the instance
> dimension. All the values of this variable must be different. (These are
> for timeseries. Corresponding but different rules would apply for profiles
> and trajectories. Appendix H suggests others, such as the recommendation in
> H.5 that there should be station variables with standard_name attributes
> "platform_name", "surface_altitude" and "platform_id" when applicable.)
>
> 9.3 A count variable or an index variable must be integer.
>
> 9.3 Negative values (except missing data) are not allowed in a count
> variable. The sum of the non-missing values must not exceed the dimension
> of the data variable.
>
> 9.3 Negative values (except missing data) are not allowed in an index
> variable. None of the values may be greater than or equal to the dimension
> of the data variable (because they must be valid indices). All of the
> non-missing values must be different.
>
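
The three 9.3 rules, transcribed as worded above, might look like this in a
checker. It is a sketch under assumptions: values arrive as a plain list with
None standing for missing data, the function names and error strings are
hypothetical, and the rules themselves have not been checked here against the
CF document. The uniqueness rule for index variables is kept behind a flag in
case the final conformance text words it differently.

```python
def check_count_variable(counts, data_dim_size):
    """Return a list of error strings for a count variable (None = missing)."""
    present = [c for c in counts if c is not None]
    if not all(isinstance(c, int) for c in present):
        return ["count variable must be integer"]
    errors = []
    if any(c < 0 for c in present):
        errors.append("negative count")
    if sum(present) > data_dim_size:
        errors.append("sum of counts exceeds dimension size")
    return errors


def check_index_variable(indices, data_dim_size, require_unique=True):
    """Return a list of error strings for an index variable (None = missing)."""
    present = [i for i in indices if i is not None]
    if not all(isinstance(i, int) for i in present):
        return ["index variable must be integer"]
    errors = []
    if any(i < 0 for i in present):
        errors.append("negative index")
    if any(i >= data_dim_size for i in present):
        errors.append("index out of range")
    # Uniqueness as stated in the message; assumption: may be relaxed later.
    if require_unique and len(set(present)) != len(present):
        errors.append("duplicate index")
    return errors
```
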
> You say that the featureType att is required. This is a tricky one. We
> don't know it's required unless we know we have a discrete sampling geometry
> that requires it, but we don't know it's a discrete sampling geometry for
> sure unless there is a featureType. I suggest you look for the featureType
> first, and only apply the checks for sect 9 if it is present, except that
> you could give an error if there is no featureType and there is a count or
> index var in the file. That must indicate a ragged rep, and they require
> featureType.
>
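
The suggested order of tests can be sketched as a small decision function. The
function name, its return labels and the dict-based inputs are hypothetical;
count and index variables are recognised here by their sample_dimension and
instance_dimension attributes.

```python
def section9_action(global_attrs, variable_attrs):
    """Decide what to do about the section 9 checks for one file.

    global_attrs: dict of global attributes.
    variable_attrs: dict mapping variable name -> dict of its attributes.
    Returns "check" (apply the sect 9 checks), "error" (ragged representation
    without featureType) or "skip" (nothing suggests a discrete sampling
    geometry).
    """
    if "featureType" in global_attrs:
        return "check"
    # A count or index variable implies a ragged representation.
    has_ragged_var = any(
        "sample_dimension" in attrs or "instance_dimension" in attrs
        for attrs in variable_attrs.values()
    )
    return "error" if has_ragged_var else "skip"
```
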
> 9.6 Where any auxiliary coordinate variable contains a missing value, all
> other coordinate, auxiliary coordinate and data values corresponding to that
> element should also contain missing values.
>
> 9.6 Where the instance variable identified by cf_role contains a missing
> value indicator, all other instance variables should also contain missing
> values corresponding to that element.
>
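
If the first 9.6 statement were checked as written, the test might look
something like this. It is a sketch only: the function name is hypothetical,
values are plain lists with None standing for a missing value, and since the
statement says "should" the result would presumably be reported as a warning
rather than an error.

```python
def missing_value_mismatches(aux_coord, others):
    """List the elements where aux_coord is missing but another variable is not.

    aux_coord: list of values, None = missing.
    others: dict mapping variable name -> list of the same length.
    Returns a list of (element_index, variable_name) pairs.
    """
    mismatches = []
    for i, value in enumerate(aux_coord):
        if value is not None:
            continue  # only elements with a missing aux coord value matter
        for name, values in others.items():
            if values[i] is not None:
                mismatches.append((i, name))
    return mismatches
```

For example, with aux_coord = [1.0, None, 3.0] and a data variable holding
valid values throughout, element 1 of the data variable is reported.
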
> Best wishes
>
> Jonathan

-- 
Jim Biard
Research Scholar
Cooperative Institute for Climate and Satellites
Remote Sensing and Applications Division
National Climatic Data Center
151 Patton Ave, Asheville, NC 28801-5001
jim.biard at noaa.gov
828-271-4900

Received on Fri Mar 23 2012 - 13:59:01 GMT
