[CF-metadata] Encoding Errors on variables in CF from Jonathan Gregory on 2003-04-06 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Jonathan Gregory <jonathan.gregory>
Date: Sun, 6 Apr 2003 09:54:31 +0100

Dear All

I'd like to apologise for absence in the coming week, when I will be at the
EGS in Nice.

I don't object to pointers per se. I agree they are a convenient way of finding
the data you need. But I would be unhappy if they were the sole way of
identifying an error variable because, as Bryan said, "the variables and their
error variables will inevitably get separated in processing of these netcdf
files." When you are manipulating variables in memory, they are especially
likely to be independent objects, not identifiable by links to other objects.
So I think each variable has to be fully self-describing. It is not sufficient
for it to be labelled just as a standard error; it has to say of what quantity
it is a standard error, I think.

If we include data quality variables, we are dealing with more than just
"errors". I would suggest calling the pointer attribute "ancillary_data"
rather than "error_variables".

Ag and Bryan both point out that there may be a rather large number of
different kinds of such data. These kinds can be described by the comment, I
agree. However, if we want to exchange data efficiently, we need standardised
identifications for them, so programs can easily tell how to interpret them.
That is why I proposed the standardised intent attribute. Ag is concerned
that requiring the standard_name and intent to be processed "together" could
cause problems because it then uses the same standard_name for the variable
and the associated error variable; therefore he favours the more simplistic
approach of adding a prefix to the standard name.

Perhaps we could address these concerns by using a very general standard name
prefix, "ancillary_data_for" (for example) - the same prefix for any of these
kinds of ancillary data (standard error, detection limit, data quality, etc.) -
*and* using the intent attribute to specify in a standardised way what kind of
data it is? This means we avoid multiplying the size of the standard_name table
by a large factor. In the worst case, we double its size.
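
To make the size argument concrete, here is a rough sketch in Python. The three
base names stand in for the whole standard_name table, and the per-kind prefix
spellings (such as "standard_error_of_") are assumptions made up for the
comparison, not proposed names.

```python
# Illustrative only: a tiny stand-in for the standard_name table.
base_names = ["air_temperature", "sea_surface_temperature", "precipitation_flux"]

# Kinds of ancillary data mentioned in this thread; the spellings are assumptions.
kinds = ["standard_error", "detection_limit", "data_quality"]

# Option 1: one prefix per kind, so every base name gains one entry per kind.
per_kind_table = base_names + [f"{k}_of_{n}" for k in kinds for n in base_names]

# Option 2: a single general prefix; the kind is carried by the intent
# attribute instead, so every base name gains at most one entry.
general_table = base_names + [f"ancillary_data_for_{n}" for n in base_names]

print(len(base_names), len(per_kind_table), len(general_table))
```

With K kinds, the first option grows the table by a factor of 1 + K, while the
general prefix at most doubles it, however many kinds are added later.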

Brian points out that without links you can't associate the right error var
with its parent var using just the standard name and intent. This is true, but
also you can't distinguish the parent vars anyway by standardised metadata! In
Brian's example, they are distinguished only by the long_name. This might in
itself be a problem for data exchange. If it was solved by standardising some
distinguishing metadata, the error vars would also be distinguishable in the
same way.
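
The ambiguity can be illustrated with a small sketch, assuming variables are
modelled as plain Python dictionaries of attributes. The variable names, the
long_name values and the intent value are all hypothetical, and
candidate_parents is a helper invented for this example.

```python
# Variables modelled as plain dicts of attributes (an assumption for illustration).
variables = {
    "t_day": {"standard_name": "air_temperature", "long_name": "daytime temperature"},
    "t_night": {"standard_name": "air_temperature", "long_name": "night-time temperature"},
    "t_day_err": {
        "standard_name": "ancillary_data_for_air_temperature",
        "intent": "standard_error",  # hypothetical standardised value
        "long_name": "daytime temperature",
    },
}

PREFIX = "ancillary_data_for_"


def candidate_parents(anc_name, variables):
    """Return the names of variables whose standard_name matches the ancillary one."""
    target = variables[anc_name]["standard_name"]
    if not target.startswith(PREFIX):
        return []
    parent_standard_name = target[len(PREFIX):]
    return sorted(
        name
        for name, attrs in variables.items()
        if attrs.get("standard_name") == parent_standard_name
    )


print(candidate_parents("t_day_err", variables))
```

Both t_day and t_night come back as candidates, because the standardised
metadata of the two parents is identical. Only the long_name separates them,
and it would separate the error variables in just the same way.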

About separating data into files, I will start a separate thread.

Summary:

* Point from a variable to its associated ancillary data variables (error
variables, data quality variables and others not yet thought of) through a
blank-separated list of variable names in an ancillary_data attribute. Be
aware this link might get broken.

* Give the ancillary variables a standard_name prefix of "ancillary_data_for_".
They should also have all the metadata of their parent variables, so they are
independently fully self-described.

* Give them also an intent attribute with a standardised value to define their
kind of ancillary data, and other attributes such as error_multiplier if
required to be precise.

* Use attributes flag_values and flag_meanings to provide interpretations of
ancillary variables containing flag data.
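
The points above might be sketched as follows, with a file modelled as a Python
dictionary of variables, each a dictionary of attributes. The variable names,
the intent values and the blank-separated form of flag_meanings are assumptions
for illustration only, and both helper functions are invented for this example.

```python
# A file modelled as a dict of variables, each a dict of attributes.
# Variable names and attribute values are illustrative assumptions.
dataset = {
    "temp": {
        "standard_name": "air_temperature",
        "units": "K",
        # Blank-separated list of pointers; temp_lost no longer exists.
        "ancillary_data": "temp_err temp_qc temp_lost",
    },
    "temp_err": {
        "standard_name": "ancillary_data_for_air_temperature",
        "units": "K",
        "intent": "standard_error",  # hypothetical standardised value
    },
    "temp_qc": {
        "standard_name": "ancillary_data_for_air_temperature",
        "intent": "data_quality",  # hypothetical standardised value
        "flag_values": [0, 1, 2],
        "flag_meanings": "good suspect bad",  # assumed blank-separated
    },
}


def ancillary_for(name, dataset):
    """Follow the ancillary_data pointers, reporting links that no longer resolve."""
    linked = dataset[name].get("ancillary_data", "").split()
    found = {v: dataset[v] for v in linked if v in dataset}
    broken = [v for v in linked if v not in dataset]
    return found, broken


def flag_meaning(attrs, value):
    """Look up the meaning of one flag value, or None if it is not listed."""
    meanings = attrs["flag_meanings"].split()
    return dict(zip(attrs["flag_values"], meanings)).get(value)


found, broken = ancillary_for("temp", dataset)
print(sorted(found), broken)
print(flag_meaning(dataset["temp_qc"], 1))
```

The pointer to temp_lost is reported as broken rather than raising an error,
and the two surviving ancillary variables still say for themselves what they
describe through their standard_name and intent.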

Best wishes

Jonathan
Received on Sun Apr 06 2003 - 01:54:31 BST
