
[CF-metadata] standard names for variables in raw engineering units

From: John Graybeal <graybeal>
Date: Tue, 21 Apr 2009 17:02:01 -0700

I agree we have several kinds of things in play, and I came up with
some specific definitions, and an updated recommendation or two. I
apologize for the long email.

In the end, we may have to drop back to the big picture of CF, to
decide on the best way forward. (Which may even be, "No.") So a
general request is: After a suitable opportunity for general comment
on the latest emails, can the moderators weigh in with suggestions to
resolve each issue one way or the other, or to create a TRAC proposal
discussion? I'm OK with discussing any of this further on the list,
but my sense is we're not getting wide enough discussion to actually
move any of these topics toward definite resolution.


Here are the data types I see that aren't covered now:

  (1) fundamental geophysical properties that are correctly named by
an existing standard name (or could be), but have arbitrary non-
standard units (which may or may not be convertible with just a scale
and offset);

  (2) a fundamental measurement/value for a property, but one that is
made for an _instrument_ property rather than a geophysical property
('temperature_of_sensor_head'; also could be an instrument-impacted
property, like 'processed_sea_water_temperature');

  (3) a calculated value that is useful for diagnostics or other
purposes, but is neither (1) nor (2).

For (1), the motivation is finding additional science data that is
directly applicable. For (2) and (3), the motivation is
characterizing metadata for a measurement or instrument, ideally in a
way that recognizes its association with both the affected data and
the related instrument, and thereby supporting its discovery by those
who need to discover it. In this latter case (2+3), I suggest the
people who need to discover these parameters are analysts who are
already working with the data set, not someone trolling around the net
looking for sensor head or QC data.

So it may be appropriate to use two different mechanisms. Case (1) is
handled quite nicely by appending something to a standard name (by
rule, not by creating a duplicate set of standard names with
_variant_unit at the end). A bonus practice would be to agree on
attributes like 'unit_scale' and 'unit_offset' to be specified, but
all we really need at stage 1 is the discovery capability; automation
can follow.
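
As a rough sketch of how the 'unit_scale' and 'unit_offset' attributes
might later support automation: the example below assumes (nothing here
is agreed) that they describe a linear mapping, canonical = raw *
unit_scale + unit_offset. The function name, the attribute semantics,
and the sample values are all hypothetical.

```python
def to_canonical(raw_values, attrs):
    """Convert raw values to canonical units.

    Assumption: canonical = raw * unit_scale + unit_offset, with a
    missing attribute meaning "no scaling" or "no offset".
    """
    scale = float(attrs.get("unit_scale", 1.0))
    offset = float(attrs.get("unit_offset", 0.0))
    return [value * scale + offset for value in raw_values]


# Hypothetical variable: sea water temperature logged in centi-degrees,
# to be expressed in kelvin.
attrs = {
    "standard_name": "temperature_of_sea_water-variant_unit",
    "unit_scale": 0.01,
    "unit_offset": 273.15,
}
kelvin = to_canonical([0, 1000, 2000], attrs)
```

Cases where the units are not convertible with just a scale and offset
would simply omit both attributes and rely on discovery alone.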

I see 3 workable options for (1):
  A) Standard name modification by modifier rule (e.g., appending
_variant_units, or a standard name modifier), where the new standard
names are not listed individually because they are derived by rule
  B) Creating a separate vocabulary ("Standard Names II") that handles
all these cases, using CF Standard Names as a basis if needed (doesn't
have to be CF that does this, but approval is important)
  C) Dropping the requirement that units must be transformable to
canonical units by generally known conversions

If (A) is the preferred choice, I wish for modifiers that are
physically attached to the name (the existing <space> is not ideal),
along the lines of the following:
   -variant_unit
In any case, some solution for (1) is important for science discovery,
and using standard name with a modifier is valuable and self-
explanatory: temperature_of_sea_water-variant_unit
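
To illustrate why an attached modifier helps discovery, here is a
minimal sketch that recovers the base standard name by rule and
compares how the hyphen form and a space-separated form behave when
URL-encoded. The '-variant_unit' suffix is only the proposal made in
this email, not part of CF, and the helper name is hypothetical.

```python
from urllib.parse import quote

VARIANT_SUFFIX = "-variant_unit"  # proposed here; not an existing CF modifier


def split_variant(standard_name):
    """Return (base_name, is_variant) for a possibly modified name."""
    if standard_name.endswith(VARIANT_SUFFIX):
        return standard_name[: -len(VARIANT_SUFFIX)], True
    return standard_name, False


base, is_variant = split_variant("temperature_of_sea_water-variant_unit")

# A hyphen-attached modifier passes through URL encoding unchanged,
# while a space-separated modifier has to be escaped as %20.
hyphen_form = quote("temperature_of_sea_water-variant_unit")
space_form = quote("temperature_of_sea_water variant_unit")
```

A search tool could index on the base name and treat the variant flag
as a filter, so no separate list of derived names has to be maintained.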


For cases (2) and (3), I am not sure there is a healthy way for this
vocabulary to handle them. Constructions like
'temperature_of_sensor_for_oxygen_in_sea_water' will likely prove
unhelpful in this vocabulary, for many reasons. First, you will
pollute a very nice set of standard names with a set of terms that
most users will find useless or in the way. Second, use cases will be
rare where people search across multiple data sets for narrow
technical qualifiers like 'temperature_of_sensor_...'. Third, the
construction of these names will be a quite slippery slope of
complexity (e.g., what if the sensor is measuring 3 things? what about
non-physical parameters like QC metrics?). Fourth, and finally, it
seems like the CF standard name team is already close to their limit
in handling current name requests and trac discussions; adding all
these names will require more people, faster processes, or something
more generic (possibly all 3).

However, the need for something to address cases (2) and (3) still
exists. Users may search within a data set or collection to see if
this kind of metadata exists (related to a measurement that is already
in that data set); and users *will* look at the standard name or
similar metadata to see what kind of thing this variable is. As a
reminder, one of the problems we are trying to address is making it
possible to describe most observational parameters with standard names.

In this context, how important is it to include detailed measurement
information as part of the standard name? That could all be included
in ancillary metadata, if everyone agrees on the protocol. Then could
the standard name just be something primitive/generic like
"auxiliary_information"? (Or, "sensor_property",
"calculated_value"....) This would clearly take some working out on
the trac page. One could do something like the following modifiers:
  -sensor_environment
  -component_{component_name}
  -computed_value_{value_name}
But to actually name the temperature of a component, we'd have to
allow 'sensor' as a measured entity: 'temperature_of_sensor' or
'temperature_of_sensor-component_dome'. (As the one existing CF
example does.) Any approach that merges fundamental measurements of
geophysical quantities into the name of sensor-related variables seems
likely to cause real problems.

So my suggestion for (2) and (3) is to consider a few carefully chosen
generic terms that can be extended (by rule) by the users, and to try
to avoid any overlap between those technical values and the
geophysical parameters CF was originally designed to capture. Or
alternatively, allow a second set of terms that are strictly technical
in nature, and which can be used as standard names in CF files, but
whose moderation does not require manual review by CF, as long as the
rules are followed.
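
A minimal sketch of what "moderation by rule" could look like, using
the three modifier shapes listed earlier in this message. The grammar
(lowercase words joined by underscores, one hyphen-attached modifier at
most) and the function name are assumptions for illustration, not an
agreed syntax.

```python
import re

# Assumed grammar: lowercase alphanumeric words joined by underscores.
WORD = r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*"

# Modifier shapes taken from the examples in this email.
MODIFIER = re.compile(
    rf"^(?:sensor_environment|component_{WORD}|computed_value_{WORD})$"
)
BASE = re.compile(rf"^{WORD}$")


def follows_rule(name):
    """Check a name of the form base or base-modifier against the rules."""
    base, sep, modifier = name.partition("-")
    if not BASE.match(base):
        return False
    # No hyphen means no modifier, which is allowed.
    return not sep or bool(MODIFIER.match(modifier))


ok = follows_rule("temperature_of_sensor-component_dome")
rejected = follows_rule("temperature_of_sensor-dome")
```

A check of this kind could run automatically when a file is validated,
which is the sense in which no manual review would be needed.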

John



On Apr 21, 2009, at 11:40 AM, Nan Galbraith wrote:

> Thanks, Jonathan and Roy. It seems we have several kinds of
> 'problem variables' to deal with:
>
> - Components of geophysical variables (like voltage and temperatures
> from a radiometer, or the sensor temperature from an oxygen probe;
> these can be useful in troubleshooting or recalculating the
> geophysical variables),
>
> - QC kinds of parameters (like percent good, or error velocity, from
> an ADCP),
>
> - Raw instrument output that could be converted directly into
> geophysical data. I'm not sure if something like rain level is an
> example of this, or if the kinds of data that Roy mentioned are
> different in some way.
>
>> I think that if it's raw data which can be described more precisely
>> as being the output of a particular kind of sensor, and it is in
>> physical units, we should give it its own standard name; in such
>> cases, the raw data would have more of a standard meaning, and
>> standard algorithms could be applied to derive geophysical
>> quantities from it, I imagine.
>
> As it turns out, I think this is the case for my rain gauge example. A
> more careful search of the standard names using "any of"
> precipitation, rain, and evaporation turned up one that seems to work
> for the instrument's raw output (the precipitation level in the gauge):
>
>> lwe_thickness_of_precipitation_amount: The construction
>> lwe_thickness_of_X_amount or _content means the vertical
>> extent of a layer of liquid water having the same mass per unit area.
>
> I suspect that this name was not intended for raw rain gauge data,
> but that it would be alright to use it anyway.
>
>> A possible way to deal with raw data would be to regard it as a kind
>> of ancillary data and use a standard name modifier to indicate it (CF
>> 3.3 and appendix C) e.g. raw_data. In your case the standard_name
>> attribute would then contain "rainfall_rate raw_data". In Appendix C
>> we could specify that the units are 1 i.e. dimensionless if there is
>> a raw_data modifier.
>
> I think this, or something like it, would have been a good way to
> handle oxygen sensor temps, instead of assigning them the standard
> name temperature_of_sensor_for_oxygen_in_sea_water.
>
> As Roy says, this would work only for variables that are in both raw
> and processed form. And, for some instruments, there are multiple
> components that should be carried along. We could use something
> like this for longwave radiation sensor components, but would need
> multiple modifiers, something like:
>
> surface_downwelling_longwave_flux_in_air
> ... raw_data_thermopile_voltage
> ... raw_data_dome_temperature
> ... raw_data_case_temperature
>
> Having sensor outputs from a radiometer "attached to" longwave
> radiation would be useful, especially if it can be done in a way that
> preserves the units of the temperatures and thermopile voltage.
>
> Maybe all these variables will need standard names after all. I'd
> like to know what anyone else thinks.
>
> Cheers - Nan
>
>
>> Thanks Jonathan,
>> Another gap in my CF knowledge exposed. My reaction to your posting
>> was based on the perspective of somebody who is going to have to
>> semantically link a file of data in CF with data in another format.
>> RDF is my weapon of choice, which requires the ability to reference
>> concepts in both datasets as URLs. Whilst %20 is possible for a
>> space, it's best avoided. So, at some stage I think we need to
>> revisit the syntax of modified Standard Names (hyphen as a
>> separator??).
>> However, this side track isn't helping closure of Nan's issue, which
>> I fully understand from data files that land on our doorstep. I
>> think your view is based on the assumption that the raw data have
>> corresponding processed data. If only! Frequently we get data from
>> a complex package of sensors from scientists who are only interested
>> in a subset. The rest are exactly as they came off the data logger.
>> However, even in this state they have value for certain applications
>> and so need labelling.
>> As far as the raw_data qualifier goes, the $64,000 question
>> (to Nan I guess) is whether the labelling support required for raw
>> data needs to support quantitative as well as qualitative use cases.
>> If the answer is no, then the raw_data qualifier specifying values to
>> be dimensionless is acceptable. Otherwise, we'll need to set up
>> distinct Standard Names for each raw channel variant. The more I
>> think about it, the more I see this as a safer option.
>> Cheers, Roy.
>
>>
>>>>> Jonathan Gregory <j.m.gregory at reading.ac.uk> 04/13/09 8:28 AM >>>
>>>>>
>> Dear Roy
>>
>> The space is deliberate. The standard_name attribute consists of a
>> standard name followed optionally by a modifier. We introduced this
>> syntax to allow us to define ancillary data of various sorts e.g. a
>> quality flag or a standard error, without requiring a new set of
>> standard names. See CF standard section 3.3. Perhaps we should have
>> put it in a different attribute; this decision was made years ago
>> and I can't remember the discussion.
>>
>> I made the raw_data suggestion thinking that, in the case where you
>> have a geophysical quantity, and you also want to save the raw data
>> (perhaps that is the case Nan describes), the raw data could be
>> regarded as ancillary information (a bit like a standard error).
>> This mechanism, with a standard name modifier and perhaps using
>> ancillary_variables to point to it (CF 3.4), might be suitable then.
>> The modifier implies the units of the standard name have been
>> transformed in a certain way. We could therefore specify them to be
>> dimensionless for raw data, as that is a special case of
>> transformation i.e. replace units u with 1. They could be in the
>> same units as the geophysical quantity; that would need a different
>> standard name modifier, which might be appropriate for uncalibrated
>> data. However they could not be in different units, not related to
>> those of the geophysical quantity.
>>
>> This suggestion doesn't give enough information for the data to be
>> processed. It's just a way of labelling raw data as such.
>>
>> If you want to identify the raw data as being a specific output from
>> a particular kind of instrument, I think it's much better to give a
>> standard name that indicates precisely what it is i.e. more specific
>> than "rain gauge raw data". Then the user could work out how to
>> process it.
>>
>> Cheers
>>
>> Jonathan
>>
>>
>>
>
>
> --
> *******************************************************
> * Nan Galbraith (508) 289-2444 *
> * Upper Ocean Processes Group Mail Stop 29 *
> * Woods Hole Oceanographic Institution *
> * Woods Hole, MA 02543 *
> *******************************************************
>
>
>
>
> _______________________________________________
> CF-metadata mailing list
> CF-metadata at cgd.ucar.edu
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


John

--------------
John Graybeal <mailto:graybeal at mbari.org> -- 831-775-1956
Monterey Bay Aquarium Research Institute
Marine Metadata Interoperability Project: http://marinemetadata.org
Received on Tue Apr 21 2009 - 18:02:01 BST
