
[CF-metadata] standard names for variables in raw engineering units

From: Nan Galbraith <ngalbraith>
Date: Tue, 21 Apr 2009 14:40:17 -0400

Thanks, Jonathan and Roy. It seems we have several kinds of
'problem variables' to deal with:

- Components of geophysical variables (like voltage and temperatures
from a radiometer, or the sensor temperature from an oxygen probe;
these can be useful in troubleshooting or recalculating the geophysical
variables),

- QC kinds of parameters (like percent good, or error velocity, from an
ADCP),

- Raw instrument output that could be converted directly into
geophysical data. I'm not sure if something like rain level is an
example of this, or if the kinds of data that Roy mentioned are
different in some way.

> I think that if it's raw data which can be described more precisely
> as being the output of a particular kind of sensor, and it is in
> physical units, we should give it its own standard name; in such
> cases, the raw data would have more of a standard meaning, and
> standard algorithms could be applied to derive geophysical
> quantities from it, I imagine.

As it turns out, I think this is the case for my rain gauge example. A
more careful search of the standard names using "any of" precipitation,
rain, and evaporation turned up one that seems to work for the
instrument's raw output (the precipitation level in the gauge):

> lwe_thickness_of_precipitation_amount: The construction
> lwe_thickness_of_X_amount or _content means the vertical
> extent of a layer of liquid water having the same mass per unit area.
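The quoted definition amounts to dividing a mass per unit area by the density of liquid water. A minimal Python sketch, assuming a nominal density of 1000 kg m-3 (that value is an assumption here, not part of the CF definition):

```python
# Assumed nominal density of liquid water, kg m-3 (not taken from CF).
RHO_WATER = 1000.0


def lwe_thickness_m(mass_per_area_kg_m2, rho=RHO_WATER):
    """Thickness (m) of a liquid-water layer with the given mass per unit area."""
    return mass_per_area_kg_m2 / rho


# 5 kg m-2 of precipitation corresponds to a layer about 5 mm thick.
print(lwe_thickness_m(5.0) * 1000.0, "mm")
```

For a gauge that simply collects liquid rain, the measured level is already such a thickness, which is why the name seems to fit the raw output.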

I suspect that this name was not intended for raw rain gauge data,
but that it would be all right to use it anyway.
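The "any of" search described above can be approximated offline with a substring filter over a list of standard names. A minimal sketch; the list below is a small hand-picked subset for illustration, not the full standard name table, and the function name is hypothetical:

```python
def any_of(names, keywords):
    """Return the names that contain at least one of the keywords."""
    return [n for n in names if any(k in n for k in keywords)]


# Small illustrative subset of CF standard names.
SAMPLE_NAMES = [
    "air_temperature",
    "lwe_thickness_of_precipitation_amount",
    "precipitation_flux",
    "rainfall_rate",
    "sea_water_temperature",
    "water_evaporation_flux",
]

print(any_of(SAMPLE_NAMES, ["precipitation", "rain", "evaporation"]))
```

Plain substring matching is deliberately loose ("rain" would also match a name containing "drainage"), so the hits still need reading.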

> A possible way to deal with raw data would be to regard it as a kind
> of ancillary data and use a standard name modifier to indicate it (CF
> 3.3 and appendix C) e.g. raw_data. In your case the standard_name
> attribute would then contain "rainfall_rate raw_data". In Appendix C
> we could specify that the units are 1 i.e. dimensionless if there is
> a raw_data modifier.

I think this, or something like it, would have been a good way to
handle oxygen sensor temperatures, instead of assigning them the
standard name temperature_of_sensor_for_oxygen_in_sea_water.

As Roy says, this would work only for variables that exist in both raw
and processed form. And, for some instruments, there are multiple
components that should be carried along. We could use something
like this for longwave radiation sensor components, but we would need
multiple modifiers, something like:

surface_downwelling_longwave_flux_in_air
... raw_data_thermopile_voltage
... raw_data_dome_temperature
... raw_data_case_temperature

Having sensor outputs from a radiometer "attached to" longwave
radiation would be useful, especially if it can be done in a way that
preserves the units of the temperatures and thermopile voltage.
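One way to keep each component's own units while still tying it to the flux is the ancillary_variables attribute (CF section 3.4), a blank-separated list of variable names. A minimal sketch using plain dictionaries in place of a netCDF file; the variable names (LW, LW_TPILE, LW_DOME, LW_CASE) and the use of long_name instead of standard names are assumptions for illustration:

```python
# Hypothetical variables; only the ancillary_variables mechanism is from CF.
variables = {
    "LW": {
        "standard_name": "surface_downwelling_longwave_flux_in_air",
        "units": "W m-2",
        "ancillary_variables": "LW_TPILE LW_DOME LW_CASE",
    },
    "LW_TPILE": {"long_name": "thermopile voltage", "units": "V"},
    "LW_DOME": {"long_name": "dome temperature", "units": "K"},
    "LW_CASE": {"long_name": "case temperature", "units": "K"},
}


def ancillaries(variables, name):
    """Map each variable listed in name's ancillary_variables to its units."""
    refs = variables[name].get("ancillary_variables", "").split()
    return {ref: variables[ref]["units"] for ref in refs}


print(ancillaries(variables, "LW"))
# → {'LW_TPILE': 'V', 'LW_DOME': 'K', 'LW_CASE': 'K'}
```

The link itself carries no meaning about what each component is; that is the gap a standard name (or a modifier) would have to fill.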

Maybe all these variables will need standard names after all. I'd like
to know what anyone else thinks.

Cheers - Nan


> Thanks Jonathan,
>
> Another gap in my CF knowledge exposed. My reaction to your posting
> was based on the perspective of somebody who is going to have to
> semantically link a file of data in CF with data in another format.
> RDF is my weapon of choice, which requires the ability to reference
> concepts in both datasets as URLs. Whilst %20 is possible for a
> space, it's best avoided. So, at some stage I think we need to
> revisit the syntax of modified Standard Names (hyphen as a
> separator??).
>
> However, this side track isn't helping closure of Nan's issue, which
> I fully understand from data files that land on our doorstep. I
> think your view is based on the assumption that the raw data have
> corresponding processed data. If only! Frequently we get data from
> a complex package of sensors from scientists who are only interested
> in a subset. The rest are exactly as they came off the data logger.
> However, even in this state they have value for certain applications
> and so need labelling.
>
> As far as the raw_data qualifier goes, the $64,000 question
> (to Nan, I guess) is whether the labelling support required for raw
> data needs to support quantitative as well as qualitative use cases.
> If the answer is no, then a raw_data qualifier specifying values to
> be dimensionless is acceptable. Otherwise, we'll need to set up
> distinct Standard Names for each raw channel variant. The more I
> think about it, the more I see this as a safer option.
>
> Cheers, Roy.
>
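Roy's point about the space can be seen with Python's standard urllib.parse.quote, which percent-encodes the blank in a modified standard name. The hyphen variant is only his suggestion in this thread, not CF syntax:

```python
from urllib.parse import quote

# A modified standard name as written in the standard_name attribute.
modified = "rainfall_rate raw_data"

print(quote(modified))                    # → rainfall_rate%20raw_data
print(quote(modified.replace(" ", "-")))  # → rainfall_rate-raw_data
```

Underscores and hyphens are both unreserved URL characters, so a hyphen separator would survive unencoded.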

>
>>>> Jonathan Gregory <j.m.gregory at reading.ac.uk> 04/13/09 8:28 AM >>>
>>>>
> Dear Roy
>
> The space is deliberate. The standard_name attribute consists of a standard
> name followed optionally by a modifier. We introduced this syntax to allow us
> to define ancillary data of various sorts e.g. a quality flag or a standard
> error, without requiring a new set of standard names. See CF standard section
> 3.3. Perhaps we should have put it in a different attribute; this decision was
> made years ago and I can't remember the discussion.
>
> I made the raw_data suggestion thinking that, in the case where you have a
> geophysical quantity, and you also want to save the raw data (perhaps that is
> the case Nan describes), the raw data could be regarded as ancillary
> information (a bit like a standard error). This mechanism, with a standard
> name modifier and perhaps using ancillary_variables to point to it (CF 3.4),
> might be suitable then. The modifier implies the units of the standard name
> have been transformed in a certain way. We could therefore specify them to be
> dimensionless for raw data, as that is a special case of transformation i.e.
> replace units u with 1. They could be in the same units as the geophysical
> quantity; that would need a different standard name modifier, which might be
> appropriate for uncalibrated data. However they could not be in different
> units, not related to those of the geophysical quantity.
>
> This suggestion doesn't give enough information for the data to be processed.
> It's just a way of labelling raw data as such.
>
> If you want to identify the raw data as being a specific output from a
> particular kind of instrument, I think it's much better to give a standard name
> that indicates precisely what it is i.e. more specific than "rain gauge
> raw data". Then the user could work out how to process it.
>
> Cheers
>
> Jonathan
>
>
>
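Jonathan's description (a standard name, optionally followed by blanks and one modifier, with the modifier determining how the units are transformed) can be sketched as a small checker. This is an illustration under stated assumptions: standard_error and number_of_observations follow CF Appendix C, raw_data is the proposal discussed in this thread, "uncalibrated" is a hypothetical name for the same-units modifier Jonathan mentions, and the function names are made up:

```python
# Units implied by a modifier, as a function of the geophysical units u.
IMPLIED_UNITS = {
    "standard_error": lambda u: u,            # CF Appendix C
    "number_of_observations": lambda u: "1",  # CF Appendix C
    "raw_data": lambda u: "1",                # proposed in this thread
    "uncalibrated": lambda u: u,              # hypothetical modifier name
}


def split_standard_name(attr):
    """Split a standard_name attribute into (standard name, modifier or None)."""
    parts = attr.split()
    if len(parts) == 1:
        return parts[0], None
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"expected a name and at most one modifier: {attr!r}")


def expected_units(attr, geophysical_units):
    """Units a variable should carry, given its standard_name attribute.

    geophysical_units are the units of the unmodified standard name. Units
    are handled as plain strings here; a real check would use a units
    library such as UDUNITS. An unknown modifier raises KeyError.
    """
    _, modifier = split_standard_name(attr)
    if modifier is None:
        return geophysical_units
    return IMPLIED_UNITS[modifier](geophysical_units)


print(split_standard_name("rainfall_rate raw_data"))
# → ('rainfall_rate', 'raw_data')
print(expected_units("rainfall_rate raw_data", "m s-1"))
# → 1
print(expected_units("rainfall_rate standard_error", "m s-1"))
# → m s-1
```

Because every modifier's units are a function of the geophysical units, there is no way to express raw channels in unrelated units (volts, say), which is the limitation Jonathan points out.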


-- 
*******************************************************
* Nan Galbraith                        (508) 289-2444 *
* Upper Ocean Processes Group            Mail Stop 29 *
* Woods Hole Oceanographic Institution                *
* Woods Hole, MA 02543                                *
*******************************************************
Received on Tue Apr 21 2009 - 12:40:17 BST

