
[CF-metadata] a different (but perhaps unoriginal) approach to standard name construction

From: Heinke Hoeck <heinke.hoeck>
Date: Tue, 04 Nov 2008 11:26:34 +0100

Dear Karl et al.
>
> 1) Currently it is impossible to identify with a single standard name,
> closely related variables that one might want to store in a single
> array). For example, such quantities as:
Could this overload the standard names? Defining rules for creating
experiment and dataset names (entities) for our CERA database led to
the same discussion: "What part of the whole set of metadata should be
put into the experiment and dataset names, and what part should be
stored in the tables?"
For the standard names, I would say we should think about what
information should be part of the standard name and what should be
stored with the variables, to avoid overloading the standard names. The
standard name description is a help, but for finding the right standard
name it should be only a definition or description of the keywords.
Where common terms are used in special domains (for example ocean
sciences), we should use the common concept.
Your structuring is very welcome.
Identifying closely related variables looks to me like something to be
managed with an ontology.
> The idea, which I'm sure must have been discussed at length already
> (but I've forgotten by now or I've missed it entirely), is to parse
> the quantity identification information into separate elements (or
> "categories" or "components").
Should these components be the elements of the standard names, or
should they be separated out into the variables?
> We already do this to a certain extent by providing some information
> in the cell_methods attribute.
This example confused me a little. The cell methods should not be
part of the standard name. Is this correct?
> These independent bits of information could be automatically assembled
> together to create the "standard name". The current standard names
> would in some cases be identical to the names created from the
> elements, and in other cases we could establish aliases. This would
> make it obvious in many cases how to construct new standard names, and
> in any case would impose a structure on the standard names.
I like this idea very much, because the standard name components should
be ordered using name lists, and this gives us the opportunity to put
the most important component first, and so on. Searching for a name and
creating a new one will be much easier.
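To make the ordering idea concrete, here is a minimal Python sketch. The
component order, the function name build_standard_name and the joining
rule are assumptions for illustration only; none of them is an agreed CF
rule.

```python
# Hypothetical component order, most important first.
# This ordering is an assumption for illustration, not CF policy.
COMPONENT_ORDER = ["quantity", "constituent", "medium", "surface", "process"]


def build_standard_name(components):
    """Join the supplied components with underscores, in COMPONENT_ORDER."""
    unknown = set(components) - set(COMPONENT_ORDER)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    parts = [components[key] for key in COMPONENT_ORDER if key in components]
    return "_".join(parts)


# The input order does not matter; the fixed order decides the result.
name = build_standard_name({"medium": "air", "quantity": "temperature"})
print(name)  # temperature_air
```

With quantity first, the generated name differs from the existing name
air_temperature, so a case like this would need one of the aliases
mentioned in the proposal.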
>
> As a first step, it might be useful to consider the following
> components (many of which appear in the "guidelines" document referred
> to above):
I think that we should discuss every component and make the decision
whether it should be part of the standard name or stored as a variable
attribute.
Is it right that you want the quantity to rank first in the standard name?
>
> 1. quantity: the fundamental quantity (e.g., temperature, pressure,
> geopotential_height, precipitation_rate, concentration)
I agree with this component and your examples.
But I think that precipitation_rate and precipitation_flux are the same
quantity. The difference is carried by the unit. The difference between
measuring a volume and measuring a weight per unit area should be
expressed in the unit or in measurement_method.
Another example is mole_fraction and mass_fraction. The quantity should
simply be fraction, and the units should be mol mol-1 and kg kg-1. The
unit information should not be part of the quantity. For this to work,
the dimensionless unit value has to be reconsidered.
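A small Python sketch of what I mean, under stated assumptions: the bare
quantity "fraction", the attribute dictionaries and the helper
fraction_kind are hypothetical and are not existing CF names or tools.

```python
# Hypothetical variable attributes. "fraction" as a bare quantity is the
# proposal made above, not an existing CF standard name.
mole_var = {"quantity": "fraction", "constituent": "ozone", "units": "mol mol-1"}
mass_var = {"quantity": "fraction", "constituent": "ozone", "units": "kg kg-1"}


def fraction_kind(attrs):
    """Infer the kind of fraction from the units string alone."""
    kinds = {"mol mol-1": "mole", "kg kg-1": "mass"}
    return kinds.get(attrs["units"], "unknown")


print(fraction_kind(mole_var))        # mole
print(fraction_kind(mass_var))        # mass
print(fraction_kind({"units": "1"}))  # unknown
```

The last call shows the problem: once the unit is reduced to the
dimensionless value 1, the distinction is lost, which is why the
dimensionless unit value would have to be reconsidered.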
>
> 2. medium: where the quantity is "measured" (e.g., sea, atmosphere (or
> air?), sea_ice, troposphere, lake, stream land_ice, cloud,
> ocean_surface_mixed_layer)
>
This should not be part of the standard name. I would say we could
name it medium_type and deal with it in the same way as area_type.
> 3. constituent: e.g., hydrometeor, ice, snow, rain, CO2, SO4, ozone,
> aerosol, sulfate_aerosol, soot.
The values of the constituent should be expressed in a variable
attribute, and in the standard name we should have the placeholder X or
constituent.
I know that not everyone agrees. But I think the chemical components
are only the start. The biological components are knocking at the door.
>
> 4. specie_color: for when we want to distinguish constituents by what
> produced them (e.g. the sulfate aerosol in the atmosphere that comes
> from different sources: anthropogenic, natural, fossil_fuel, etc.)
I would call it species_source. This looks like an optional extra
and could be stored in a variable attribute.
>
> 5. surface: a quasi-horizontal surface that cannot easily be described
> by a vertical coordinate (e.g., sea_floor, top_of_atmosphere,
> tropopause, adiabatic_condensation_level, surface)
I agree.
But what about the surface area of aerosol, for example in
surface_area_density (units m-1)?
We should be careful with the term "surface" and call that case
particle_surface.
I would also like to add 2m_elevation and 10km_geoid.
>
> 6. process: identifying what process is responsible for the quantity
> (e.g., for temperature tendencies: radiation, convection,
> latent_heating, etc.) [I wonder if specie_color might be combined with
> "process" into a single category?]
I agree. This should be part of 4. species_source.

>
> 7. vector_component: indicating the component of a vector and its
> positive direction (e.g., eastward, northward, upward)
Like the unit, this should not be part of the standard name.
>
> 8. radiative_flux_component: indicating whether only the downwelling
> (incoming) or upwelling (outgoing) or net radiative flux is stored
I would like to name it unit_area_orientation, and I would eliminate the
terms incoming and outgoing.
>
> 9. tensor_component: ????
Yes, I agree: "tensor_of". We will have to deal with this in the future.
>
> 10. assumption: indicating that the quantity has been calculated under
> some assumption (e.g., assuming_clear_sky, assuming_no_snow)
This looks very similar to 14. cell_method area_type with an assumption
term.
>
> 11. threshold: indicating that the quantity has been calculated only
> when certain conditions are satisfied. The form of this attribute
> would have to be worked out, but presumably would identify both the
> condition(s) and the values (or variables containing the values) of
> the thresholds could be specified.
>
I agree.

If we encode all 11 categories above, the standard name could become
very long. This is only a warning. Is there a length limit?
> The remaining 6 categories might not be considered part of the
> "standard_name" information, but might better be defined as new
> variable attributes:
>
> 12. formula (or transformation?): indicating that in some sense the
> quantity is a "compound" quantity derivable from more fundamental
> quantities. surface_net_downward_radiative_flux would have a
> formula="sw + lw", and the data writer would also store in the file a
> dummy variable (i.e., it would be either a scalar or array with
> possibly only one element, which would be set to missing_value), and
> the attributes associated with these two variables would define the
> quantity stored (e.g., in this example, "sw" would have a standard
> name of surface_net_downward_shortwave_radiative_flux, and similarly
> for "lw") As another example, a temporal correlation of quantity "a"
> and quantity "b" could be indicated by formula="correlation(a,b)". As
> a third example, an "anomaly" could be represented as the difference
> between two variables, and the attributes associated with the variable
> representing the "base" state could explicitly indicate how it was
> calculated (e.g., for a climatology, the climatological period). For
> the formula attribute, we might consider adopting the syntax for the
> formula from something like matlab, I guess. Note that the formula
> attribute makes it possible to express many different quantities
> without agreeing explicitly on their standard names (just the
> standard_names of their formula terms). Note also, that It is possible
> that the threshold information (#11 above) might be represented
> instead by an appropriate formula.
Does this cover the transformations in the guidelines for construction?
I think this needs deep thought.
>
> 13. measurement_method: indicating what type of sensor was used to
> measure the quantity (e.g., for sea surface temperature observations,
> bucket or ship_intake_temperature, and for models where there are
> multiple methods of defining cloud radiative forcing, specifying which
> of two well-know procedures known as "method 1" or "method 2" is used.
I agree.
>
> 14. area_type: indicating that instead of applying to the whole grid
> cell (which would be the default)
This is not the default; see Section 7. Do you think we should change this?
> , the quantity applies only to a certain portion, as in the current
> "where_type" construction (e.g., where_land would be indicated by
> "land", and where_sea_ice would be indicated by "sea_ice")
*. medium_type: cell_method
>
> 15. region: specifying the geographic region from which the quantity
> is extracted (e.g., asia, africa, australia)
No, I don't agree. We don't need this. This should not be part of the
standard.
>
> 16. experiment: containing the name of the experiment that produced
> the output.
No, I don't agree. This should not be part of the standard.
>
> 17. source: containing some indication of the source of the data,
> whether it be from observations (e.g., ERBE) or from a model (e.g.,
> CCSM3). A variable containing output from a multi-model ensemble
> (regridded to a common grid) could be stored with "source" as a
> dimension and the names of the models recorded as coordinate labels.
I would name it data_source to distinguish between species_source and
data_source.
>
>
18. unit:

Best wishes
Heinke
Received on Tue Nov 04 2008 - 03:26:34 GMT
