
[CF-metadata] Constituents as array dimension

From: Schultz, Martin <m.schultz>
Date: Wed, 22 Oct 2008 08:58:18 +0200

 

-----Original Message-----
From: cf-metadata-bounces at cgd.ucar.edu
[mailto:cf-metadata-bounces at cgd.ucar.edu] On Behalf Of
cf-metadata-request at cgd.ucar.edu
Sent: Wednesday, October 22, 2008 1:10 AM
To: cf-metadata at cgd.ucar.edu
Subject: CF-metadata Digest, Vol 67, Issue 9


Today's Topics:

   1. Re: centralized vs. community-owned name spaces (Schultz, Martin)
   2. CF standard names for chemical constituents and aerosols
      (Jonathan Gregory)
   3. Re: CF standard names for chemical constituents and aerosols
      (resulting from a GRIB2 p (Benno Blumenthal)
   4. Re: CF standard names for chemical constituents and aerosols
      (Philip J. Cameronsmith1)


----------------------------------------------------------------------

Message: 1
Date: Tue, 21 Oct 2008 22:01:19 +0200
From: "Schultz, Martin" <m.schultz at fz-juelich.de>
Subject: Re: [CF-metadata] centralized vs. community-owned name spaces
To: <cf-metadata at cgd.ucar.edu>
Message-ID:
        <2E9C45494F78A3498A77616E7C2DBB4C019F3652 at icg217.icg-ii-w2k.kfa-juelich.de>
Content-Type: text/plain; charset=iso-8859-1

Seth et al.,

    indeed, this sounds like the right concept to follow: let CF define the
principle and let the communities organize themselves to fill in one or
several tables. Of course, it also raises many questions, but all-in-all I
would advocate that this is the way forward.

    Having said this, I would nevertheless propose that the CF core group
(and this discussion list) plays an active role in guiding the principles
and reviewing proposed name spaces. Specifically for the atmospheric
chemistry discussion my suggestion would be to agree (as soon as possible)
on the "syntax definition" of the standard names (I hope that Martina
Stockhause has an up-to-date version of her proposal including your helpful
comments), i.e. basically the
"<physical_quantity>_of_X_in_<medium>_as_<identity>" level of Martina's
proposal. The chemistry community could then among themselves agree on the
species names (or if not then we will have two or three species lists) and
construct (and maintain) the "full-blown tables" according to the general CF
rules.

    What this would require though is some form of registration process for
the community which includes a link to a standard repository of names
defined and maintained by this community. And of course, the CF site should
then have links (and occasional communication) with the registered
communities.

    I don't think this is exactly the same as the "common concept" proposal,
because the one identifying attribute would still simply be called
"standard_name". However, somewhere in the global attribute section the file
should contain an identifier of the communities whose name definitions were
used. Should this be in the Conventions attribute? (example: "CF1.2, ACC1.0")
I don't know....

Best regards,

Martin

PS: another question is where the community name discussions would take
place. In order to keep things together at least a bit, I would propose to
add mailing lists for the registered communities at cgd.ucar.edu (example
cf-acc at cgd.ucar.edu) if this is possible.

< Dr. Martin G. Schultz, ICG-2, Forschungszentrum Jülich >
< D-52425 Jülich, Germany >
< ph: +49 (0)2461 61 2831, fax: +49 (0)2461 61 8131 >
< email: m.schultz at fz-juelich.de >
< web: http://www.fz-juelich.de/icg/icg-2/m_schultz >







------------------------------

Message: 2
Date: Tue, 21 Oct 2008 22:58:20 +0100
From: Jonathan Gregory <j.m.gregory at reading.ac.uk>
Subject: [CF-metadata] CF standard names for chemical constituents and
        aerosols
To: cf-metadata at cgd.ucar.edu
Message-ID: <20081021215820.GA384 at met.reading.ac.uk>
Content-Type: text/plain; charset=us-ascii

Dear all

I think there are two kinds of difficulty with additions to the standard
name table which need to be distinguished. First, there is the problem of
the standard name table getting very large because of the possibly large
number of chemical species. That presents a problem of organisation of
metadata, but it does not cause delays in assigning standard names. It is
easy to add lots more standard names which follow the same patterns as
existing ones. Second, there is the problem of delays when requests are made
for new standard names. This problem is caused by the intellectual
difficulties of working out what the concepts are and what the names for
them should be.

As discussed e.g. by Heinke, Martin and Philip, we could avoid the first
problem by adopting species-independent standard names. I would favour a
syntax which identifies where to look for the species name, e.g.
mass_concentration_of_[VAR]_in_air, where VAR is the name of the
string-valued coordinate variable or scalar coordinate variable that names
the species, and [] is a special syntax. That kind of syntax would allow
more than one place-holder, which may be necessary because some quantities
might identify more than one species e.g. reaction rates or the existing
ones of the kind
mole_concentration_of_[VAR1]_in_sea_water_expressed_as_[VAR2], where VAR1
could have the value "mesozooplankton" and VAR2 the value "nitrogen". The
lists of what can fill the gaps could, as has been suggested, be maintained
by groups with the relevant expertise.
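
A minimal sketch of how reading software might resolve such place-holders, assuming the bracket syntax described above. The function name, the regular expression and the use of a plain dictionary to stand in for the string-valued coordinate variables are illustrative assumptions, not part of CF:

```python
import re

# Matches a place-holder such as [VAR1]. The bracket syntax is the one
# proposed above; this particular regular expression is an assumption.
PLACEHOLDER = re.compile(r"\[([A-Za-z_][A-Za-z0-9_]*)\]")


def expand_standard_name(template, coords):
    """Replace each [VAR] with the value of the coordinate named VAR.

    `coords` is a hypothetical mapping from coordinate variable name to
    its string value (a scalar coordinate in this sketch).
    """
    def substitute(match):
        name = match.group(1)
        if name not in coords:
            raise KeyError(f"no coordinate variable named {name!r}")
        return coords[name]

    return PLACEHOLDER.sub(substitute, template)


template = "mole_concentration_of_[VAR1]_in_sea_water_expressed_as_[VAR2]"
coords = {"VAR1": "mesozooplankton", "VAR2": "nitrogen"}
print(expand_standard_name(template, coords))
```

Because the place-holder names the coordinate variable rather than the species, the same template works for any number of place-holders, and a missing coordinate is reported rather than silently left in the name.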

I think that a stronger argument for this than the size of the standard name
table, which should be no problem for software, is that chemical models may
internally have array dimensions for species, in which case it would be
natural to write out arrays of results. Are the models indeed like that?

If we take this kind of approach, any combination of standard name and
species would be possible, as only the contents of the lists would be
regulated, not the combinations. There would not be a way to prevent
nonsense such as mass concentration of mesozooplankton in air. However, I
don't think that's a problem really. We currently have no way to prevent
sea_water_temperature with a height coordinate of 10 km above the ground,
and that's not a problem.

The second issue is more difficult. As I have argued before, I do not think
it can be helped by allowing projects to develop independent tables if we
want to use standard names to compare data from different sources. That is
one of the main reasons they are useful, I think, as Seth says too, and it's
why they are called "standard". If there were many tables, of course it
would become easier to add new names within projects, but interoperability
would be lost among projects. Interoperability can be maintained across
tables by mappings (ontologies) but that is hard work. With more tables it
would be harder work. Who would do it? Dividing up the standard name table
would compound the intellectual difficulty, rather than easing the problem.

So why is agreement of new standard names slow? I think it is because it is
difficult. It is not principally because we are arguing about syntax (though
it is partly), but because we are working out what we actually mean, and how
to describe it in ways consistent with other quantities we have defined.
That is, I think the slowness is mostly about the definition, not about
the meaningful identifier, in Bryan's terms. It is largely a scientific and
communication problem, not a technological one. (As an example, at the end
of this email I have listed some of the issues that Stephen Griffies and I
have just been discussing in order to make proposals for standard names for
ocean quantities to be requested by CMIP5.) I do not see any easy way to
dissolve this difficulty. We can move it around or conceal it, but not
easily get rid of it. If we want to reduce the difficulty of the problem,
we could choose to lower the standards currently applied to the clarity of
concepts. That would mean that projects using standard names would have to
decide for themselves more about what they meant before using them, or
suffer more confusion through not doing so, and interoperability would be
reduced too. CF would do less work, and would be less useful as a result.
But if we decide to go that way, I for one won't complain about doing less
work. I don't do it for fun!

I agree with Steve H that technology could help to ease the problem, though,
by providing more tools. Could we provide tools to allow it to be easier to
search standard names in cleverer ways? It might be that the ocean names
I've been discussing with Stephen G could have been chosen more quickly if
it had been easier to search the existing names, as many of the quantities
that appeared to be new did actually have existing names. Could tools be
written to digest the table into those phrases and words from which the
existing names are constructed, and to present menus which allow
construction of names from the existing elements, with the possibility of
proposing new elements to be inserted in existing patterns? That would be a
big help.
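
As a rough illustration of the kind of search tool meant here, the sketch below indexes names by their underscore-separated words so that existing names can be found from their elements. The short list of names is sample input taken from this thread, and the function names are hypothetical; a real tool would read the full standard name table:

```python
from collections import defaultdict

# A handful of existing names, used purely as sample input; a real tool
# would read the full standard name table instead.
NAMES = [
    "sea_water_temperature",
    "air_temperature",
    "precipitation_flux",
    "lwe_thickness_of_precipitation_amount",
]


def build_word_index(names):
    """Map each underscore-separated word to the names containing it."""
    index = defaultdict(set)
    for name in names:
        for word in name.split("_"):
            index[word].add(name)
    return index


def search(index, *words):
    """Return the names that contain every one of the given words."""
    hits = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*hits)) if hits else []


index = build_word_index(NAMES)
print(search(index, "temperature"))
print(search(index, "precipitation", "amount"))
```

The keys of the index are also a first approximation to the list of elements from which a menu-driven name builder could offer choices.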

Best wishes

Jonathan


Some issues in defining ocean standard names:
- Basin masks for tracer and velocity are the same geophysical quantity, but
distinguished by coordinates. Grids are not identified by standard names;
that is an issue of how CF organises metadata.
- We say "sea floor", not "ocean bottom". You could say either, but in a set
of definitions it is important to be consistent about terminology, or the
reader will wonder if a distinction is being drawn.
- If you speak of the mass of the ocean, does it include sea-ice?
- We do not need a separate name for global-mean sea water temperature; we
can use sea_water_temperature and indicate the mean in cell_methods. That's
another issue of organisation of metadata.
- What does "ideal age" of sea water mean?
- Is the vertical integral of mass transport in an ocean model with a free
surface to be regarded as the same geophysical quantity as the vertical
integral of volume transport in a rigid-lid model multiplied by density?
- Is the mixed-layer depth determined by a buoyancy criterion the same
concept as mixed-layer depth determined by sigma-theta?
- Is the sea water "mixing depth" the same concept as mixed-layer depth
defined by the mixing scheme?
- Transports across various straits are all the same geophysical quantity,
and the strait should be identified by some string-valued coordinate.
- Do we want to know the rainfall flux over the whole grid box, or just the
part that falls into the liquid water (and not on the sea-ice)? These can be
distinguished by cell_methods.
- What's a clear way to describe the heat flux associated with the
temperature of rainfall not being the same as the temperature of the ocean
it falls into?


------------------------------

Message: 3
Date: Tue, 21 Oct 2008 18:51:07 -0400
From: "Benno Blumenthal" <benno at iri.columbia.edu>
Subject: Re: [CF-metadata] CF standard names for chemical constituents
        and aerosols (resulting from a GRIB2 p
To: "John Graybeal" <graybeal at mbari.org>
Cc: cf-metadata at cgd.ucar.edu
Message-ID:
        <179873a70810211551r118128a4jf100d0d7bf717d61 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Let me just add that a URI datatype is useful not just for external
references, but it is also useful for local references as well, e.g.
instead of cf:grid_mapping being a string that is the name of a local
variable, it would be a URI that points to a local variable (i.e.
probably still that same variable name). Seemingly a minor distinction, but
it allows a clean mapping into an object-oriented framework for information
in netcdf files. Like RDF, or Java Beans, or ...

Benno

On Tue, Oct 21, 2008 at 11:24 AM, Benno Blumenthal <benno at iri.columbia.edu>
wrote:
> Hello All,
>
> This is heading directly towards where I was trying to steer
> common_concept in particular (Trac Ticket #24), and where the
> namespace discussion went more generally (Trac Ticket #27). There is
> a need for a framework for using namespaces in netcdf, both with the
> attributes (so attributes can be explicitly placed in different
> conventions, not the issue here per se), and attribute values (so that
> attributes can point to URI's identifying concepts). My short
> version of this is that cf:standard_name is a property that points to
> a single controlled vocabulary (i.e. a string chosen from a list),
> while the proposed cf:common_concept is a property that points to
> concepts identified with URI's (e.g. namespace plus value).
>
> standard_name's could have URI's as well (Roy Lowry has assigned
> opaque URI's to all the standard names, MMI has assigned more readable
> URIs to many of the standard_names (e.g. cf:air_temperature), CF could
> very well get a URN space of its own which will allow it to create
> some definitive URIs, which lets us relate cf:standard_name's to other
> vocabularies (Roy has done this as well), and would let us write down
> relationships between standard_names (Roy is working on this).
>
> The attribute namespace problem is easier, and we discussed a couple
> of possible solutions which we hope to sort out by actually
> implementing them. namespaces for values seems trickier at the
> moment, because netcdf (unlike DAP) does not have a URI datatype, and
> so even if one had a list of namespaces specified as id #24, one would
> not be sure which string-valued properties have been
> namespace-abbreviated without reading through some document that
> describes each property in the convention, a level of sophistication
> in the reading software we would like to avoid. Of course, one does
> not need to abbreviate -- a URN in particular should be quite readable
> in its entirety.
>
> Benno
>
> On Mon, Oct 20, 2008 at 3:22 PM, John Graybeal <graybeal at mbari.org> wrote:
>> Seth,
>>
>> That's an entertainingly close description of the semantic
>> interoperability framework [1] and workshop [2] that MMI is proposing
>> and offering for marine and related environmental sciences.
>> (apologies for the plug, but it sure seemed relevant) So you'll have my
>> full support for that concept.
>>
>> The interesting and subtle distinction between MMI's approach, and
>> what I think you're implying is that there would be a single
>> authority (CF) tying together all those names (e.g., using the CF
>> aliases to bring the names from different namespaces together).
>> There are two ways to do that, but I don't think aliases or a 'single
>> authority' (in the way you imply it) is the right model.
>>
>> I think a better model is to consider the CF standard names one of
>> the major namespaces, to which many other vocabularies can be mapped.
>> Then, modify the CF standard to allow the specification of a term
>> from any namespace (since many terms may not be appropriate to the CF
>> vocabulary; see my comment on CF scope in the last post); this is a
>> proposal I've been back-burnering for about 9 months now. Finally,
>> CF can accept any or all well-framed names that are in any of these
>> namespaces, or can even alias to them, through manual or semi-automated
>> processes.
>>
>> Under this model, CF has a mind share on solving this problem with a
>> particular approach, but CF users can take advantage of other
>> approaches. CF doesn't have to take on the 'vocabulary publication
>> and mediation' task for the entire community, but can pick and choose
>> its own targets for inclusion in the CF vocabulary (and the targets
>> will be much more conspicuous and contextually defined).
>>
>> There are multiple complexities that arise as you start trying to
>> make this concept (mine, yours, or CF's) work -- I like to think our
>> MMI plans know about 85% of the issues in a fully operational but
>> basic system -- and it isn't clear whether you'll want many
>> organizations serving the results, or relatively few. But the model
>> can put a lot more precise naming and mapping capability into action
>> in a very short period of time (we hope to go live in a matter of
>> weeks), and should make clear the difference in scope between what CF is
>> currently addressing, and what needs to be accommodated.
>>
>> Having said all that, I would like CF's model of "meaningful,
>> coherent, well-structured names" to be a paradigm that is maintained.
>> If one group uses opaque codes, I would rather not see CF start
>> including those just because they are acceptable to some. One of CF's
>> big strengths is the "instant interoperability" you get from
>> recognizing and understanding the name.
>>
>> John
>>
>>
>> [1] http://marinemetadata.org/semanticframeworkconcept (many of the
>> planned concepts are not fully documented yet)
>> [2] http://marinemetadata.org/events/oossi (sign up soon if you want
>> good hotel rates!)
>>
>> On Oct 20, 2008, at 11:40 AM, Seth McGinnis wrote:
>>
>>>> I don't know if this is supportive or not, but the "sheer number of
>>>> groups bringing new requirements" is an issue that I don't think CF
>>>> has realistically addressed.
>>>
>>>
>>> With regard to the standard names problem, here's a perhaps-mad idea
>>> from someone standing on the sidelines. I don't think this idea has
>>> been floated before, but I may have missed it.
>>>
>>> Where I'm coming from: I'm the data manager for a modeling project
>>> (NARCCAP, http://narccap.ucar.edu) that's using the CF standard. I
>>> don't generate or use any of the project's data, I'm just in charge
>>> of checking it to make sure it's properly formatted and trying to
>>> make things usable for the community that will use the data. From
>>> my perspective, the really important thing about standard names is
>>> that if two different files have a variable with the same standard
>>> name, it means (1) they're talking about the same thing, and (2) a
>>> user can go somewhere and look up the definition and the canonical
>>> units. The fact that the name has passed through an approval process is
>>> entirely incidental.
>>>
>>> So... why not just let any group that wants to define a set of
>>> standard names for their community do so, as long as they're willing
>>> to also take charge of publishing a Standard Name Table for it?
>>>
>>> Essentially, it would be creating namespaces. The only mechanism
>>> that would need to be added would be some way of noting which
>>> namespace the name comes from and the URL of the corresponding
>>> Standard Name Table. So, for example, maybe for chemical names, all
>>> the standard names would all start with "iupac:" and somewhere in
>>> the global attributes for the file you'd have "standard_namespaces =
>>> (default:
>>> cf-pcmdi.llnl.gov/cf-standard-name-table.xml, chem:
>>> www.iupac.org/cf/standard-names-3.0.xml)". Or something like that.
>>>
>>> Each community can manage its own namespace. There's already a
>>> mechanism in place (aliases) that could be extended to handle cases
>>> where there are two different names for one thing. And if you end
>>> up with two names for the same thing with different dimensions or
>>> slightly different definitions, that's fine, because that situation
>>> already exists in the single table.
>>> (Rain, for example, can be defined as a mass flux per unit area
>>> (precipitation_flux) or as an accumulated depth
>>> (lwe_thickness_of_precipitation_amount). They're different ways of
>>> talking about the same physical thing. So this is not a new problem
>>> for the user.)
>>>
>>> Put some bounds on the representational domain that the default
>>> standard name table will handle, delegate everything else to whoever
>>> in that community wants to deal with it, and no matter how many
>>> groups come to CF with new standard names, the system can handle it.
>>> Right?
>>>
>>> Cheers,
>>>
>>> --Seth
>>>
>>> ----
>>> Seth McGinnis
>>> mcginnis at ucar.edu
>>> NARCCAP Data & User Community Manager Associate Scientist ISSE /
>>> NCAR
>>> ----
>>> _______________________________________________
>>> CF-metadata mailing list
>>> CF-metadata at cgd.ucar.edu
>>> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>>
>>
>> John
>>
>> --------------
>> John Graybeal <mailto:graybeal at mbari.org> -- 831-775-1956
>> Monterey Bay Aquarium Research Institute
>> Marine Metadata Interoperability Project: http://marinemetadata.org
>>
>>
>
>
>
> --
> Dr. M. Benno Blumenthal benno at iri.columbia.edu
> International Research Institute for climate and society
> The Earth Institute at Columbia University
> Lamont Campus, Palisades NY 10964-8000 (845) 680-4450
>

... I agree with Philip and would like to reinforce that I think species as
individual variables are much easier to handle. Usually, the model outputs
only a subset of the species it calculates (who cares about the ethylperoxy
radical concentrations, for example?), and array indexing on the results
means that one always has to manage an extra dimension and keep track of
what index 1 stands for. This is a bit like the early days of the GRIB
concept when every group had their own table and ozone, for example, was
"output variable 243" in one model and "output variable 128" in the other.

    It is true, however, that there is a fuzzy divide with respect to aerosol size
classes or the "new" dimension of "mesozooplankton" etc. (Gas-phase is EASY
in spite of the 4500 compounds ;-)
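
To make the index-tracking concern above concrete, here is a small sketch of the two layouts, using plain Python lists in place of netCDF variables. The numbers are arbitrary placeholders and all names are illustrative:

```python
# Layout 1: one array with an extra species dimension, plus a
# string-valued coordinate that says what each index stands for.
# Arbitrary placeholder values, chosen only to make the sketch runnable.
species = ["ozone", "ethylperoxy_radical"]       # string-valued coordinate
concentration = [[1.0, 2.0, 3.0],                # indexed [species][time]
                 [4.0, 5.0, 6.0]]


def select_by_label(name):
    """Look a species up by label instead of remembering its index."""
    return concentration[species.index(name)]


# Layout 2: one named variable per species, no extra dimension to manage.
per_species = {name: concentration[i] for i, name in enumerate(species)}

print(select_by_label("ozone") == per_species["ozone"])
```

With a label coordinate the index never has to be remembered, but every reader still has to go through the lookup step; with one variable per species the name itself carries the information.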

Best regards,

Martin

PS: re Jonathan - very valid points! In the end it will probably be the tool
question that determines the extent to which CF will be used. I know that
this is extra work (and I don't want to do it!), but in some of the previous
discussions I had the feeling that the semantics or syntax discussion of new
standard names focused entirely on the human brain, which has proven
notoriously difficult to implement in software rules.
Perhaps it is time to check whether and how the existing standard name
definitions could be "parsed" by such tools. My guess is that, overall, it should
work reasonably well (if there are certain key words like "_in_", "_of_",
"_as_"), but there might be a couple of terms which make a programmer's life
difficult. Then the question is: should these be revised even if they are so
standard (like "air_temperature", which perhaps should be
"temperature_of_air" ?).

    Concerning the "two table" approach: I believe we pretty much agree
here. What I meant to say is that the difficult problem should still be
sorted out in the CF main list, but the species table (or tables combining
marine life forms with species, etc.) could be maintained by a
sub-committee, i.e. community.
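
A rough sketch of the keyword-based parsing suggested in the PS above, assuming names of the form <quantity>_of_<species>_in_<medium>, optionally followed by _as_<identity> or _expressed_as_<identity>. The pattern and the function name are illustrative assumptions, not an official CF grammar:

```python
import re

# Pattern for names of the form <quantity>_of_<species>_in_<medium>,
# optionally followed by _as_<identity> or _expressed_as_<identity>.
# This is an illustrative assumption, not an official CF grammar.
PATTERN = re.compile(
    r"^(?P<quantity>.+?)_of_(?P<species>.+?)_in_(?P<medium>.+?)"
    r"(?:_(?:expressed_)?as_(?P<identity>.+))?$"
)


def parse_standard_name(name):
    """Return the name's parts as a dict, or None if it does not fit."""
    match = PATTERN.match(name)
    return match.groupdict() if match else None


for name in (
    "mole_concentration_of_mesozooplankton_in_sea_water_expressed_as_nitrogen",
    "mass_concentration_of_ozone_in_air",
    "air_temperature",
):
    print(name, "->", parse_standard_name(name))
```

Names built with the key words split cleanly, while a name such as air_temperature does not match at all, which is the kind of case that would need either a special rule or a revised name.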
Received on Wed Oct 22 2008 - 00:58:18 BST

This archive was generated by hypermail 2.3.0 : Tue Sep 13 2022 - 23:02:40 BST
