[CF-metadata] CF grammar and online tool from Robert Muetzelfeldt on 2011-03-09 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Robert Muetzelfeldt <r.muetzelfeldt>
Date: Wed, 09 Mar 2011 13:32:28 +0000

Dear Martin,

Thanks for the positive and insightful feedback.

On 09/03/11 12:09, Schultz, Martin wrote:
> Dear Robert,
>
> this is great! I would definitely support any proposal to try and follow this route in the future.
Thanks!
> However, it will require some further discussion of how to handle semantically incorrect names. As I understand it, the grammar can ensure that we arrive at syntactically correct names (which then have a fair chance of being physically meaningful), but all in all the matrix will remain sparse and we would need to find a way to exclude useless combinations of grammar terms (to come back to the example from your Prolog grammar description: "bone eats dog" doesn't make sense).
This type of grammar is called a "semantic grammar", because the basic
terms are meaningful in a particular subject area. This contrasts with
more familiar grammars for, say, English, which consist of purely
syntactic terms, such as 'noun_phrase'. So there is a much better
chance that arbitrary combinations of words will be semantically valid,
even though we are processing phrases in a purely syntactic way.
Careful choice of base terms can help.

One of the advantages of Prolog over other grammar systems is
that Prolog's grammar notation is a cosmetic layer on top of raw
Prolog, and Prolog is a language which is widely used for knowledge
processing. It is therefore straightforward to add semantic
constraints to the grammar rules.

The "bone eats dog" example could be handled using either of the above
two approaches: either use narrower categories (e.g. 'animal' rather
than 'object'), or give the "eats" grammar rule an additional
constraint that its first term must be of type animal.
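
To make the two options concrete, here is a minimal Python sketch. It is
not the Prolog grammar itself: the vocabulary, the category table and the
function names are all invented for illustration. It compares a purely
syntactic "thing eats thing" rule with a rule that uses the narrower
'animal' category, and with a rule that keeps the broad category but adds
an explicit constraint.

```python
# Toy vocabulary: every word and category here is a made-up example.
CATEGORY = {"dog": "animal", "cat": "animal", "bone": "object", "ball": "object"}


def accepts_syntactic(words):
    """Broad rule: sentence -> thing 'eats' thing (no semantic check)."""
    return (
        len(words) == 3
        and words[1] == "eats"
        and words[0] in CATEGORY
        and words[2] in CATEGORY
    )


def accepts_narrow_category(words):
    """Option 1: narrow the category in the rule: animal 'eats' thing."""
    animals = {w for w, c in CATEGORY.items() if c == "animal"}
    return (
        len(words) == 3
        and words[0] in animals
        and words[1] == "eats"
        and words[2] in CATEGORY
    )


def accepts_with_constraint(words):
    """Option 2: keep the broad rule, then add a type constraint."""
    return accepts_syntactic(words) and CATEGORY[words[0]] == "animal"
```

With these definitions, "bone eats dog" passes the purely syntactic rule
but is rejected by both of the other two, while "dog eats bone" passes
all three.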

But yes, I agree: this requires further discussion.
> Related to the approval process two points: 1) I still think a standard_name list will be useful to maintain (at least for a while to come), simply because it can be relatively easily integrated in any kind of data analysis or checking tool. If you had to interact with a web application each time before you want to make a plot of your data, you might get quite frustrated over time. This doesn't mean that the list could not eventually be generated automatically, but there should still be some "approved list" which doesn't change too frequently so that people can keep up with downloading it.
I sort-of agree with you in the short term, though qualifying this with
the special case of 'modifiers'. If we accept that these can be
incorporated into names, and that currently they do not need to go
through the approval process, then that should continue.

For brevity, I did not mention in my original posting that the
implementation approach which Mark adopted gives you a direct API for
interrogating the web service. Therefore a human does not need to go to
the web site - a program can do this automatically. Alternatively, one
can install a free Prolog (I use SWI-Prolog), download the grammar
rules, and run a trivial (10-line) Prolog program locally.
> 2) Perhaps one could redirect attention of the approval process to grammar elements rather than complete standard names? As the "sedimentation" discussion shows very nicely, adding a new term often merits a good discussion. On the other hand, if I copy a concept (i.e. use an existing standard name/grammar as template), such discussion may not be needed.
Yes, I think the logical development of a grammar-based approach is that
the approval discussion will indeed relate to basic terms (and grammar
rules), rather than complete Standard Names.
> Here, I would indeed welcome the "timer" idea, so that new standard names would be accepted automatically if no one objects within a period of 1 month or so.
>
> If there was a web-based tool for testing new standard_names and perhaps even automatically "registering" them for approval, the email discussions on this list could be cut down to the more fundamental discussions and the discussion about those names that are not universally accepted.
It should be easy to add a "Submit for approval" button to the web page.
> Next: modifiers or not? Indeed, this question hinges on the approval process. If there is no need to approve the exact standard name, but only its elements, then the modifier could indeed become part of the standard_name (again: the individual modifiers should be agreed upon, but their (recursive) combination would be flexible).
Yes.
> Finally: concerning the provisional web tool. I tried to enter "mass_flux_of_nitrogen_oxide_in_air_due_to_emission_from_boreal_forest_fires" as a test case and received the answer "IS NOT" a valid standard name. OK: that's good to know, but is there a chance that the tool could also tell me which rule(s) are violated? That would be extremely helpful and probably key to success or failure of such a tool in the long run.
I agree that this is the area that needs a lot of work: e.g. displaying
the parse tree for names that are valid (Prolog supports this); and
providing some sort of guidance on names that fail. It's in general
hard to say why something failed (or, rather, there may be many ways in
which it failed), but we can still try. One can also imagine a
variety of tools which help you to formulate valid names: e.g. a tool
which tells you which words (or grammatical categories) can legally
follow what you have entered so far.
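
As a rough illustration of that last idea, the sketch below is a Python
toy, not the Prolog parser behind the web tool. It assumes a hypothetical
grammar small enough to write as a table of allowed word transitions; the
table, the state names and the function check() are invented here and are
far simpler than the real CF grammar. For a given name it reports whether
the name is valid, the position of the first word that could not be
matched, and which words could legally come next.

```python
# Hypothetical toy grammar as a state-transition table; NOT the CF grammar.
TRANSITIONS = {
    "start": {"mass": "quantity", "mole": "quantity"},
    "quantity": {"flux": "measure", "fraction": "measure"},
    "measure": {"of": "of"},
    "of": {"ozone": "species", "water": "species"},
    "species": {"in": "in"},
    "in": {"air": "end", "sea": "end"},
}
# States in which a name is allowed to stop.
ACCEPTING = {"species", "end"}


def check(name):
    """Return (is_valid, index_of_first_bad_word_or_None, allowed_next_words)."""
    state = "start"
    for i, word in enumerate(name.split("_")):
        options = TRANSITIONS.get(state, {})
        if word not in options:
            # Report where matching stopped and what would have been legal.
            return False, i, sorted(options)
        state = options[word]
    # All words matched; the name is valid only if it may stop here.
    return state in ACCEPTING, None, sorted(TRANSITIONS.get(state, {}))
```

Because this toy follows a single path through the table, it can point
at one failing word. A real grammar with alternative rules may fail in
several ways at once, which is why giving guidance there is harder.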

Thanks again,
Robert

> Best regards,
>
> Martin
>
> = Dr. Martin G. Schultz, IEK-8, Forschungszentrum Jülich =
> = D-52425 Jülich, Germany =
> = ph: +49 (0)2461 61 2831, fax: +49 (0)2461 61 8131 =
> = email: m.schultz at fz-juelich.de =
> = web: http://www.fz-juelich.de/icg/icg-2/m_schultz =
>
>
> -- refers to:
>> Date: Tue, 08 Mar 2011 13:14:09 +0000
>> From: Robert Muetzelfeldt<r.muetzelfeldt at ed.ac.uk>
>> Subject: Re: [CF-metadata] standard_name modifiers
>> To: cf-metadata at cgd.ucar.edu
>> Message-ID:<4D762BA1.4040609 at ed.ac.uk>
>> Content-Type: text/plain; charset=us-ascii; format=flowed
>>
>> Dear all,
>>
>> Jonathan suggested having a web-based tool which can be used to check possible standard names, prior to
>> submitting them for human approval.
>> This could use the grammar he developed for CF-metadata names, and which he has written up at
>> http://www.met.reading.ac.uk/~jonathan/CF_metadata/14.1/ [...]
>>
>> I thought it might help the discussion to implement this idea. This involved two steps:
>> 1. Converting his grammar (as presented on Jonathan's web page) into Prolog's grammar notation.
>> 2. Making a parser for this grammar available on the web.
>>
>> The implementation of Jonathan's grammar in Prolog follows the approach which I have described previously
>> on this mailing list, and which is written up at
>> http://envarml.pbworks.com/w/page/8988921/Prototype+grammar+for+CF-metadata+%22standard+names%22+(Prolog+version)
>> - the only difference being that I have now used his grammar rules rather than ones based on the
>> CF-metadata guidelines.
>> [...]

Received on Wed Mar 09 2011 - 06:32:28 GMT