Dear all,
Jonathan suggested having a web-based tool which can be used to check 
possible standard names, prior to submitting them for human approval. 
This could use the grammar he developed for CF-metadata names, and which 
he has written up at 
http://www.met.reading.ac.uk/~jonathan/CF_metadata/14.1/. (This 
grammar apparently handled all the Standard Names that existed when it 
was developed - a very impressive achievement.)
I thought it might help the discussion to implement this idea. This 
involved two steps:
1. Converting his grammar (as presented on Jonathan's web page) into 
Prolog's grammar notation.
2. Making a parser for this grammar available on the web.
The implementation of Jonathan's grammar in Prolog follows the approach 
which I have described previously on this mailing list, and which is 
written up at 
http://envarml.pbworks.com/w/page/8988921/Prototype+grammar+for+CF-metadata+%22standard+names%22+(Prolog+version) 
- the only difference being that I have now used his grammar rules 
rather than ones based on the CF-metadata guidelines.
I have checked the list of standard names which Jonathan used in his 
work against the Prolog version of his grammar, and currently 1958 out 
of the total of 2072 names parse correctly. The remaining 114 probably 
fail because of slight differences in the way that filler words, such 
as prepositions, are handled - fixing this is just a question of 
checking all the rules for consistency in how the filler words are 
included.
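The batch check amounts to something like the following Python sketch. The vocabulary and the single rule here are made up for illustration (they are not Jonathan's rules, and the real check was run against the Prolog grammar); the point is only to show how the names that fail to parse can be collected for inspection.

```python
import re

# Hypothetical toy vocabulary, NOT taken from Jonathan's grammar.
QUANTITIES = ("air_temperature", "sea_water_salinity", "mass_fraction_of_ozone")
MEDIA = ("air", "sea_water")

# Toy rule: [tendency_of_] quantity [_in_ medium]
PATTERN = re.compile(
    r"(tendency_of_)?(%s)(_in_(%s))?" % ("|".join(QUANTITIES), "|".join(MEDIA))
)


def parses(name):
    """Return True if the whole name matches the toy rule."""
    return PATTERN.fullmatch(name) is not None


def check_all(names):
    """Return (number of names that parse, list of names that do not)."""
    failures = [n for n in names if not parses(n)]
    return len(names) - len(failures), failures


if __name__ == "__main__":
    candidates = [
        "air_temperature",
        "tendency_of_air_temperature",
        "mass_fraction_of_ozone_in_air",
        "air_temperature_due_to_convection",  # "due_to" is not covered by the toy rule
    ]
    passed, failed = check_all(candidates)
    print(passed, "of", len(candidates), "names parse")
    for name in failed:
        print("does not parse:", name)
```

Listing the failures, rather than just counting them, is what makes it possible to see which filler words the rules are handling inconsistently.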
My colleague Mark Muetzelfeldt has produced a web app which gives access 
to this Prolog grammar, providing browser-based access to a simple query 
system for checking that a proposed standard name conforms to the 
grammar. This is available at 
http://www.eco-epistemics.org/cf_metadata_grammar/. It includes the text 
for the complete Prolog version of Jonathan's grammar. It goes without 
saying that I would welcome any feedback (or, better, that we decide to 
set up some sort of working group to take this forward). Please note 
that this is a highly experimental, early-stage exercise, designed 
primarily to explore what a grammar-based online CF-metadata checker 
might look like. It has been tested only in Firefox and Chrome.
A number of issues have arisen during this exercise:
1. I feel strongly that it is highly desirable to use a standard grammar 
notation (such as Prolog's) for representing the grammar. Apart from the 
benefits of using a standard approach, this makes it straightforward to 
handle arbitrary nesting of grammar rules (as in, say, a grammar for 
English), rather than Jonathan's flat set of rules.
2. In my opinion, it is far better for the name itself to contain all 
the information about a particular variable, rather than use a separate 
mechanism (modifiers). Consider a variable such as 
"monthly_mean_of_log_of_ratio_of_leaf_carbon_to_root_nitrogen". This is 
straightforward to capture in a grammar (provided the notation can 
handle the recursive nesting of mathematical functions, which most 
can), and almost impossible to capture by the use of modifiers on some 
base Standard Name.
3. Prolog is a particularly useful platform to use for this task. It has 
long had a specific notation for grammar rules which is very readable 
and supported natively by the Prolog interpreter. Using Prolog offers 
several substantial benefits over other approaches, including the 
ability to handle more advanced grammar requirements, the ability to 
query the grammar and/or a collection of Standard Names directly in the 
Prolog interpreter, and the ease with which it can be made available as a 
web app.
4. One feature of Prolog which deserves special mention is that it can 
easily be used to automatically generate names which are valid according 
to the grammar - this can be done with a one-line query. This may seem 
useless, but it is in fact a very effective way of picking up weaknesses 
in the grammar: if a generated name is (to the expert human) nonsense, 
then that can help us to refine the grammar.
5. Jonathan's grammar includes base (atomic) terms which could be 
further broken down, for example:
phenomenon -->
    due_to_condensation_and_evaporation_from_boundary_layer_mixing
    due_to_condensation_and_evaporation_from_convection
    due_to_condensation_and_evaporation_from_longwave_heating
    due_to_condensation_and_evaporation_from_pressure_change
    due_to_condensation_and_evaporation_from_shortwave_heating
    due_to_condensation_and_evaporation_from_turbulence
There is clearly scope here for some more rationalisation.
6. The current policy is (as I understand it) that each new Standard 
Name has to be approved individually, but that anyone can add whatever 
modifiers they like. If, as I suggest, the role of modifiers is 
incorporated into the grammar for Standard Names, then this raises 
issues about the approval process. I suspect it would be possible for 
the parser to detect which names require manual approval and which do 
not, according to which rules are fired, but this would need further 
research.
7. Ultimately I believe we need to move away from an approved list of 
names to an approved grammar for formulating names. However, as Jonathan 
has stressed, a grammar is useful - even when a manual approval process 
is used - for checking that names conform to agreed style rules.
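As a rough illustration of points 2 and 4 above, here is a minimal Python sketch of a recursive grammar that can both check and generate names. It is not the Prolog implementation, and the three rules (name, function, substance) and their vocabulary are invented for the example rather than taken from Jonathan's grammar. The parser mimics the way a Prolog grammar rule consumes a prefix of the input and hands on the remainder.

```python
import itertools

# Hypothetical toy grammar. Keys are non-terminals; any other string is a
# terminal that must appear literally in the name (separators included).
GRAMMAR = {
    "name": [
        ["function", "_of_", "name"],                      # recursive nesting
        ["ratio_of_", "substance", "_to_", "substance"],
        ["substance"],
    ],
    "function": [["log"], ["monthly_mean"]],
    "substance": [["leaf_carbon"], ["root_nitrogen"]],
}


def parse(symbol, text):
    """Yield each remainder of text left after matching symbol at its start."""
    if symbol not in GRAMMAR:  # terminal: must be a literal prefix
        if text.startswith(symbol):
            yield text[len(symbol):]
        return
    for alternative in GRAMMAR[symbol]:
        remainders = [text]
        for part in alternative:
            remainders = [r2 for r in remainders for r2 in parse(part, r)]
        yield from remainders


def is_valid(name):
    """A name is valid if some parse consumes it completely."""
    return any(rest == "" for rest in parse("name", name))


def generate(symbol, depth):
    """Yield every string derivable from symbol within depth rule expansions."""
    if symbol not in GRAMMAR:
        yield symbol
        return
    if depth == 0:  # bound the recursion so generation terminates
        return
    for alternative in GRAMMAR[symbol]:
        options = [list(generate(part, depth - 1)) for part in alternative]
        for combo in itertools.product(*options):
            yield "".join(combo)


if __name__ == "__main__":
    print(is_valid("monthly_mean_of_log_of_ratio_of_leaf_carbon_to_root_nitrogen"))
    for candidate in generate("name", 2):
        print(candidate)
```

With a depth of 2 this toy grammar generates six names, one of which is ratio_of_leaf_carbon_to_leaf_carbon. That is exactly the kind of grammatical nonsense that generation is meant to expose: it suggests the ratio rule should require two different substances.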
Cheers,
Robert
On 03/03/11 18:02, Jonathan Gregory wrote:
> Dear Philip and John
>
> I agree with what Philip says here:
>
>> We could then tweak our current practice on this mailing list so that when a
>> person proposes a std_name they should state (or perhaps there is a little
>> bit of code to check) that the proposed std_name conforms to the existing
>> grammar and vocabulary rules.  I think most of us would then provide only
>> cursory scrutiny.  Perhaps there could even be an automatic timer so that if
>> nobody objects within some time period (perhaps 1 month) then the name is
>> automatically accepted.  Essentially the default decision for conforming
>> names would be 'acceptance'.  I think this would also make the generation of
>> the text descriptions either automatic, or perhaps obsolete, in many cases
>> because they could be inferred from the grammar and vocabulary tables.
> I could bring the grammar up to date as a starting point. I agree that it
> would be possible to work out text corresponding to each phrase and thus
> construct definitions, or at least a first draft of them. Units could also
> be deduced automatically. I don't myself have the expertise or the time to
> write scripts in support of this, to make it easy for proposers to use these
> procedures e.g. on the web.
>
>
> John writes:
>> But where we are talking about adding generic modifiers, it seems to me a more automated approach is possible.  If the meaning of the modifier is clear, then no matter what name it is applied to, the meaning of the resulting compound should be clear.  If that is the case, then adding that modifier to an existing name should be verifiable mechanically.
> If this refers to the standard_name modifiers, which are separate words
> appended to standard names, then in fact no approval is needed. It is fine
> to add these to the standard_name attribute. That is not regarded as creating
> a new standard_name. In fact the modifiers were introduced to avoid having to
> add such names to the table.
>
> Best wishes
>
> Jonathan
> _______________________________________________
> CF-metadata mailing list
> CF-metadata at cgd.ucar.edu
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>
Received on Tue Mar 08 2011 - 06:14:09 GMT