Dear all,
Jonathan suggested having a web-based tool which can be used to check
possible standard names, prior to submitting them for human approval.
This could use the grammar he developed for CF-metadata names, and which
he has written up at
http://www.met.reading.ac.uk/~jonathan/CF_metadata/14.1/. (This
grammar apparently handled all the Standard Names that existed when it
was developed - a very impressive achievement.)
I thought it might help the discussion to implement this idea. This
involved two steps:
1. Converting his grammar (as presented on Jonathan's web page) into
Prolog's grammar notation.
2. Making a parser for this grammar available on the web.
The implementation of Jonathan's grammar in Prolog follows the approach
which I have described previously on this mailing list, and which is
written up at
http://envarml.pbworks.com/w/page/8988921/Prototype+grammar+for+CF-metadata+%22standard+names%22+(Prolog+version)
- the only difference being that I have now used his grammar rules
rather than ones based on the CF-metadata guidelines.
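To make step 1 concrete for readers who have not met grammar rules before, here is a minimal sketch of the idea. It is written in Python rather than Prolog, and the three rules and their vocabulary are invented for illustration only; they are not taken from Jonathan's grammar or from the Prolog version linked above. The names GRAMMAR, expand, match_sequence and conforms are likewise my own.

```python
# Toy grammar (an assumption for illustration, NOT Jonathan's rules).
# Each nonterminal maps to a list of alternatives; each alternative is a
# sequence of symbols. Any symbol that is not a key is a literal word.
GRAMMAR = {
    "name": [["medium", "quantity"], ["quantity"]],
    "medium": [["air"], ["sea", "water"], ["surface"]],
    "quantity": [["temperature"], ["pressure"], ["salinity"]],
}


def expand(symbol, words, pos):
    """Yield every position reachable after matching `symbol` at `pos`."""
    if symbol not in GRAMMAR:
        # Terminal: it must equal the next word of the name.
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for alternative in GRAMMAR[symbol]:
        yield from match_sequence(alternative, words, pos)


def match_sequence(symbols, words, pos):
    """Yield every position reachable after matching all `symbols` in order."""
    if not symbols:
        yield pos
        return
    for middle in expand(symbols[0], words, pos):
        yield from match_sequence(symbols[1:], words, middle)


def conforms(name, start="name"):
    """True if the whole underscore-separated name matches the grammar."""
    words = name.split("_")
    return any(end == len(words) for end in expand(start, words, 0))
```

With these toy rules, conforms("sea_water_temperature") is true and conforms("water_temperature") is false, because "water" on its own is not a medium. Prolog's grammar notation gives this backtracking search for free, which is part of its appeal.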
I have checked the list of standard names which Jonathan used in his
work against the Prolog version of his grammar, and currently 1958 out
of the total of 2072 names parse correctly. The remaining 114 probably
fail because of slight differences in the way that filler words, such as
prepositions, are handled; fixing this is a matter of checking all the
rules for consistency in how the filler words are included.
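The check itself is simple bookkeeping, roughly as in the sketch below. This is Python, not the Prolog I actually used, and the conforms function is a deliberately crude stand-in (an assumption) so that the example runs on its own. The fourth name fails only because the stand-in knows nothing about filler words such as "of", which is the kind of failure described above.

```python
def conforms(name):
    """Stand-in checker (an assumption): optional media words, then one quantity."""
    media = {"air", "sea", "water", "surface"}
    quantities = {"temperature", "pressure", "salinity"}
    words = name.split("_")
    return words[-1] in quantities and all(w in media for w in words[:-1])


def check_names(names, checker):
    """Return (names that parse, names that do not), preserving order."""
    passed, failed = [], []
    for name in names:
        (passed if checker(name) else failed).append(name)
    return passed, failed


names = [
    "air_temperature",
    "sea_water_salinity",
    "surface_air_pressure",
    "tendency_of_air_temperature",  # uses filler words the stand-in lacks
]
passed, failed = check_names(names, conforms)
print(f"{len(passed)} of {len(names)} names parse")
for name in failed:
    print("does not parse:", name)
```

Listing the failures, rather than just counting them, is what makes it practical to track down which rules treat the filler words inconsistently.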
My colleague Mark Muetzelfeldt has produced a web app for this Prolog
grammar, providing browser-based access to a simple query system for
checking that a proposed standard name conforms to the grammar. This is
available at
http://www.eco-epistemics.org/cf_metadata_grammar/. It includes the text
for the complete Prolog version of Jonathan's grammar. I would welcome
any feedback (or, better, a decision to set up some sort of working
group to take this forward). Please note that this is a highly
experimental, early-stage exercise, designed primarily to explore what a
grammar-based online CF-metadata checker might look like. It has been
tested only in Firefox and Chrome.
A number of issues have arisen during this exercise:
1. I feel strongly that it is highly desirable to use a standard grammar
notation (such as Prolog's) for representing the grammar. Apart from the
benefits of using a standard approach, this makes it straightforward to
handle arbitrary nesting of grammar rules (as in, say, a grammar for
English), rather than Jonathan's flat set of rules.
2. In my opinion, it is far better for the name itself to contain all
the information about a particular variable, rather than use a separate
mechanism (modifiers). Consider a variable such as
"monthly_mean_of_log_of_ratio_of_leaf_carbon_to_root_nitrogen". This is
straightforward to capture in a grammar (provided it can handle the
recursive nesting of mathematical functions, which most grammar
notations can), and almost impossible to capture by the use of modifiers
on some base Standard Name.
3. Prolog is a particularly useful platform to use for this task. It has
long had a specific notation for grammar rules which is very readable
and supported natively by the Prolog interpreter. Using Prolog offers
several substantial benefits over other approaches, including the
ability to handle more advanced grammar requirements, the ability to
query the grammar and/or a collection of Standard Names directly in the
Prolog interpreter, and the ease with which it can be made available as a
web app.
4. One feature of Prolog which deserves special mention is that it can
easily be used to automatically generate names which are valid according
to the grammar - this can be done with a one-line query. This may seem
useless, but it is in fact a very effective way of picking up weaknesses in
the grammar: if a generated name is (to the expert human) nonsense, then
that can help us to refine the grammar.
5. Jonathan's grammar includes base (atomic) terms which could be
further broken down, for example:
    phenomenon -->
        due_to_condensation_and_evaporation_from_boundary_layer_mixing
        due_to_condensation_and_evaporation_from_convection
        due_to_condensation_and_evaporation_from_longwave_heating
        due_to_condensation_and_evaporation_from_pressure_change
        due_to_condensation_and_evaporation_from_shortwave_heating
        due_to_condensation_and_evaporation_from_turbulence
There is clearly scope here for some more rationalisation.
6. The current policy is (as I understand it) that each new Standard
Name has to be approved individually, but that anyone can add whatever
modifiers they like. If, as I suggest, the role of modifiers is
incorporated into the grammar for Standard Names, then this raises
issues about the approval process. I suspect it would be possible for
the parser to detect which names require manual approval and which do
not, according to which rules are fired, but this would need further
research.
7. Ultimately I believe we need to move away from an approved list of
names to an approved grammar for formulating names. However, as Jonathan
has stressed, a grammar is useful - even when a manual approval process
is used - for checking that names conform to agreed style rules.
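Points 2, 4 and 6 above can be illustrated together with a small sketch. It is in Python rather than Prolog, and every rule, label and function name in it (RULES, parse, rules_fired, generate) is hypothetical: a toy grammar made up for this email, not part of Jonathan's grammar or of the web app. It shows a recursive rule handling nested functions, a record of which rules fired during a parse, and generation of names from the same rules.

```python
# Hypothetical toy rules: (rule label, left-hand side, right-hand side).
RULES = [
    ("function_of", "expr", ["function", "of", "expr"]),  # recursive rule
    ("ratio", "expr", ["ratio", "of", "quantity", "to", "quantity"]),
    ("plain", "expr", ["quantity"]),
    ("log", "function", ["log"]),
    ("monthly_mean", "function", ["monthly", "mean"]),
    ("leaf_carbon", "quantity", ["leaf", "carbon"]),
    ("root_nitrogen", "quantity", ["root", "nitrogen"]),
]
NONTERMINALS = {lhs for _, lhs, _ in RULES}


def parse(symbols, words, pos=0, fired=()):
    """Yield (end position, labels of rules fired) for each way to match."""
    if not symbols:
        yield pos, fired
        return
    head, rest = symbols[0], symbols[1:]
    if head not in NONTERMINALS:
        # Terminal: consume one word if it matches, otherwise this path fails.
        if pos < len(words) and words[pos] == head:
            yield from parse(rest, words, pos + 1, fired)
        return
    for label, lhs, rhs in RULES:
        if lhs == head:
            yield from parse(rhs + rest, words, pos, fired + (label,))


def rules_fired(name):
    """Return the rule labels of the first complete parse, or None."""
    words = name.split("_")
    for end, fired in parse(["expr"], words):
        if end == len(words):
            return fired
    return None


def generate(symbol, depth):
    """Yield each word list derivable from `symbol` within `depth` rule levels."""
    if symbol not in NONTERMINALS:
        yield [symbol]
        return
    if depth == 0:
        return
    for _, lhs, rhs in RULES:
        if lhs == symbol:
            yield from generate_sequence(rhs, depth - 1)


def generate_sequence(symbols, depth):
    """Yield each concatenation of word lists for `symbols`, in order."""
    if not symbols:
        yield []
        return
    for first in generate(symbols[0], depth):
        for remainder in generate_sequence(symbols[1:], depth):
            yield first + remainder


name = "monthly_mean_of_log_of_ratio_of_leaf_carbon_to_root_nitrogen"
print(rules_fired(name))
for words in generate("expr", depth=2):
    print("_".join(words))
```

The nested example name parses, and the tuple returned by rules_fired is the sort of information an approval policy could inspect. The generated list includes "ratio_of_leaf_carbon_to_leaf_carbon", which is valid under these toy rules but pointless, and so is exactly the kind of output that points to a rule needing refinement.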
Cheers,
Robert
On 03/03/11 18:02, Jonathan Gregory wrote:
> Dear Philip and John
>
> I agree with what Philip says here:
>
>> We could then tweak our current practice on this mailing list so that when a
>> person proposes a std_name they should state (or perhaps there is a little
>> bit of code to check) that the proposed std_name conforms to the existing
>> grammar and vocabulary rules. I think most of us would then provide only
>> cursory scrutiny. Perhaps there could even be an automatic timer so that if
>> nobody objects within some time period (perhaps 1 month) then the name is
>> automatically accepted. Essentially the default decision for conforming
>> names would be 'acceptance'. I think this would also make the generation of
>> the text descriptions either automatic, or perhaps obsolete, in many cases
>> because they could be inferred from the grammar and vocabulary tables.
> I could bring the grammar up to date as a starting point. I agree that it
> would be possible to work out text corresponding to each phrase and thus
> construct definitions, or at least a first draft of them. Units could also
> be deduced automatically. I don't myself have the expertise or the time to
> write scripts in support of this, to make it easy for proposers to use these
> procedures e.g. on the web.
>
>
> John writes:
>> But where we are talking about adding generic modifiers, it seems to me a
>> more automated approach is possible. If the meaning of the modifier is clear,
>> then no matter what name it is applied to, the meaning of the resulting
>> compound should be clear. If that is the case, then adding that modifier to
>> an existing name should be verifiable mechanically.
> If this refers to the standard_name modifiers, which are separate words
> appended to standard names, then in fact no approval is needed. It is fine
> to add these to the standard_name attribute. That is not regarded as creating
> a new standard_name. In fact the modifiers were introduced to avoid having to
> add such names to the table.
>
> Best wishes
>
> Jonathan
Received on Tue Mar 08 2011 - 06:14:09 GMT