Dear Jonathan,
Having a managed, authoritative list of approved chemical entities, including formalised names, common names and accepted abbreviations for inclusion in standard names, ideally with content-governance support from an interested group of chemists, would be an extremely valuable asset, not just to CF but to anyone concerned with semantic interoperability.
Of course, this has already been done by CAS, who have a database of some 32,000,000 substances each with its own identifying key (the CAS number) and set of formal/common labels, as part of the international chemical regulatory infrastructure. Trouble is they charge two bucks a query.
I worry a little about this area of content governance in CF. Christiane is making valiant efforts to harness the expertise of the scientists in her project through her Wiki, but I wonder how many of them have standardised nomenclature at the top of their agenda. Getting nomenclature right requires well-motivated people with the right knowledge, and I know only too well how hard these are to find, particularly when there is no financial incentive.
There is already evidence of weakness in our chemical content governance for the CF Standard Names. When an e-mail to Christiane, sent while I was trying to map the chemical standard names, revealed my ignorance as to the nature of 'hexachlorobiphenyl', I did some research and discovered that this is in fact a term for 42 different PCB congeners (PCB128-PCB169; see
http://www.epa.gov/toxteam/pcbid/table.htm), each with a different IUPAC name and CAS number. I'm not saying it's wrong to have a standard name covering a group of chemical entities, and I am not proposing that we revisit this specific example (other than perhaps adding a short explanation to the definition), but I feel it is something that should at least have been raised for consideration when the standard name was first proposed.
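To make the group-term point concrete, here is a minimal Python sketch of how a governed vocabulary might record 'hexachlorobiphenyl' together with its member congeners, using only the PCB128-PCB169 range quoted above. The `Concept` class and its fields are hypothetical illustrations, not part of CF or of any existing vocabulary service; IUPAC names and CAS numbers are deliberately left out rather than guessed.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical vocabulary entry: a preferred label plus narrower member terms."""
    label: str
    definition: str = ""
    narrower: list[str] = field(default_factory=list)


def congener_ids(first: int, last: int) -> list[str]:
    """Return PCB congener labels from PCB<first> to PCB<last>, inclusive."""
    return [f"PCB{n}" for n in range(first, last + 1)]


hexachlorobiphenyl = Concept(
    label="hexachlorobiphenyl",
    definition="Group term covering the hexachlorinated PCB congeners.",
    narrower=congener_ids(128, 169),
)

print(len(hexachlorobiphenyl.narrower))  # 42
print(hexachlorobiphenyl.narrower[0], hexachlorobiphenyl.narrower[-1])  # PCB128 PCB169
```

Recording the group and its members explicitly lets a definition state exactly which entities it covers, and gives a vocabulary server the broader/narrower relationships it would need to serve.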
Is there anyone (Philip?) willing to champion the development of such a list, and who else believes it would be a useful thing to do? I would be happy to support serving the assembled terms, including their semantic interrelationships.
Cheers, Roy.
>>> Jonathan Gregory <j.m.gregory at reading.ac.uk> 8/1/2007 2:23 pm >>>
Dear Philip
> 1) One of the current problems with chemistry output is that there is
> currently NO agreed upon list of species names: many species have various
> common names, and those names are often abbreviated in codes, which can
> lead to confusion and incompatible files (eg ACET could refer to acetone
> or acetaldehyde)
>
> It would be really good if an agreed upon list of names could be agreed
> upon by CF.
I think that we are doing this, in effect, in CF, because we would always use
the same name for a given species in the standard name table. Of course we have
to be careful to choose the species names in a reasonable way. Christiane has
thought about this, e.g. when a common name can be used instead of an IUPAC name.
We would not decide in advance a complete set of species names, however; as
usual, we add them as the need arises. Use of codes and abbreviations would not
be consistent with the self-describing intentions of CF, so I don't think we
ought to do that.
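The ambiguity Philip describes is easy to demonstrate. The Python sketch below uses a hypothetical `ABBREVIATIONS` table containing only the ACET example from his message; it resolves a code to every species it might denote and treats a code as safe only when it resolves to exactly one name. A self-describing full name, as CF uses, needs no such lookup at all.

```python
# Hypothetical abbreviation table: one model code may map to several species.
ABBREVIATIONS: dict[str, list[str]] = {
    "ACET": ["acetone", "acetaldehyde"],
}


def resolve(code: str) -> list[str]:
    """Return every species name a model code could refer to (empty if unknown)."""
    return ABBREVIATIONS.get(code.upper(), [])


def is_unambiguous(code: str) -> bool:
    """A code is only safe to use if it resolves to exactly one species."""
    return len(resolve(code)) == 1


print(resolve("ACET"))         # ['acetone', 'acetaldehyde']
print(is_unambiguous("ACET"))  # False
```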
> 2) The internal structure containing chemicals in models does NOT have to
> be the same as that used in the NetCDF output, and my preference is to
> have them be different.
...
> If the
> netCDF structure just uses a dimension for species, it is a pain to create
> new input files: I often find myself needing to add or remove species, or
> scale some of the species. For analysis, I also usually find myself
> wanting to extract a long time history of just one or two species (out of
> a hundred), which is also a real nuisance with a single array for all
> species, and the process is prone to error. It is also a
> nuisance that if a single species is extracted, the degenerate dimension
> needs to be handled.
>
> Lastly, it is usually sufficient to output a small subset of species
> (saving a lot of file-space), in which case the output array will not
> match the internal array anyway.
...
> I have used codes that handled this situation both ways. Personally, I
> prefer to have each species separate in the netCDF: the ease and accuracy
> of post-processing easily outweighs the disadvantages.
Those are interesting points. There is no reason why CF should not support
both approaches. At the moment we are following your preferred approach, which
means giving standard names with the species names in them. That means the
standard name table is larger, but I don't think it matters.
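The two approaches can be compared with plain Python structures standing in for netCDF variables. This is a sketch only: no netCDF library is used, and the species and values are invented placeholders. In the first layout all species share one array with a species dimension; in the second, each species is its own variable, keyed here by names built on the `mole_fraction_of_X_in_air` pattern (illustrative; check the standard name table before relying on any particular name).

```python
# Dummy data: three time steps for three species (values are placeholders).
species = ["ozone", "carbon_monoxide", "methane"]

# Layout 1: one array indexed [species][time], plus a separate species list.
combined = [
    [1.0, 1.1, 1.2],  # ozone
    [2.0, 2.1, 2.2],  # carbon_monoxide
    [3.0, 3.1, 3.2],  # methane
]

# Layout 2: one variable per species, named after the quantity it holds.
separate = {
    f"mole_fraction_of_{name}_in_air": row
    for name, row in zip(species, combined)
}


def extract_combined(name: str) -> list[float]:
    """Layout 1: find the species index, then slice; relies on the list staying in sync."""
    return combined[species.index(name)]


def extract_separate(name: str) -> list[float]:
    """Layout 2: read the variable directly by its name."""
    return separate[f"mole_fraction_of_{name}_in_air"]


# Writing only a subset of species means selecting variables, not re-indexing an array.
subset = {k: v for k, v in separate.items() if "methane" not in k}

# Slicing one species while keeping dimensions leaves a degenerate axis of length 1.
one = combined[2:3]

print(extract_combined("methane") == extract_separate("methane"))  # True
print(len(one), len(one[0]))  # 1 3
```

Both layouts hold the same numbers; the difference lies in how easily one species can be pulled out or dropped, which is the post-processing argument made above.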
Cheers
Jonathan
_______________________________________________
CF-metadata mailing list
CF-metadata at cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
Received on Thu Aug 02 2007 - 03:24:38 BST