
[CF-metadata] FW: Common names for chemical species

From: Pascoe, S <S.Pascoe>
Date: Mon, 6 Aug 2007 11:45:56 +0100

 
This should have gone to the list

---
Stephen Pascoe  +44 (0)1235 445980
British Atmospheric Data Centre
Rutherford Appleton Laboratory
-----Original Message-----
From: Pascoe, S (Stephen) 
Sent: 06 August 2007 11:32
To: 'Schultz, Martin'
Subject: RE: [CF-metadata] Common names for chemical species
 
Apologies for not contributing further to this discussion after helping keep it alive.  Martin and others are articulating the problem much better than I could.
As he says, the MCM represents one extreme of the problem -- nearly explicit chemistry and therefore thousands of species.  Our species naming problem isn't typical of atmospheric chemistry.  However, I hope it's worth sharing how we are tackling the naming problem.
In the MCM/IUPAC database integration project (http://www.iupac-kinetic.ch.cam.ac.uk/kt_project.html) we are investigating various ways of identifying species both internally and to the user.  We are using IUPAC International Chemical Identifiers (InChI, http://www.iupac.org/inchi/) as our shared species identifiers.  These are strings systematically generated from the structural formula.  We also use SMILES (http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html) as a linear structure notation.
Both these technologies are geared towards organic chemistry and therefore don't uniquely identify all our species, for instance certain excited states. There is also no way of representing generic species such as NOx, or lumped species.
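The gap described above can be illustrated with a small record type. This is a minimal Python sketch, not the MCM/IUPAC project's code. The `Species` class, the `explicit_inchis` helper and the `O(1D)` record are hypothetical, and it assumes that a generic species such as NOx is defined by listing its members rather than by a structural identifier.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Species:
    """Hypothetical species record: an explicit compound carries an InChI,
    while a generic or lumped species (e.g. NOx) is defined by its members."""
    name: str
    inchi: str | None = None           # structural identifier, if one exists
    members: tuple[Species, ...] = ()  # constituents of a generic/lumped species

    @property
    def is_generic(self) -> bool:
        return self.inchi is None and len(self.members) > 0


def explicit_inchis(species: Species) -> list[str]:
    """Expand a species into the InChIs of its explicit constituents."""
    if species.inchi is not None:
        return [species.inchi]
    if not species.members:
        # Neither an identifier nor a definition: nothing machine-readable to resolve.
        raise ValueError(f"{species.name!r} has no InChI and no member species")
    result: list[str] = []
    for member in species.members:
        result.extend(explicit_inchis(member))
    return result


# Standard InChIs for NO and NO2 (worth checking against PubChem before reuse).
NO = Species("nitric_oxide", inchi="InChI=1S/NO/c1-2")
NO2 = Species("nitrogen_dioxide", inchi="InChI=1S/NO2/c2-1-3")
NOX = Species("NOx", members=(NO, NO2))

# Hypothetical record for an excited state that the standard InChI does not distinguish.
O1D = Species("O(1D)")
```

`explicit_inchis(NOX)` resolves to the two member identifiers, while `explicit_inchis(O1D)` raises, which is exactly where an external naming scheme would have to step in.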
CAS numbers are the industry standard, but in the brave new world of Web 2.0 an alternative is emerging. PubChem (http://pubchem.ncbi.nlm.nih.gov/) is a freely available database of compounds that provides its own identifiers (CIDs). We may use this service to map InChIs onto IUPAC and common names. An increasing number of websites use PubChem CIDs as their primary identifiers (e.g. www.emolecules.com). The driver for this is, of course, cost: CAS is pricing itself out of the market.
So there are a variety of technologies and services available to help us define our species in a systematic, machine-interpretable way.
Returning to the topic of standard_names: what most people want is a human-readable name, and the compound/species is just one part of the measurement concept that CF encapsulates in a standard_name. I believe ontologies could provide the semantic glue between all the different components of a name (for instance, those described in http://cf-pcmdi.llnl.gov/documents/cf-standard-names/guidelines).
In my view Roy's ontological approach is the only way of making the standard_name system scale properly.
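The idea of a name assembled from separately defined components can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Roy's ontology or CF's own machinery. The vocabularies, the `{quantity}_of_{species}_in_{medium}` template and the `compose`/`decompose` helpers are all hypothetical; a real ontology would also encode relations between terms (e.g. that NOx comprises NO and NO2).

```python
from __future__ import annotations

import re

# Hypothetical controlled vocabularies standing in for an ontology; real CF
# standard names are governed by the CF standard name table and guidelines.
QUANTITIES = {"mole_fraction", "mass_fraction", "mixing_ratio"}
SPECIES = {"ozone", "nitrogen_dioxide", "pentane", "alkanes_with_at_least_three_c_atoms"}
MEDIA = {"air"}

# quantity: shortest prefix before "_of_"; medium: final word after the last "_in_".
NAME_PATTERN = re.compile(r"^(?P<quantity>.+?)_of_(?P<species>.+)_in_(?P<medium>[a-z]+)$")


def compose(quantity: str, species: str, medium: str) -> str:
    """Build a name from components, rejecting terms outside the vocabularies."""
    for term, vocabulary, role in (
        (quantity, QUANTITIES, "quantity"),
        (species, SPECIES, "species"),
        (medium, MEDIA, "medium"),
    ):
        if term not in vocabulary:
            raise ValueError(f"unknown {role}: {term!r}")
    return f"{quantity}_of_{species}_in_{medium}"


def decompose(name: str) -> tuple[str, str, str]:
    """Split a name back into (quantity, species, medium) and validate each part."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the template: {name!r}")
    parts = (match["quantity"], match["species"], match["medium"])
    compose(*parts)  # raises if any component is not in its vocabulary
    return parts
```

Because every component is checked against its own vocabulary, adding a species term makes it usable with every quantity and medium at once, which is the kind of scaling a flat list of names does not give.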
Cheers,
Stephen.
-----Original Message-----
From: cf-metadata-bounces at cgd.ucar.edu [mailto:cf-metadata-bounces at cgd.ucar.edu] On Behalf Of Schultz, Martin
Sent: 06 August 2007 09:40
To: cf-metadata at cgd.ucar.edu
Subject: Re: [CF-metadata] Common names for chemical species
 
Hi,
   first of all, I am very pleased to see this topic getting more and more attention in this newsgroup and also in the community of (atmospheric) chemistry modellers. It is good to have people like Christiane and Philip involved in this, and I hope more will join in the future as people realize the advantages of standards for exchanging and assessing output from multiple models.
   Concerning the list of chemical names, I don't think that CAS or any list with a similar level of detail is the way forward. While most models treat a certain number of species (typically 40-60) explicitly, they always group organic species into compound classes in order to make the problem solvable (you just cannot imagine a global or regional 3D model with several million chemical variables). Yes, there are box models which go some way towards treating individual compounds and radical species (for example the Master Chemical Mechanism in the UK or NCAR's master mechanism), but even these stop resolving everything beyond a certain point.
   Luckily, the community is usually interested in only a small subset of species. Presumably the list compiled by Christiane already covers quite a bit of that. However, it is unavoidable to also consider "lumped" compounds, for example in intercomparison studies. The difficulty then is to find an unambiguous definition which nevertheless encompasses a sufficiently large number of models, so that the "standard" is not merely a private convention which could even conflict with other similar definitions. Let me give two examples:
(1) pentanes: chemically, one distinguishes between n-pentane and i-pentane (one molecule is linear, the other "T"-shaped). They react through different pathways and ultimately form different compounds. Nevertheless, the two molecules are practically always grouped together in modelling applications, and the different reaction pathways are described via stoichiometric factors:
       pentane + OH -> 0.7 * X + 0.3 * Y + 0.235 * Z + ...
    In this example, defining a standard name for pentane would pose no major problem, I suppose.
(2) alkanes: Many models treat alkanes explicitly up to a certain C-number (usually 3, 4, or 5) and then lump everything else together. Practically, one could of course define standard names such as 
    mixing_ratio_of_alkanes_with_at_least_three_c_atoms_in_air
    mixing_ratio_of_alkanes_with_at_least_four_c_atoms_in_air
    mixing_ratio_of_alkanes_with_at_least_five_c_atoms_in_air    
    mixing_ratio_of_alkanes_with_at_least_six_c_atoms_in_air
    ...
However, the rules for using these would be quite complex. Say you want to compute the total amount of hydrocarbons in an air mass; this would then be either
    ethane + alkanes(C3) + "aromatics" + ...
or
    ethane + propane + alkanes(C4) + "aromatics" + ...
or
    ethane + propane + butane + alkanes(C5) + "aromatics" + ...
...
And the same complications (if not worse) appear with unsaturated species, oxygenated species, etc.
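The three alternative sums above follow one rule: add every explicitly carried alkane below the lumping cutoff, then add the lump. Below is a minimal Python sketch of that rule for the alkane part only (starting from ethane, as in the sums above, and leaving out aromatics and other classes). It assumes each model reports its cutoff alongside the lumped variable, a piece of metadata the standard names would have to carry. The `AlkaneOutput` record, the `total_alkanes` helper and all numbers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Carbon numbers of alkanes a model might carry explicitly.
CARBON_NUMBER = {"ethane": 2, "propane": 3, "butane": 4}


@dataclass
class AlkaneOutput:
    """Hypothetical record of one model's alkane output."""
    explicit: dict[str, float]  # explicit species name -> mixing ratio
    lump_cutoff: int            # the lump holds all alkanes with >= this many C atoms
    lump: float                 # mixing ratio of the lumped variable


def total_alkanes(model: AlkaneOutput) -> float:
    """Sum the explicit alkanes below the cutoff plus the lump, checking coverage."""
    carbons = {CARBON_NUMBER[name] for name in model.explicit}
    missing = set(range(2, model.lump_cutoff)) - carbons
    if missing:
        raise ValueError(f"no explicit alkane for C-numbers {sorted(missing)}")
    overlap = {n for n in carbons if n >= model.lump_cutoff}
    if overlap:
        raise ValueError(f"C-numbers {sorted(overlap)} would be double counted")
    return sum(model.explicit.values()) + model.lump


# Hypothetical output from three models with C3+, C4+ and C5+ lumps.
models = [
    AlkaneOutput({"ethane": 1.0}, lump_cutoff=3, lump=2.0),
    AlkaneOutput({"ethane": 1.0, "propane": 0.8}, lump_cutoff=4, lump=1.2),
    AlkaneOutput({"ethane": 1.0, "propane": 0.8, "butane": 0.5}, lump_cutoff=5, lump=0.7),
]
totals = [total_alkanes(m) for m in models]  # equal up to rounding for these inputs
```

The two checks encode the part of the rule that is easy to get wrong: a species below the cutoff that is not reported leaves a gap, and an explicit species at or above the cutoff is counted twice.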
I don't know an easy way out of this, but it is good that these things are discussed here and now.
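For example (1), the stoichiometric factors of a lumped species can be read as weighted averages over the isomers it stands for. Below is a minimal Python sketch, assuming each isomer's product yields are known. The function name, the single-product yields and the 0.7/0.3 split are hypothetical, chosen only to echo the form of the `pentane + OH` line above; they are not taken from any mechanism.

```python
from __future__ import annotations


def lumped_yields(
    products: dict[str, dict[str, float]],
    fractions: dict[str, float],
    rates: dict[str, float] | None = None,
) -> dict[str, float]:
    """Stoichiometric factors for a lumped species standing for several isomers.

    products:  isomer -> {product: molar yield of its OH reaction}
    fractions: isomer -> share of the lumped species' abundance
    rates:     isomer -> OH rate constant; if given, each isomer's pathway is
               weighted by fraction * rate, i.e. by its share of the reactive loss
    """
    if rates is None:
        rates = {isomer: 1.0 for isomer in products}
    weights = {isomer: fractions[isomer] * rates[isomer] for isomer in products}
    total_weight = sum(weights.values())
    result: dict[str, float] = {}
    for isomer, yields in products.items():
        share = weights[isomer] / total_weight
        for product, y in yields.items():
            result[product] = result.get(product, 0.0) + share * y
    return result


# Hypothetical mechanism: each isomer forms a single product.
PENTANE_ISOMERS = {"n-pentane": {"X": 1.0}, "i-pentane": {"Y": 1.0}}
PENTANE_FRACTIONS = {"n-pentane": 0.7, "i-pentane": 0.3}
```

With abundance weighting alone, `lumped_yields(PENTANE_ISOMERS, PENTANE_FRACTIONS)` gives factors of about 0.7 for X and 0.3 for Y. Passing rate constants shifts the factors towards the faster-reacting isomer, which is one reason the coefficients of a lumped reaction are tied to the assumed isomer mix.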
Best regards,
Martin
< Dr. Martin G. Schultz, ICG-II, Forschungszentrum Jülich >
< D-52425 Jülich, Germany                                 >
< ph: +49 (0)2461 61 2831, fax: +49 (0)2461 61 8131       >
< email: m.schultz at fz-juelich.de                          >
< web: http://www.fz-juelich.de/icg/icg-2/m_schultz      >
Forschungszentrum Jülich GmbH
52425 Jülich
Registered office: Jülich
Registered in the commercial register of the Düren district court, No. HR B 3498
Chair of the Supervisory Board: MinDirig'in Bärbel Brumme-Bothe
Executive Board: Prof. Dr. Achim Bachem (Chair), Dr. Ulrich Krafft (Vice-Chair)
Received on Mon Aug 06 2007 - 04:45:56 BST

