
[CF-metadata] Taxa in CF. Some questions

From: Lowry, Roy K. <rkl>
Date: Tue, 2 Apr 2013 20:00:22 +0100

Hello Jonathan,

The reason that I used MAXT rather than TIME is that I am trying to follow the point data conventions with the possibility of multiple time series (along the INSTANCE dimension) of different lengths stored as padded rather than ragged arrays in a single file. Your example is restricted to a single time series, which might be a better idea for the example in the CF documentation as it has fewer confusing distractions.
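
To illustrate the padded layout described above, here is a minimal sketch in plain Python. It assumes a hypothetical fill value and made-up series; the names `pad_series` and `FILL` are illustrative only and come from neither CF nor the original example.

```python
# Hypothetical fill value; a real file would declare its own _FillValue.
FILL = -9999.0


def pad_series(series_list, fill=FILL):
    """Pad variable-length time series to a common length (MAXT).

    Returns the padded rows (one per instance) and the common length.
    """
    maxt = max((len(s) for s in series_list), default=0)
    padded = [list(s) + [fill] * (maxt - len(s)) for s in series_list]
    return padded, maxt


# Three made-up time series of different lengths (the "ragged" form).
ragged = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
padded, maxt = pad_series(ragged)
```

Each row of `padded` then corresponds to one entry along the INSTANCE dimension, with trailing fill values where a series is shorter than MAXT.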

I'm afraid that biology is a less precise domain than physics. However, compared to what we had when I started with biological data management 20-odd years ago, resources like WoRMS and ITIS are a massive step forward. I've been searching for something conforming to your expectations for 20 years, but have come to the conclusion it's an impossible dream as it's orthogonal to the bioscience paradigm!

What WoRMS/ITIS have delivered are unique and reliable identifiers for taxa, but these are not self-describing - the TSN and aphiaID are in fact integers. Using these IDs both circumvents the homonym issue (which can be infuriating: I have had many battles with a marine coral misidentified as a South American centipede because they both have the same species name) and provides a defence against the habit biologists have of changing the taxon names for a given entity over time. They have also done a lot to standardise the spelling of taxon names, particularly issues such as discrepancies in Latin word endings (e.g. forestii versus foresti). I cannot see any alternative to including taxon_names and taxon_identifiers in parallel and I'm relieved that you are reasonably comfortable with the idea.
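
A small sketch of why the name and the identifier are worth carrying in parallel. All names and integer IDs below are made-up placeholders, not real aphiaIDs or TSNs, and `index_by` is a hypothetical helper.

```python
from collections import defaultdict

# (identifier, name) pairs. Every value here is an invented placeholder.
records = [
    (1001, "Examplea foresti"),   # hypothetical coral
    (2002, "Examplea foresti"),   # hypothetical centipede with the same name
    (1001, "Examplea forestii"),  # spelling variant of the first taxon
]


def index_by(records, key_position):
    """Group records by one field, collecting the values of the other field."""
    index = defaultdict(set)
    for record in records:
        index[record[key_position]].add(record[1 - key_position])
    return dict(index)


by_name = index_by(records, 1)  # name -> set of identifiers
by_id = index_by(records, 0)    # identifier -> set of names
```

Keyed by name, the two homonyms collapse into one entry holding two different identifiers; keyed by identifier, they stay separate and the spelling variants of one taxon are gathered together.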

Cheers, Roy.
________________________________________
From: CF-metadata [cf-metadata-bounces at cgd.ucar.edu] On Behalf Of Jonathan Gregory [j.m.gregory at reading.ac.uk]
Sent: 02 April 2013 17:38
To: cf-metadata at cgd.ucar.edu
Subject: [CF-metadata] Taxa in CF. Some questions

Dear Roy

Yes, I think you are right that it is useful to have the taxon as a dimension
because it allows you to put several of them in one variable, provided it's
the same quantity, with the same generic standard name. That is just like
bundling up timeseries from different locations into one data variable. This
kind of dimension is called a "discrete axis" in CF 1.6, section 4.5. By
"container variable" CF so far means something different: that's an empty
data variable which exists to hang attributes from, to specify grid_mappings.

I assume that MAXT is the size of the time dimension, isn't it? Could we write
your example like this:

dimensions:
  time=1000;
  string80=80;
  taxon=2;
variables:
  float abundance(time,taxon);
    abundance:standard_name="number_concentration_of_taxon_in_sea_water";
    abundance:coordinates="taxon_identifier taxon_name";
  char taxon_name(taxon,string80);
    taxon_name:standard_name="taxon_name";
  char taxon_identifier(taxon,string80);
    taxon_identifier:standard_name="taxon_identifier";

I am not sure if I've understood your example, though. Yes, I think both the
taxon descriptions should be string-valued auxiliary coordinate variables, as
I have shown them (CF section 6.1).
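
As a rough sketch of how string-valued coordinates fit a (taxon, string80) character array, the following plain Python pads each label to a fixed width and recovers it again. The labels are placeholders, the null-byte padding is an assumed convention, and `to_char_array` and `from_char_array` are hypothetical helpers that belong to no library.

```python
STRING80 = 80  # width of the string dimension in the example above


def to_char_array(labels, width=STRING80):
    """Encode each label as a fixed-width byte row, padded with null bytes."""
    rows = []
    for label in labels:
        encoded = label.encode("ascii")
        if len(encoded) > width:
            raise ValueError(f"label longer than {width} characters: {label!r}")
        rows.append(encoded.ljust(width, b"\x00"))
    return rows


def from_char_array(rows):
    """Strip trailing padding (nulls or spaces) and decode each row."""
    return [row.rstrip(b"\x00 ").decode("ascii") for row in rows]


# Placeholder labels standing in for taxon_name values along the taxon dimension.
names = ["taxon A", "taxon B"]
rows = to_char_array(names)
recovered = from_char_array(rows)
```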

If there is only one taxon, the taxon dimension could be omitted.

However, I am a bit disturbed to learn that the taxon_name might not be
reliable or unique. If CF is going to depend on an external vocabulary, I would
argue that it needs one which provides unique and reliable self-describing
identifiers.

Best wishes

Jonathan
Received on Tue Apr 02 2013 - 13:00:22 BST
