[CF-metadata] Seeking example program for storing surface obs in CF convention

From: Jonathan Gregory <j.m.gregory>
Date: Wed, 8 Aug 2007 09:06:51 +0100

Dear John

> My own opinion is that CF is not currently adequate for writing observational data to NetCDF. The basic limitation in section 5.4 is that
>
> float humidity(time,pressure,station)
> float pressure(pressure);
> double time(time);
>
> requires the same number and values of the time and pressure coordinates at each station.

Yes, this is wasteful of space if you make all the stations share the
coordinate variables but not every station has data at every (time,pressure)
point. Alternatively you have to create separate coordinate variables for
each station, which may be inconvenient.
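To make the waste concrete, here is a small Python sketch (not part of either proposal; the station names and values are hypothetical). It counts how many cells of a shared humidity(time,pressure,station) array would actually hold data when each station reports at its own (time,pressure) points.

```python
# Hypothetical observations: each station reports humidity at its own
# (time, pressure) points, keyed by (time, pressure).
obs = {
    "A": {(0.0, 1000.0): 0.81, (6.0, 850.0): 0.64},
    "B": {(0.0, 925.0): 0.77},
    "C": {(12.0, 500.0): 0.35, (12.0, 1000.0): 0.90},
}

def orthogonal_size(obs):
    """Return (cells, filled) for a dense humidity(time, pressure, station) array
    whose time and pressure coordinates are shared by all stations."""
    times = sorted({t for points in obs.values() for (t, _) in points})
    pressures = sorted({p for points in obs.values() for (_, p) in points})
    cells = len(times) * len(pressures) * len(obs)
    filled = sum(len(points) for points in obs.values())
    return cells, filled

cells, filled = orthogonal_size(obs)
print(f"{filled} of {cells} cells hold data")  # 5 of 36 cells hold data
```

Every unfilled cell would have to carry a missing value.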

If we put them in common variables (which, if I have understood correctly, is
what you propose), I prefer the contiguous arrangement, something like this:

dimensions:
  record=UNLIMITED;
  station=5;
  stringlen=20;
variables:
  char station_name(station,stringlen);
  float latitude(station);
  float longitude(station);
  double time(record);
  float humidity(record);
    humidity:coordinates="time";
  float temperature(record);
    temperature:coordinates="time";
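Here is a minimal Python sketch of the contiguous arrangement. It assumes each station's series is already in memory as a list. The series are concatenated along the record dimension, and per-station start and count arrays locate each station's slice. The names start and count are hypothetical; the proposal only speaks of start or end pointers.

```python
# Hypothetical per-station temperature series, in station order.
series = [[1.1, 1.2, 1.3], [2.1, 2.2]]

def pack_contiguous(series):
    """Concatenate series along the record dimension; return (data, start, count)."""
    data, start, count = [], [], []
    for values in series:
        start.append(len(data))   # first record belonging to this station
        count.append(len(values)) # number of records for this station
        data.extend(values)
    return data, start, count

def station_slice(data, start, count, station):
    """Records for one station under the contiguous arrangement."""
    s = start[station]
    return data[s:s + count[station]]

data, start, count = pack_contiguous(series)
# data == [1.1, 1.2, 1.3, 2.1, 2.2], start == [0, 3], count == [3, 2]
print(station_slice(data, start, count, 1))  # [2.1, 2.2]
```

Because each slice is fixed by its start pointer, a station's series cannot grow without moving the records that follow it.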

where the individual stations are contiguous in the humidity and temperature
variables. Then the question is how to indicate the range of records which
belongs to each station. One way, as in your example, is to provide an array
of start or end pointers into the records. Another way, which takes up a bit
more space but could be more convenient for using the data, would be to include

  int whichstation(record);
    whichstation:coordinate_index="station";

where the presence of the coordinate_index attribute indicates that the value
of whichstation is an index into the station coordinate dimension. whichstation
could be identified as an auxiliary coordinate variable by naming it in the
coordinates attribute:

  float humidity(record);
    humidity:coordinates="time whichstation";
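As a sketch of how a reader program might find such an index variable, the Python below scans the coordinates attribute of a data variable and keeps the auxiliary coordinates that carry coordinate_index. The dictionary layout is a hypothetical stand-in for attributes read from a file. coordinate_index is the attribute proposed here, not an existing CF attribute.

```python
# Hypothetical in-memory attribute tables mirroring the CDL above.
variables = {
    "time": {"dims": ("record",), "attrs": {}},
    "whichstation": {"dims": ("record",), "attrs": {"coordinate_index": "station"}},
    "humidity": {"dims": ("record",), "attrs": {"coordinates": "time whichstation"}},
}

def index_coordinates(variables, name):
    """Map each index variable named in `name`'s coordinates attribute
    to the dimension it indexes (per the proposed coordinate_index attr)."""
    listed = variables[name]["attrs"].get("coordinates", "").split()
    return {
        aux: variables[aux]["attrs"]["coordinate_index"]
        for aux in listed
        if "coordinate_index" in variables.get(aux, {}).get("attrs", {})
    }

print(index_coordinates(variables, "humidity"))  # {'whichstation': 'station'}
```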

E.g. if you have two timeseries, one with temperature data (1.1, 1.2, 1.3) and
the other with data (2.1, 2.2), you would have:

data:
  temperature=1.1, 1.2, 1.3, 2.1, 2.2;
  whichstation=0, 0, 0, 1, 1;
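Recovering the individual timeseries from whichstation is a simple group-by. Here is a Python sketch using the example data above, with plain lists standing in for the netCDF variables:

```python
from collections import defaultdict

temperature = [1.1, 1.2, 1.3, 2.1, 2.2]
whichstation = [0, 0, 0, 1, 1]

def split_by_station(values, index):
    """Group record values by the station index stored alongside each record."""
    groups = defaultdict(list)
    for value, station in zip(values, index):
        groups[station].append(value)
    return dict(groups)

print(split_by_station(temperature, whichstation))
# {0: [1.1, 1.2, 1.3], 1: [2.1, 2.2]}
```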

If it is done this way, rather than with start pointers, the individual
timeseries actually do not have to be stored contiguously, so any of them can
be appended to at any time. That might be a useful feature.
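Here is a Python sketch of that feature, again with plain lists standing in for the variables. append_record and series_for are hypothetical helpers. A late observation for station 0 is written after station 1's records, and station 0's series is still recoverable.

```python
# Records as in the example above.
temperature = [1.1, 1.2, 1.3, 2.1, 2.2]
whichstation = [0, 0, 0, 1, 1]

def append_record(values, index, value, station):
    """Append one observation at the end of the unlimited record dimension."""
    values.append(value)
    index.append(station)

def series_for(values, index, station):
    """All records for one station, in the order they were written."""
    return [v for v, s in zip(values, index) if s == station]

append_record(temperature, whichstation, 1.4, 0)  # late report from station 0
print(series_for(temperature, whichstation, 0))   # [1.1, 1.2, 1.3, 1.4]
print(whichstation)                               # [0, 0, 0, 1, 1, 0]
```

Records come back in the order they were written. A reader that needs them in time order would sort each series by its time values.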

Your proposal appears to me to introduce several extra features which are
redundant or duplicating other CF attributes. The _CoordinateAxisType attr
has the same function as the CF axis attribute. I don't see the need for the
global attributes latitude_coordinate etc. since the lat etc. coordinates can
be identified by units and by standard_name; also, having a *global* attr
restricts the file to having only *one* coord variable of each type. The
attributes giving the max and min of each of the coordinates contain info
which can be deduced from the coord variables themselves, of course; is that
an important kind of discovery metadata? I'd be worried about it because it
is almost certain to be wrong some of the time i.e. inconsistent with the
coord variables. The cdm_datatype attribute implies a distinction between
various kinds of data which are formally not really different and would be
processed in the same way, so I don't see why this is useful.
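For example, a reader can find every latitude coordinate in a file from its standard_name or units and compute the extent from the data itself. This is why the global latitude_coordinate and min/max attributes look redundant. The Python sketch below assumes attributes and data are already in memory; the variable names and values are made up. LAT_UNITS lists the latitude units the CF conventions accept.

```python
# Latitude units accepted by the CF conventions (degrees_north is recommended).
LAT_UNITS = {"degrees_north", "degree_north", "degree_N", "degrees_N", "degreeN", "degreesN"}

# Hypothetical variables; note there are two latitude-type coordinates.
variables = {
    "latitude": {"attrs": {"units": "degrees_north", "standard_name": "latitude"},
                 "data": [51.5, 48.9, 40.4, 52.5, 41.9]},
    "longitude": {"attrs": {"units": "degrees_east", "standard_name": "longitude"},
                  "data": [-0.1, 2.4, -3.7, 13.4, 12.5]},
    "obs_lat": {"attrs": {"units": "degree_north"},
                "data": [10.0, -5.0]},
}

def latitude_variables(variables):
    """Names of every variable identifiable as latitude by standard_name or units."""
    return [
        name for name, var in variables.items()
        if var["attrs"].get("standard_name") == "latitude"
        or var["attrs"].get("units") in LAT_UNITS
    ]

def extent(variables, name):
    """(min, max) computed from the data itself, so it cannot disagree with it."""
    data = variables[name]["data"]
    return min(data), max(data)

for name in latitude_variables(variables):
    print(name, extent(variables, name))
# latitude (40.4, 52.5)
# obs_lat (-5.0, 10.0)
```

A single global attribute could name only one of the two latitude coordinates found here.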

Best wishes

Jonathan
Received on Wed Aug 08 2007 - 02:06:51 BST
