
[CF-metadata] Feedback requested on proposed CF Simple Geometries

From: Jonathan Gregory <j.m.gregory>
Date: Thu, 27 Oct 2016 09:41:33 +0100

Dear Dave

> I'll respond to the first question by saying that we are talking about
> (time,geometry) NOT (time,node).

Good. Thanks for the clarification.

> You are correct in thinking that this is analogous to a complex (potentially
> multipart) cell. In this case, we feel that it is more analogous to a
> different spatial representation of a station DSG data type than an extension
> of cell geometry, but different use cases may have differing a-priori
> relationships to the existing standards baseline.

Timeseries have data(time,station), which is like your case. When all stations
have the same sampling times, this is just a 2D array and the station dimension
is a "discrete axis" (CF sect 4.5). This existed before sect 9 on DSGs was
added to CF. That new section provides a mechanism for storing ragged data
arrays for multiple timeseries without wasting space, but logically it is
equivalent to the rectangular array, which it retains as a permitted way to
store the data. So I think the two cases you mention are actually the same.
The new thing you want to do is describe the "station" for each timeseries in
a geometrically complex way, rather than its being a single point or a single
polygon, which can already be described by coordinates and coordinate bounds,
respectively.
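To make that equivalence concrete, here is a minimal Python sketch. It assumes a CF contiguous ragged array, in which a count variable gives the number of samples for each station and the samples are stored one station after another. The function names and sample values are hypothetical. The sketch unpacks the ragged form into a padded rectangular (station, time) layout and then packs it back.

```python
from itertools import accumulate

def ragged_to_rectangular(values, row_size, fill=None):
    """Unpack a contiguous ragged array into padded rows, one per station."""
    starts = [0, *accumulate(row_size)][:-1]  # offset of each station's run
    width = max(row_size)                      # length of the longest series
    rows = []
    for start, n in zip(starts, row_size):
        row = list(values[start:start + n])
        row.extend([fill] * (width - n))       # pad the missing times
        rows.append(row)
    return rows

def rectangular_to_ragged(rows, row_size):
    """Drop the padding again, recovering the flat ragged array."""
    return [v for row, n in zip(rows, row_size) for v in row[:n]]

# Hypothetical data: three stations with 2, 4 and 3 samples.
sizes = [2, 4, 3]
flat = [10, 11, 20, 21, 22, 23, 30, 31, 32]
rect = ragged_to_rectangular(flat, sizes)
for row in rect:
    print(row)
```

The sketch places each station's valid samples at the start of its row. If stations had different sampling times, a time coordinate of the same shape would also be needed to record when each sample was taken.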

> DSG handles data from one or a collection of TimeSeries (point), Trajectory,
> Profile, TrajectoryProfile or TimeSeriesProfile. So measurements are from a
> point (TimeSeries and Profile) or points along a trajectory. DSG can be used
> for at least some of what you want to do here if you say, e.g., I have a
> TimeSeries which is stream flow measured (or modeled) at a given point on a
> stream. But DSG has no system to define a geometry (point, polyline, polygon,
> etc.) and say, e.g., "The expected rainfall in this polygon for some period of
> time is 5304 liters", except to assign a nominal point (centroid?) or just use
> an ID (e.g., San Francisquito Catchment section 5A).

I suggest that, if we see it this way, you would still provide representative
coordinates for each station, whether you stored the data as a rectangular array
or a ragged one. These could be useful for the simplest kind of plotting, which
wants a location for each "geometry". However, these coordinates would not have
bounds, because bounds are not adequate to describe the structure.
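Which point to choose is not specified here. As one illustrative option, assuming a simple (non-self-intersecting) polygon, a representative coordinate could be its area-weighted centroid computed with the shoelace formula. The function name is hypothetical, and for a multipart geometry one would still have to decide which parts to include.

```python
def polygon_centroid(xs, ys):
    """Area-weighted centroid of a simple polygon via the shoelace formula.

    Vertices are given in order; the ring is closed implicitly, so the
    first vertex is not repeated at the end. Either orientation works.
    """
    n = len(xs)
    area2 = 0.0  # twice the signed area
    sx = sy = 0.0
    for i in range(n):
        j = (i + 1) % n
        cross = xs[i] * ys[j] - xs[j] * ys[i]
        area2 += cross
        sx += (xs[i] + xs[j]) * cross
        sy += (ys[i] + ys[j]) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Cx = sx / (6 A) with A = area2 / 2, i.e. sx / (3 * area2); same for Cy.
    return sx / (3 * area2), sy / (3 * area2)

# Outer square of the first geometry in the example: (0,0) to (20,20).
square = polygon_centroid([0, 20, 20, 0], [0, 0, 20, 20])
print(square)
```

The centroid of a non-convex polygon can fall outside the polygon. That may matter if the point is used to label the geometry on a map.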

There are many choices for how the geometry data could be stored. Here's a
suggestion along the lines of my last email. I have made it resemble ugrid (and
grid_mapping) in using a "container" variable to "host" the geometry
description. In this way of doing it, each data variable points to the geometry
variable, and the geometry variable points to the geometry coordinates, with no
direct link between the representative (geometry) coordinates and the geometry
(node) coordinates. It could alternatively, or additionally, be arranged so
that the representative coordinates point to the geometry coordinates.
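Here is a minimal sketch of following those pointers. It assumes the attributes have already been read into a plain dictionary: HEADER is a hypothetical stand-in for what a netCDF library would return. It also assumes that a geometry variable is recognised by carrying the part_dimension, node_count and node_coordinates attributes used in the example in this message.

```python
# Hypothetical stand-in for a file header: variable name -> attributes.
HEADER = {
    "p": {"standard_name": "precipitation_flux",
          "coordinates": "xrep yrep geom3"},
    "xrep": {},
    "yrep": {},
    "geom3": {"part_dimension": "part", "node_count": "nodes_per_part",
              "part_type": "inout", "node_dimension": "node",
              "node_coordinates": "x y"},
    "x": {},
    "y": {},
}

# Attributes assumed here to mark a variable as a geometry container.
GEOMETRY_ATTRS = ("part_dimension", "node_count", "node_coordinates")

def find_geometry(header, data_var):
    """Follow data_var -> geometry variable -> node coordinate variables."""
    for name in header[data_var].get("coordinates", "").split():
        attrs = header.get(name, {})
        if all(key in attrs for key in GEOMETRY_ATTRS):
            return name, attrs["node_coordinates"].split()
    return None, []  # no geometry variable among the auxiliary coordinates

geom_var, node_coords = find_geometry(HEADER, "p")
print(geom_var, node_coords)
```

The representative coordinates xrep and yrep lack these attributes, so the sketch skips them. This matches the absence of a direct link between them and the node coordinates in this arrangement.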

However, my suggestion below is unlike other uses of container variables, which
are scalars that don't contain information. My geometry variable is an
auxiliary coordinate variable, pointed to by the usual CF coordinates
attribute, and its value contains information - it gives the number of parts in
each geometry. It is identifiable as a geometry variable by its special
attributes. Making it formally an aux coord variable avoids having to invent a
new attribute to point to it. The "inout" variable contains I or O for an inside
or outside polygon; it could also contain L for a line and P for a point.
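To illustrate what the I/O flags encode, here is a sketch that infers them from the node coordinates. It assumes that I means a part lies inside another part of the same geometry (a hole) and O means it does not. The sketch applies the even-odd ray-casting test to one interior point per part, taken as the vertex mean, which is interior for the convex parts used here. The coordinates are the first geometry of the example below, and the function names are hypothetical.

```python
def point_in_polygon(px, py, xs, ys):
    """Even-odd ray casting: True if (px, py) lies inside the closed ring."""
    inside = False
    n = len(xs)
    for i in range(n):
        j = (i - 1) % n
        # Only edges that straddle the horizontal line y = py can cross the ray.
        if (ys[i] > py) != (ys[j] > py):
            x_cross = xs[j] + (py - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j])
            if px < x_cross:  # crossing lies on the ray to the right
                inside = not inside
    return inside

def infer_inout(parts):
    """Flag each part 'I' if an odd number of sibling parts enclose it."""
    flags = []
    for k, (xs, ys) in enumerate(parts):
        # Assumption: the vertex mean is an interior point of the part.
        px, py = sum(xs) / len(xs), sum(ys) / len(ys)
        depth = sum(point_in_polygon(px, py, oxs, oys)
                    for m, (oxs, oys) in enumerate(parts) if m != k)
        flags.append("I" if depth % 2 else "O")
    return "".join(flags)

# First geometry of the example: a square, three triangles inside it
# and two triangles outside it.
first_geometry = [
    ([0, 20, 20, 0], [0, 0, 20, 20]),
    ([1, 10, 19], [1, 5, 1]),
    ([5, 7, 9], [15, 19, 15]),
    ([11, 13, 15], [15, 19, 15]),
    ([5, 9, 7], [25, 25, 29]),
    ([11, 15, 13], [25, 25, 29]),
]
print(infer_inout(first_geometry))  # → OIIIOO
```

Counting enclosing sibling parts, rather than testing against one designated outer ring, is a design choice in this sketch. It keeps the result independent of the order in which the parts are stored.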

netcdf geometry_example {
dimensions:
  geom=3;
  part=11;
  node=36;
  time=20;
variables:
  float p(time,geom);
    p:standard_name="precipitation_flux";
    p:units="kg m-2 s-1";
    p:coordinates="xrep yrep geom3";
  float t(time,geom);
    t:standard_name="air_temperature";
    t:units="K";
    t:coordinates="xrep yrep geom3 z";
  float z;
    z:standard_name="height";
    z:units="m";
  int geom3(geom);
    geom3:part_dimension="part"; // size must equal the sum of geom3
    geom3:node_count="nodes_per_part";
    geom3:part_type="inout";
    geom3:node_dimension="node"; // size must equal the sum of nodes_per_part
    geom3:node_coordinates="x y"; // also an attribute in ugrid
  float xrep(geom);
  float yrep(geom);
  int nodes_per_part(part);
  char inout(part);
  float x(node);
  float y(node);
data:
  geom3=6, 3, 2;
  nodes_per_part=4, 3, 3, 3, 3, 3, 3, 5, 3, 3, 3;
  inout="OIIIOOOOIOO"; // one flag per part
  x=0, 20, 20, 0, 1, 10, 19, 5, 7, 9, 11, 13, 15, 5, 9, 7, 11, 15, 13, -40,
  -20, -45, -20, -10, -10, -30, -45, -30, -20, -20, 30, 45, 10, 25, 50, 30;
  y = 0, 0, 20, 20, 1, 5, 1, 15, 19, 15, 15, 19, 15, 25, 25, 29, 25, 25, 29,
  -40, -45, -30, -35, -30, -10, -5, -20, -20, -15, -25, 20, 40, 40, 5, 10, 15;
  z = 1.5;
}
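The two-level structure can be decoded with two successive splits. First, nodes_per_part cuts the node coordinates into parts; then geom3 cuts the parts into geometries. The Python sketch below does this for the example values. It assumes the data have already been read into lists, and the helper names are hypothetical. It also checks the size constraints noted in the comments on geom3.

```python
from itertools import accumulate

# Values copied from the CDL example above.
PART, NODE = 11, 36
geom3 = [6, 3, 2]                                   # parts per geometry
nodes_per_part = [4, 3, 3, 3, 3, 3, 3, 5, 3, 3, 3]  # nodes per part
x = [0, 20, 20, 0, 1, 10, 19, 5, 7, 9, 11, 13, 15, 5, 9, 7, 11, 15, 13, -40,
     -20, -45, -20, -10, -10, -30, -45, -30, -20, -20, 30, 45, 10, 25, 50, 30]
y = [0, 0, 20, 20, 1, 5, 1, 15, 19, 15, 15, 19, 15, 25, 25, 29, 25, 25, 29,
     -40, -45, -30, -35, -30, -10, -5, -20, -20, -15, -25, 20, 40, 40, 5, 10, 15]

def split(seq, counts):
    """Cut seq into consecutive runs whose lengths are given by counts."""
    starts = [0, *accumulate(counts)]
    return [seq[a:b] for a, b in zip(starts, starts[1:])]

def decode(geom3, nodes_per_part, x, y, n_part, n_node):
    """Return geometries -> parts -> list of (x, y) nodes."""
    if sum(geom3) != n_part:
        raise ValueError("part dimension must equal the sum of geom3")
    if len(nodes_per_part) != n_part or sum(nodes_per_part) != n_node:
        raise ValueError("node dimension must equal the sum of nodes_per_part")
    if len(x) != n_node or len(y) != n_node:
        raise ValueError("x and y need one value per node")
    parts = split(list(zip(x, y)), nodes_per_part)  # level 1: nodes -> parts
    return split(parts, geom3)                      # level 2: parts -> geometries

geometries = decode(geom3, nodes_per_part, x, y, PART, NODE)
for g, parts in enumerate(geometries):
    print(f"geometry {g}: {len(parts)} parts, "
          f"{sum(len(p) for p in parts)} nodes")
```

The same kind of check applies to inout, which is declared on the part dimension and so needs one flag per part.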

I agree you can't use ugrid as it stands, because ugrid describes a single
mesh per data variable, with many data "points" at nodes, edges and faces of
the mesh. You want several meshes (in ugrid terms) for each data variable,
with only one data "point" for each mesh. It would be nice not to have a
separate convention, but it seems that can only be avoided by adapting ugrid.
You also can't use the DSGs of sect 9 as they are, because they cannot
accommodate a ragged array of ragged arrays.

Best wishes

Jonathan
Received on Thu Oct 27 2016 - 02:41:33 BST

