[CF-metadata] Feedback requested on proposed CF Simple Geometries

From: Jonathan Gregory <j.m.gregory>
Date: Thu, 2 Feb 2017 09:31:09 +0000

Dear Ben and Chris

Following Chris's comment about preferring variables to multi-valued
attributes, here are the examples for linestring and multipolygon redone so
that both use variables to store the counts of parts and nodes. In this scheme
more variables and dimensions are needed, but it may be easier to read and it
is more CF-like, because the topology information is a "container" variable,
like the CF grid_mapping and the ugrid mesh topology: it holds no numerical
data itself, only attributes that point to other variables.

  dimensions:
    station = 3; // stream segments
    time = UNLIMITED;
    node = 9; // = 2 + 4 + 3
  variables:
    float flow(station,time) ;
      flow:units="m3 s-1";
      flow:topology="SOMETHING";
    double time(time) ;
      time:standard_name = "time";
      time:units = "days since 1970-01-01 00:00:00" ;
    char SOMETHING;
      SOMETHING:node_coordinates="lon lat";
      SOMETHING:node_count="node_count_var";
      SOMETHING:topology_type="linestring";
    int node_count_var(station); // number of nodes for each linestring
    float lon(node) ;
      lon:standard_name = "longitude";
      lon:units = "degrees_east";
    float lat(node) ;
      lat:standard_name = "latitude";
      lat:units = "degrees_north" ;
  data:
    node_count_var=2, 4, 3;
    lon=0, 1, 0, -1, -2, -3, 2, 3, 4;
    lat=51, 52, 51, 50, 50, 49, 55, 55, 56;
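
As an illustration of how a reader might consume the linestring layout above,
here is a minimal Python sketch that cuts the flat lon/lat arrays into one
linestring per station using node_count_var. The data values are copied from
the example; the helper name split_by_count is hypothetical and not part of
any proposal or library.

```python
# Values copied from the linestring example above.
node_count_var = [2, 4, 3]
lon = [0, 1, 0, -1, -2, -3, 2, 3, 4]
lat = [51, 52, 51, 50, 50, 49, 55, 55, 56]


def split_by_count(values, counts):
    """Return consecutive slices of `values` with lengths given by `counts`.

    Hypothetical helper, written only for this illustration.
    """
    if sum(counts) != len(values):
        raise ValueError("counts do not add up to the number of values")
    pieces = []
    start = 0
    for n in counts:
        pieces.append(values[start:start + n])
        start += n
    return pieces


# One list of (lon, lat) nodes per station, in station order.
linestrings = [
    list(zip(xs, ys))
    for xs, ys in zip(split_by_count(lon, node_count_var),
                      split_by_count(lat, node_count_var))
]

for station, line in enumerate(linestrings):
    print(station, line)
```

The check that the counts sum to the size of the node dimension mirrors the
comment on that dimension (9 = 2 + 4 + 3).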

  dimensions:
    station = 3; // collections of polygons
    time = UNLIMITED;
    node = 24; // = 4 + 3 + 3 + 3 + 5 + 3 + 3
    part = 7 ; // = 3 + 2 + 2
  variables:
    float flow(station,time) ;
      flow:units="m3 s-1";
      flow:topology="SOMETHING";
    double time(time) ;
      time:standard_name = "time";
      time:units = "days since 1970-01-01 00:00:00" ;
    char SOMETHING;
      SOMETHING:node_coordinates="lon lat";
      SOMETHING:node_count="node_count_var";
      SOMETHING:part_count="part_count_var";
      SOMETHING:topology_type="multipolygon";
    int node_count_var(part); // number of nodes in each polygon
    int part_count_var(station); // number of polygons in each collection
    float lon(node) ;
      lon:standard_name = "longitude";
      lon:units = "degrees_east";
    float lat(node) ;
      lat:standard_name = "latitude";
      lat:units = "degrees_north" ;
  data:
    node_count_var=4, 3, 3, 3, 5, 3, 3;
    part_count_var=3, 2, 2;
    lon=0, 20, 20, 0, ... // first polygon, etc. ...
    lat=0, 0, 20, 20, ...
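
For the multipolygon layout, the two count variables nest: part_count_var says
which parts belong to each station, and node_count_var says which nodes belong
to each part. The following Python sketch works out the resulting node index
ranges from the counts alone, since the coordinate data above is elided. The
helper names offsets and node_ranges are hypothetical.

```python
# Counts copied from the multipolygon example above.
node_count_var = [4, 3, 3, 3, 5, 3, 3]   # nodes in each polygon (part)
part_count_var = [3, 2, 2]               # polygons in each station


def offsets(counts):
    """Cumulative start offsets: one more entry than `counts`, starting at 0."""
    out = [0]
    for n in counts:
        out.append(out[-1] + n)
    return out


part_start = offsets(part_count_var)   # first part index of each station
node_start = offsets(node_count_var)   # first node index of each part


def node_ranges(station):
    """Half-open (start, stop) node index ranges of a station's polygons."""
    first_part = part_start[station]
    stop_part = part_start[station + 1]
    return [(node_start[p], node_start[p + 1])
            for p in range(first_part, stop_part)]


for station in range(len(part_count_var)):
    print(station, node_ranges(station))
```

Each (start, stop) pair would then be used to slice lon and lat to recover one
polygon's nodes.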

Also, two more thoughts regarding not using indirection, but instead
duplicating coincident coordinate values:

* Doing it this way (without indexing) is consistent with ordinary CF bounds.
Contiguous cells in 1D have bounds with equal values. Thus N cells have 2N
bounds, although usually only N+1 distinct values of bounds. There are several
reasons why we made this choice, one being that it's more flexible, in allowing
non-contiguous and overlapping cells.

* The indexing itself takes space. If you have N (lon,lat) points which are all
boundaries between two regions, so they're all used twice, you will have 4N
coordinate values without indexing. With indexing you will have only 2N, but
the index takes N, making 3N in total. Thus you save 25% of the space, not 50%.
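
The first point, about ordinary CF bounds, can be checked with a small Python
sketch. The cell edges below are made-up values chosen only for illustration:
N contiguous 1D cells store 2N bounds values, of which N+1 are distinct, and
the same (cell, 2) layout also accommodates cells that are not contiguous.

```python
# Hypothetical cell edges for four contiguous 1D cells.
edges = [0.0, 10.0, 20.0, 30.0, 40.0]

# CF-style bounds: one (lower, upper) pair per cell, values repeated
# wherever neighbouring cells touch.
bounds = [(edges[i], edges[i + 1]) for i in range(len(edges) - 1)]

n_cells = len(bounds)
stored = [value for pair in bounds for value in pair]
distinct = set(stored)

print("cells:", n_cells)
print("stored bounds values:", len(stored))
print("distinct bounds values:", len(distinct))

# The same layout also describes cells with a gap or an overlap, which a
# single shared list of N+1 edges could not.
gapped = [(0.0, 10.0), (15.0, 20.0), (18.0, 30.0)]
is_contiguous = all(gapped[i][1] == gapped[i + 1][0]
                    for i in range(len(gapped) - 1))
print("gapped cells contiguous:", is_contiguous)
```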

Best wishes

Jonathan
Received on Thu Feb 02 2017 - 02:31:09 GMT
