Dear Jonathan,
As I mentioned in my response yesterday, we have worked through these issues and think we have a compromise proposal for the community.
Since the conversation is active, I'll go ahead and share our work with the list in a follow-up email in a moment.
A couple of specific responses to your note below:
There is a balance to be found between "opaque" and "transparent" encoding of geometries (or ragged arrays for that matter). "Transparent" tends to require more dimensions and really breaks the geometries apart, while "opaque" starts to impinge on the human readability and self-describing ideals of CF. We feel that a middle way is available to us and I'll outline the logic for that in my follow-up.
Regarding the indexed array: the approach we will propose (node sharing with indexed array notation) seems to be good for a couple of reasons. First, it allows data to be topologically intact without the need for node comparison. This is very important for some applications and we feel it should be possible in the encoding. Second, it is in line with the approach taken by UGRID and will be familiar to some for that reason. The argument around storage volume can go either way. You are right that in cases where people don't have shared nodes, the index will be extra storage. But I'm not convinced this is a factor that would change the decision based on the first two considerations.
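To make the node-sharing point concrete, here is a minimal Python sketch. It is not the proposed encoding; the two unit squares, the variable names (node_lon, node_lat, polygons) and the helper functions are all hypothetical and chosen only for illustration. It shows that with an index array, shared nodes can be found by comparing integers rather than floating-point coordinates, and it counts the stored values for this one small case.

```python
# Hypothetical example data: two unit squares that share one edge.
# Each distinct node is stored once.
node_lon = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
node_lat = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

# Indexed encoding: each polygon lists indices into the node arrays.
polygons = [[0, 1, 4, 3], [1, 2, 5, 4]]


def shared_nodes(a, b):
    """Nodes used by both polygons, found by integer index comparison only."""
    return sorted(set(a) & set(b))


def expand(polygon):
    """Duplicated-coordinate form of one polygon: a list of (lon, lat) pairs."""
    return [(node_lon[i], node_lat[i]) for i in polygon]


shared = shared_nodes(polygons[0], polygons[1])

# Stored values with indexing: all coordinates plus every index entry.
indexed_values = len(node_lon) + len(node_lat) + sum(len(p) for p in polygons)
# Stored values without indexing: two coordinates per node use.
duplicated_values = 2 * sum(len(p) for p in polygons)

print(shared, indexed_values, duplicated_values)
```

In this particular case the indexed form stores 20 values against 16 for the duplicated form, so the storage argument really does depend on how much sharing there is, while the topology is explicit either way.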
Regards,
- Dave
> On Feb 2, 2017, at 3:31 AM, Jonathan Gregory <j.m.gregory at reading.ac.uk> wrote:
>
> Dear Ben and Chris
>
> Following Chris's comment about preferring variables to multi-valued
> attributes, here are the examples for linestring and multipolygon redone so
> that both use variables to store the counts of parts and nodes. In this scheme
> more variables and dimensions are needed, but it may be easier to read and it
> is more CF-like, because the topology information is a "container" variable,
> like the CF grid_mapping and the ugrid mesh topology, having no numerical
> information in itself, just with attributes that point to variables.
>
> dimensions:
> station = 3; // stream segments
> time = UNLIMITED;
> node = 9; // = 2 + 4 + 3
> variables:
> float flow(station,time) ;
> flow:units="m3 s-1";
> flow:topology="SOMETHING";
> double time(time) ;
> time:standard_name = "time";
> time:units = "days since 1970-01-01 00:00:00" ;
> char SOMETHING;
> SOMETHING:node_coordinates="lon lat";
> SOMETHING:node_count="node_count_var";
> SOMETHING:topology_type="linestring";
> int node_count_var(station); // number of nodes for each linestring
> float lon(node) ;
> lon:standard_name = "longitude";
> lon:units = "degrees_east";
> float lat(node) ;
> lat:standard_name = "latitude";
> lat:units = "degrees_north" ;
> data:
> node_count_var=2, 4, 3;
> lon=0, 1, 0, -1, -2, -3, 2, 3, 4;
> lat=51, 52, 51, 50, 50, 49, 55, 55, 56;
>
> dimensions:
> station = 3; // collections of polygons
> time = UNLIMITED;
> node = 24; // = 4 + 3 + 3 + 3 + 5 + 3 + 3
> part = 7 ; // = 3 + 2 + 2
> variables:
> float flow(station,time) ;
> flow:units="m3 s-1";
> flow:topology="SOMETHING";
> double time(time) ;
> time:standard_name = "time";
> time:units = "days since 1970-01-01 00:00:00" ;
> char SOMETHING;
> SOMETHING:node_coordinates="lon lat";
> SOMETHING:node_count="node_count_var";
> SOMETHING:part_count="part_count_var";
> SOMETHING:topology_type="multipolygon";
> int node_count_var(part); // number of nodes in each polygon
> int part_count_var(station); // number of polygons in each collection
> float lon(node) ;
> lon:standard_name = "longitude";
> lon:units = "degrees_east";
> float lat(node) ;
> lat:standard_name = "latitude";
> lat:units = "degrees_north" ;
> data:
> node_count_var=4, 3, 3, 3, 5, 3, 3;
> part_count_var=3, 2, 2;
> lon=0, 20, 20, 0, ... // first polygon, etc. ...
> lat=0, 0, 20, 20, ...
>
> Also, two more thoughts regarding not using indirection, but instead
> duplicating coincident coordinate values:
>
> * Doing it this way (without indexing) is consistent with ordinary CF bounds.
> Contiguous cells in 1D have bounds with equal values. Thus N cells have 2N
> bounds, although usually only N+1 distinct values of bounds. There are several
> reasons why we made this choice, one being that it's more flexible, in allowing
> non-contiguous and overlapping cells.
>
> * The indexing itself takes space. If you have N (lon,lat) points which are all
> boundaries between two regions, so they're all used twice, you will have 4N
> coordinate values without indexing. With indexing you will have only 2N, but
> the index takes N, making 3N in total. Thus you save 25% of the space, not 50%.
>
> Best wishes
>
> Jonathan
>
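For readers who want to see how the count variables in the quoted examples would be used, here is a minimal Python sketch of the decoding step. It assumes the data have already been read into plain lists; the helper split_by_counts and the names linestrings, parts and stations are hypothetical. The linestring values are copied from the quoted CDL. For the multipolygon case the quoted coordinates are elided, so the sketch only works with node positions along the node dimension.

```python
from itertools import accumulate


def split_by_counts(values, counts):
    """Split a flat sequence into consecutive chunks of the given lengths."""
    if sum(counts) != len(values):
        raise ValueError("counts do not add up to the number of values")
    ends = list(accumulate(counts))
    starts = [0] + ends[:-1]
    return [list(values[s:e]) for s, e in zip(starts, ends)]


# Linestring example: one count per station.
node_count_var = [2, 4, 3]
lon = [0, 1, 0, -1, -2, -3, 2, 3, 4]
lat = [51, 52, 51, 50, 50, 49, 55, 55, 56]
linestrings = [
    list(zip(x, y))
    for x, y in zip(
        split_by_counts(lon, node_count_var),
        split_by_counts(lat, node_count_var),
    )
]

# Multipolygon example: nodes are grouped into parts, parts into stations.
poly_node_count = [4, 3, 3, 3, 5, 3, 3]  # node_count_var(part) in the quoted CDL
part_count_var = [3, 2, 2]
node_index = list(range(sum(poly_node_count)))  # positions along "node"
parts = split_by_counts(node_index, poly_node_count)
stations = split_by_counts(parts, part_count_var)

print(linestrings[0])
print(stations[1])
```

The same helper is applied twice for the multipolygon: once to group nodes into polygons and once to group polygons into collections, which mirrors the two count variables in the quoted scheme.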
Received on Thu Feb 02 2017 - 06:53:54 GMT