
[CF-metadata] Feedback requested on proposed CF Simple Geometries

From: Ben Koziol - NOAA Affiliate <ben.koziol>
Date: Wed, 7 Sep 2016 12:13:32 -0600

Greetings,


As part of an EarthCube project for advancing netCDF-CF [1], we are
developing an approach to represent simple geometries in enhanced netCDF-4
with a variable length array backport for netCDF-3. Simple geometries, for
example, may be used to associate stream discharge with river lines or
surface runoff with watershed polygons. We've drafted an initial approach
and reference implementation on the GitHub netCDF-CF-simple-geometry
project [2] and would greatly appreciate feedback from the CF community.
We'd like to make sure our scope is appropriate and our approach is
acceptable.

Scope


   - The result of this effort will be a standard that the CF timeSeries
     feature type could use to specify spatial coordinates (define a simple
     geometry) for a timeSeries variable.

   - For those familiar with the OGC WKT standard geometry types [3], we will
     include Point, LineString, Polygon, MultiPoint, MultiLineString, and
     MultiPolygon (WKT primitives and multipart geometries).


We anticipate that the six chosen geometry types will cover the needs of
most people generating netCDF data. These types also align with other
geospatial data formats such as GeoJSON and ESRI Shapefile. If our approach
is well received by the CF community, we may later adapt it to include
parametric shapes such as circles and ellipses.

Simple Geometry Encoding Method

Because different features may require different numbers of coordinates to
describe their geometries, our approach uses variable-length (VLEN) arrays
in enhanced netCDF-4 and contiguous ragged arrays (CRAs) in netCDF-3. We
describe the VLEN netCDF-4 approach first. The netCDF-3 CRA description
follows.

In our approach, a VLEN coordinate_index variable identifies the indices of
geometry coordinates in separate coordinate arrays. The coordinate_index
variable includes a coordinates attribute, which stores the names of the
coordinate variables, and a geom_type attribute, which indicates the
geometry type.

For multipart geometries, the coordinate index variable may include one or
more negative integer flags, each indicating the start of a new geometry
"part" for the current feature. The first geometry part is not preceded by
the flag. The variable shall include an attribute named
multipart_break_value identifying the flag's value.

For polygon geometries with holes (also called "interiors"), the coordinate
index values shall include a negative integer flagging the start of each
hole. In this case, the variable shall include a hole_break_value attribute
to indicate the flag value.
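
The two break flags can be read back with a single pass over the index
values. The following is a minimal Python sketch of that idea, not the
reference implementation: the function name split_geometry and its default
flag values (-1 and -2, taken from the multipolygon example in this message)
are assumptions made for illustration. A real reader would take the flag
values from the multipart_break_value and hole_break_value attributes.

```python
from typing import List


def split_geometry(indices: List[int],
                   multipart_break: int = -1,
                   hole_break: int = -2) -> List[List[List[int]]]:
    """Split a flat coordinate index list into parts, each a list of rings.

    Within a part, the first ring is the exterior and any later rings are
    holes. The flag values are hypothetical defaults for this sketch.
    """
    parts: List[List[List[int]]] = [[[]]]
    for value in indices:
        if value == multipart_break:
            parts.append([[]])      # new part, starting with its exterior ring
        elif value == hole_break:
            parts[-1].append([])    # new hole in the current part
        else:
            parts[-1][-1].append(value)
    return parts


# Index values from the multipolygon example in this message.
coordinate_index = [0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2,
                    13, 14, 15, 16, -1, 17, 18, 19, 20, -1, 21, 22, 23, 24]

parts = split_geometry(coordinate_index)
for number, rings in enumerate(parts):
    print(number, "exterior:", rings[0], "holes:", rings[1:])
```

Each index in a ring is then looked up in the coordinate variables named by
the coordinates attribute to recover the node positions.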

Other attributes on the coordinate index variable describe clockwise or
anticlockwise node order for polygons and polygon closure convention. For
additional details, see the wiki [4]. With these concepts defined, an
example for multipolygons with holes is shown below. You can copy the WKT
description below into Wicket [5] if you'd like to see what the geometry in
this example looks like.

Well-Known Text (WKT): MULTIPOLYGON(((0 0, 20 0, 20 20, 0 20, 0 0), (1 1,
10 5, 19 1, 1 1), (5 15, 7 19, 9 15, 5 15), (11 15, 13 19, 15 15, 11 15)),
((5 25, 9 25, 7 29, 5 25)), ((11 25, 15 25, 13 29, 11 25)))

Common Data Language (CDL) for netCDF-4 VLEN Arrays:

netcdf multipolygon_example {
types:
  int64(*) geom_VLType ;
dimensions:
  node = 25 ;
  geom = 1 ;
variables:
  geom_VLType coordinate_index(geom) ;
    string coordinate_index:geom_type = "multipolygon" ;
    string coordinate_index:coordinates = "x y" ;
    coordinate_index:multipart_break_value = -1 ;
    coordinate_index:hole_break_value = -2 ;
    string coordinate_index:outer_ring_order = "anticlockwise" ;
    string coordinate_index:closure_convention = "last_node_equals_first" ;
  double x(node) ;
  double y(node) ;
data:
  coordinate_index =
    {0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 13, 14, 15, 16,
     -1, 17, 18, 19, 20, -1, 21, 22, 23, 24} ;
  x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7, 5,
      11, 15, 13, 11 ;
  y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25,
      29, 25, 25, 25, 29, 25 ;
}

You'll find additional examples for VLEN geometry storage on our wiki [6].
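
As a consistency check on the example above, the outer_ring_order and
closure_convention attributes can be tested against the stored coordinates.
The sketch below is illustrative only: the helper names signed_area and
is_closed are assumptions, and the ring index lists are written out by hand
from the coordinate_index values. It uses the shoelace formula, whose sign is
positive for anticlockwise rings.

```python
from typing import List, Sequence

# Coordinate values copied from the multipolygon example.
x = [0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11,
     5, 9, 7, 5, 11, 15, 13, 11]
y = [0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15,
     25, 25, 29, 25, 25, 25, 29, 25]

# Ring index lists taken from coordinate_index (hypothetical grouping names).
exteriors = [[0, 1, 2, 3, 4], [17, 18, 19, 20], [21, 22, 23, 24]]
holes = [[5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]


def signed_area(ring: Sequence[int]) -> float:
    """Shoelace formula: positive if the ring is anticlockwise."""
    total = 0.0
    for k in range(len(ring)):
        a, b = ring[k], ring[(k + 1) % len(ring)]
        total += x[a] * y[b] - x[b] * y[a]
    return total / 2.0


def is_closed(ring: Sequence[int]) -> bool:
    """True if the last node repeats the first (last_node_equals_first)."""
    first, last = ring[0], ring[-1]
    return x[first] == x[last] and y[first] == y[last]


exterior_areas: List[float] = [signed_area(r) for r in exteriors]
hole_areas: List[float] = [signed_area(r) for r in holes]
all_closed = all(is_closed(r) for r in exteriors + holes)
print(exterior_areas, hole_areas, all_closed)
```

In this example the exterior rings have positive signed area (anticlockwise)
and the holes have negative signed area, so the holes are wound in the
opposite direction to the exteriors.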

Variable Length (VLEN) Arrays in NetCDF-3

To support netCDF-3, we created a VLEN approach for netCDF-3 [7]. Inspired
by CF contiguous ragged arrays (CRAs), our approach drops the CRA count
variable in favor of a stop variable that stores the stop index for each
geometry within an array of geometry coordinates. This improves random
access to the CRA "elements" by avoiding the need to sum the counts
preceding the target element index. The stop variable includes a
contiguous_ragged_dimension attribute whose value is the name of the
dimension to which the stop indices apply (similar to the CRA
sample_dimension attribute). An example showing how strings can be stored
with this approach is shown below.

Common Data Language (CDL) for netCDF-3 CRAs:

netcdf dwarf_planets {
dimensions:
  dwarf_planet = 5 ;        // number of dwarf planets described in this file
  dwarf_planet_chars = 28 ; // total number of characters for all planet names
variables:
  char dwarf_planet_name(dwarf_planet_chars) ;
  int dwarf_planet_name_stop(dwarf_planet) ;
    dwarf_planet_name_stop:contiguous_ragged_dimension = "dwarf_planet_chars" ;
data:
  dwarf_planet_name = "PlutoCeresErisHaumeaMakemake" ;
  dwarf_planet_name_stop = 5, 10, 14, 20, 28 ;
}
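
Reading one element back from this layout needs only two stop values. The
Python sketch below illustrates this under one assumption that matches the
values above: each stop is treated as a zero-based, exclusive end index, and
the first element starts at 0. The function name element is hypothetical. The
last lines show that a CRA count variable would have to be accumulated to
give the same information.

```python
from itertools import accumulate
from typing import List, Sequence

# Values copied from the dwarf_planets example.
dwarf_planet_name = "PlutoCeresErisHaumeaMakemake"
dwarf_planet_name_stop = [5, 10, 14, 20, 28]


def element(values: Sequence, stops: Sequence[int], i: int) -> Sequence:
    """Return element i of a ragged array described by stop indices.

    Assumes stops are zero-based exclusive end indices (an assumption of
    this sketch that is consistent with the example values).
    """
    start = stops[i - 1] if i > 0 else 0
    return values[start:stops[i]]


names: List[str] = [element(dwarf_planet_name, dwarf_planet_name_stop, i)
                    for i in range(len(dwarf_planet_name_stop))]
print(names)

# With CRA counts, the stops are the running sum of the counts, so reaching
# element i would require summing every count before it.
counts = [len(name) for name in names]
stops_from_counts = list(accumulate(counts))
print(counts, stops_from_counts)
```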

For the above geometry example, the VLEN coordinate_index netCDF-4 array is
replaced by a netCDF-3 CRA.

netcdf multipolygon_example {
dimensions:
  node = 25 ;
  indices = 30 ;
  geom = 1 ;
variables:
  int coordinate_index(indices) ;
    coordinate_index:geom_type = "multipolygon" ;
    coordinate_index:coordinates = "x y" ;
    coordinate_index:multipart_break_value = -1 ;
    coordinate_index:hole_break_value = -2 ;
    coordinate_index:outer_ring_order = "anticlockwise" ;
    coordinate_index:closure_convention = "last_node_equals_first" ;
  int coordinate_index_stop(geom) ;
    coordinate_index_stop:contiguous_ragged_dimension = "indices" ;
  double x(node) ;
  double y(node) ;
data:
  coordinate_index = 0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2,
      13, 14, 15, 16, -1, 17, 18, 19, 20, -1, 21, 22, 23, 24 ;
  coordinate_index_stop = 30 ;
  x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7, 5,
      11, 15, 13, 11 ;
  y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25,
      29, 25, 25, 25, 29, 25 ;
}

The CRA method could of course be used in place of VLEN in netCDF-4. See
our wiki page on GitHub [7] for more details and examples.

Questions for the CF Community


   1. Are our VLEN netCDF-3 and netCDF-4 approaches acceptable? What changes
      would you recommend?

   2. Are the geometry types point, line, polygon, and their multipart
      equivalents sufficient for the community?


Thank you very much for considering our ideas and helping us with your
valuable feedback!

[1] http://earthcube.org/group/advancing-netcdf-cf

[2] https://github.com/bekozi/netCDF-CF-simple-geometry

[3] https://en.wikipedia.org/wiki/Well-known_text

[4] https://github.com/bekozi/netCDF-CF-simple-geometry/wiki

[5] https://arthur-e.github.io/Wicket/sandbox-gmaps3.html

[6] https://github.com/bekozi/netCDF-CF-simple-geometry/wiki/Examples---VLen-Ragged-Arrays

[7] https://github.com/bekozi/netCDF-CF-simple-geometry/wiki/VLEN-Arrays-in-NetCDF-3

-- 
Ben Koziol
NESII/CIRES/NOAA Earth System Research Laboratory
ben.koziol at noaa.gov
802.392.4522
http://www.esrl.noaa.gov/nesii/
Received on Wed Sep 07 2016 - 12:13:32 BST
