[CF-metadata] Feedback requested on proposed CF Simple Geometries from Chris Barker on 2016-11-01 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Chris Barker <chris.barker>
Date: Tue, 1 Nov 2016 09:43:01 -0700

A few comments, though you all seem to have this in hand :-)

> I was asking whether this means that for each *collection* (of points,
> lines or polygons) there is a *single* timeseries.

I don't get why this matters -- any number of time series could be
associated with a single "entity" -- just like any number of timeseries can
be associated with given coordinates in regular old CF.

> For instance, in your example of a single geometry composed of several
> polygons, there is a single number for each time. But that is not the case
> for weather stations; for each weather station there is a timeseries, and
> at each time there is a different number (value of temperature,
> precipitation or whatever) for each weather station.

I think it may be helpful to borrow terminology (and the data model) from
the GIS world here. In this case, I am referencing the geoJSON spec, as I
happen to be working with that at the moment, but the basic data model is
pretty consistent.

http://geojson.org/geojson-spec.html

Note that they have "geometries" which can be things like points, polygons,
polylines. IIUC (and I'm no osgeo maven) geometries represent a "single"
entity. Then there are "Features": a Feature is essentially data associated
with a particular geometry.

But note: there are "Collections" -- both Geometry and Feature Collections
-- that is what you use to "bundle" various data together.

I think we may be well served by thinking in terms of mapping the GIS data
model to CF/netcdf -- for instance it would be great to be able to write a
netcdf<->geoJSON converter that was lossless, AND would be fairly "native"
in both cases.
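
As a minimal sketch of the three levels described above, here they are as plain
Python dicts in GeoJSON form. The station name, coordinates and temperature
are made-up illustrative values, not taken from any real dataset, and the JSON
round trip only hints at what a lossless converter would need to preserve.

```python
import json

# A Geometry: a single shape, with no data attached.
point = {"type": "Point", "coordinates": [-122.26, 47.68]}

# A Feature: one geometry plus the data ("properties") associated with it.
# The name and temperature below are hypothetical example values.
station = {
    "type": "Feature",
    "geometry": point,
    "properties": {"name": "station_a", "temperature": 11.5},
}

# A FeatureCollection: the "bundle" of many Features.
collection = {"type": "FeatureCollection", "features": [station]}

# Serialize to GeoJSON text and parse it back again.
round_tripped = json.loads(json.dumps(collection))
```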

> You also write, "The US National Weather Service's National Water Model
> (NWM) ... forecasts streamflow rates in about 2.7 million stream segments
> averaging 2km." The stream network is a MultiLineString geometry, but I
> don't think there is just one value of streamflow applying to the entire
> network at any given time;

no -- of course not. So that network (if I understand the GIS data model)
should be a Feature Collection, not all one Feature. So a whole collection
of geometries as well.

The "trick" with this data model is that it "de-vectorizes" the data.
Those of us used to working with netcdf, CF, gridded data, etc, tend to
think that you'd want to have, for instance, a vector of geometries, and
then various vectors of data associated with those geometries, whereas the
GIS data model associates data with a given geometry, and then creates
collections of those. This is kind of like the old C conundrum:

Do you use a struct of arrays, or an array of structs? netcdf is very much
about the struct of arrays approach.
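
To make the two layouts concrete, here is a rough sketch in pure Python. The
field names ("x", "y", "flow") and the helper functions are hypothetical,
chosen only for illustration; a real netcdf file would hold the columns as
variables along a shared dimension.

```python
# Array of structs: the GIS view, one record per feature.
features = [
    {"geometry": {"type": "Point", "coordinates": [0.0, 0.0]}, "flow": 1.5},
    {"geometry": {"type": "Point", "coordinates": [1.0, 0.0]}, "flow": 2.5},
    {"geometry": {"type": "Point", "coordinates": [2.0, 1.0]}, "flow": 4.0},
]


def to_struct_of_arrays(records):
    """Regroup per-feature records into one list per field (netcdf-like)."""
    return {
        "x": [r["geometry"]["coordinates"][0] for r in records],
        "y": [r["geometry"]["coordinates"][1] for r in records],
        "flow": [r["flow"] for r in records],
    }


def to_array_of_structs(columns):
    """Rebuild one record per feature from the parallel lists."""
    return [
        {"geometry": {"type": "Point", "coordinates": [x, y]}, "flow": flow}
        for x, y, flow in zip(columns["x"], columns["y"], columns["flow"])
    ]


columns = to_struct_of_arrays(features)
```

For simple points the conversion is reversible in both directions; the harder
cases are geometries with a variable number of nodes, which do not fit into
fixed-length columns so easily.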

(though I'm still confused, maybe you can have an "array" of data
associated with a GeometryCollection?)

as for MultiLineString -- you could associate an array of data with the
MultiLineString -- so one value per segment. But I think that violates the
intent of the data model -- you should have a GeometryCollection of
LineStrings instead, and then each segment has its own geometry and you can
associate an array of data with that. (or it should be a FeatureCollection?
I'm getting confused now!)

> I guess there is a different timeseries for each stream segment. But in my
> example above, the Atlantic Ocean is a single polygon with a single
> timeseries for its average temperature, not a different timeseries for
> each node.

right, so that Polygon would be a single Feature.

> Thus I am unclear about the dimensions of the data. In terms of your
> original example, does the data have dimensions (time,geometry, where
> geometry=1) or (time,node)?

(time,geometry, where geometry=1)

time,node would be for data associated with a FeatureCollection of Points
(or a MultiPoint).
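
A small sketch of the two shapes, using nested lists instead of real netcdf
variables. All numbers are invented for illustration, and `shape` is a
hypothetical helper, not part of any library.

```python
# One Feature (a single polygon such as an ocean basin):
# data shaped (time, geometry) with geometry = 1.
ocean_mean_temp = [[17.1], [17.3], [17.2]]

# A collection of three Point features (e.g. weather stations):
# data shaped (time, node), one value per point at each time.
station_temp = [
    [10.0, 11.0, 9.5],
    [10.5, 11.2, 9.9],
    [10.2, 11.1, 9.7],
]


def shape(rows):
    """Return (n_time, n_items) for a rectangular list of lists."""
    return (len(rows), len(rows[0]))


ocean_shape = shape(ocean_mean_temp)
station_shape = shape(station_temp)
```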

Does anyone "get" the GIS data model? I'm quite confused as to when you
would use:

MultiPolygon
vs
GeometryCollection of Polygons
vs
FeatureCollection of Features with Polygon Geometries

But I'm going to take a stab at it:

MultiPolygon (and MultiLineString, and MultiPoint) is used when you have
more than one of a particular type of geometry that are logically one thing
-- maybe an archipelago, for instance. A Polygon geometry can represent a
simple polygon, or a polygon with holes in it -- but cannot represent two
separate polygons. So if you have multiple polygons that are geometrically
distinct, but logically connected, you use a MultiPolygon.
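
A sketch of that distinction in GeoJSON terms, assuming the usual coordinate
layout: a Polygon is a list of rings (exterior first, then any holes), and a
MultiPolygon is a list of such polygons. The coordinates are arbitrary and
`count_polygons` is a hypothetical helper.

```python
# Rings are closed: the first and last positions are the same.
outer = [[0, 0], [4, 0], [4, 4], [0, 4], [0, 0]]
hole = [[1, 1], [1, 2], [2, 2], [2, 1], [1, 1]]
other_island = [[10, 0], [12, 0], [12, 2], [10, 2], [10, 0]]

# One polygon with a hole in it: still a single Polygon geometry.
island_with_lake = {"type": "Polygon", "coordinates": [outer, hole]}

# Two geometrically separate polygons that are logically one thing.
archipelago = {
    "type": "MultiPolygon",
    "coordinates": [[outer, hole], [other_island]],
}


def count_polygons(geometry):
    """Number of separate polygons held by a Polygon or MultiPolygon."""
    if geometry["type"] == "Polygon":
        return 1
    if geometry["type"] == "MultiPolygon":
        return len(geometry["coordinates"])
    raise ValueError("not a polygonal geometry: " + geometry["type"])
```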

I'm on shakier ground about when you want to use a GeometryCollection vs a
FeatureCollection, but I _think_ that the point of a GeometryCollection is
that you can group different types of geometry -- but still want them to be
treated as a single entity.

I've dealt with all this trying to jam data that fits well into netcdf into
geoJSON, or GIS-oriented systems -- it's quite hard to be efficient about
it :-) - i.e., there is really no way to associate an array of data with an
array of geometries -- it sure looks like you could do it with
GeometryCollections, but the systems aren't expecting that.

Of course, CF doesn't need to follow this data model, but it's a good idea
to be informed by it.

> Nonetheless in both cases the geometries have to be described. I think the
> difference is how we attach this description to the data or coordinates,
> rather than how the description is constructed.

indeed.

> You propose the index variable in order for the convention to be like
> ugrid. However this still seems to me to be an unnecessary complexity and
> use of space if you aren't going to have many shared nodes.

In the GIS data model, nodes are not shared between geometries, and you are
quite right that keeping nodes separate with geometries indexing into it is
an added complication and would not be space-efficient.

However, there is another reason to do it -- it makes it definitive that
two (or more) geometries share the exact same node, rather than them being
distinct points that happened to be at the same location (Or worse, with FP
error and all, two points that are very close).
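
A toy illustration of that point, with invented coordinates. In the first
half each geometry carries its own copy of the junction and floating point
error makes the copies differ; in the second half both lines index into one
shared node table, so the shared node is unambiguous.

```python
# GIS style: each geometry stores its own coordinates.
a_end = (0.1 + 0.2, 1.0)   # junction as computed along line A
b_start = (0.3, 1.0)       # the "same" junction as stored for line B
same_by_value = a_end == b_start   # False: 0.1 + 0.2 != 0.3 in binary FP

# Indexed style: geometries refer to entries in a shared node table.
nodes = [(0.0, 0.0), (0.3, 1.0), (2.0, 2.0)]
line_a = [0, 1]   # node indices making up line A
line_b = [1, 2]   # node indices making up line B

# Shared nodes are found by comparing integer indices, not coordinates.
shared_nodes = set(line_a) & set(line_b)
```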

This is actually a major limitation in the standard GIS model.

> I think the case for having another convention, distinct from ugrid, is
> stronger if it is *unlike* ugrid in this respect, and therefore simpler as
> well.

I still think that it should be separate from UGRID -- it really is a
different use case, though they should still share whatever they can, and
it could turn out that UGRID is a special case of geometries?

-CHB

-- 
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker at noaa.gov
Received on Tue Nov 01 2016 - 10:43:01 GMT