[CF-metadata] Extension of Discrete Sampling Geometries for Simple Features

From: David Blodgett <dblodgett>
Date: Wed, 22 Feb 2017 09:23:53 -0600

We will meet on Google Hangouts at 8am CT on March 7th. If you'd like to be added to the calendar invite, please let me know.

NetCDF - Simple Geometries Discussion
Scheduled: Mar 7, 2017, 8:00 AM to 9:00 AM CT
Hopefully Google Hangouts will work. Please use this URL:
https://plus.google.com/hangouts/_/calendar/ZGJsb2RnZXR0Lmgyb0BnbWFpbC5jb20.c192p3bskticchduvubks6ug44

- Dave

> On Feb 18, 2017, at 4:17 PM, Blodgett, David <dblodgett at usgs.gov> wrote:
>
> Let's try this again. Sorry to spam everyone.
>
> http://doodle.com/poll/yaherucx2w3cd9y6
>
>
> On Feb 6, 2017, at 11:29 AM, David Blodgett <dblodgett at usgs.gov> wrote:
>
>
>
>
> Dear CF,
>
>
>
>
> I want to follow up on the conversation here with an alternative approach suggested off list, primarily between Jonathan and me. For this, I'm going to focus on the use cases satisfied and the simplification of the proposal allowed by not supporting those use cases. The changes below are largely driven by a desire to better align this proposal with the technical details of the prior art that is CF.
>
>
>
>
> If we:
>
>
> 1) don't support node sharing, we can remove the complication of node-coordinate indexing / indirection, simplifying the proposal pretty significantly.
>
>
> 2) don't use "break values" to indicate the separation between multi-part geometries and polygon holes, we end up with a data model with an extra dimension, but the NetCDF dimensions align with the natural dimensions of the data.
>
>
> 3) use "count" instead of a "start pointer" approach, we are better aligned with the existing DSG contiguous ragged array approach.
>
>
>
>
> Coming back to the three directions this proposal could take, from my cover letter of February 2nd:
>
>
> 1. Direct use of Well-Known Text (WKT). In this approach, well known text strings would be encoded using character arrays following a contiguous ragged array approach to index the character array by geometry (or instance in DSG parlance).
>
>
>
> 2. Implement the WKT approach using a NetCDF binary array. In this approach, well known text separators (brackets, commas and spaces) for multipoint, multiline, multipolygon, and polygon holes, would be encoded as break type separator values like -1 for multiparts and -2 for holes.
>
>
>
> 3. Implement the fundamental dimensions of geometry data in NetCDF. In this approach, additional dimensions and variables along those dimensions would be introduced to represent geometries, geometry parts, geometry nodes, and unique (potentially shared) coordinate locations for nodes to reference.
>
>
>
> The alternative I'm outlining here moves in the direction of 3. We had originally discounted it because it becomes very verbose and seems overly complicated if support for coordinate sharing is a requirement. If the three simplifications described above are used, then the third approach seems more tenable.
>
>
>
>
> Jonathan has also suggested the following (in reaction to the CDL in my letter of February 2nd):
>
>
> 1) Rename geom_coordinates as node_coordinates, for consistency with UGRID.
>
>
> 2) Omit node_dimension. This is redundant, since the dimension can be found by examining the node coordinate variables.
>
>
> 3) Prescribe numerous "codes" and assumptions in the specification instead of letting them be described with attribute values.
>
>
> 4) It would be more consistent with CF and UGRID to use a single container variable to hang all the topology/geometry information from.
>
>
>
>
> I, personally, am happy to accept these if others don't object.
>
>
>
>
> A couple other suggestions from Jonathan I want to discuss a bit more:
>
>
> 1) Rename geometry as topology and geom_type as topology_type.
>
>
> While I'd be open to something other than geom, topology is odd. If this is really "node_collection_topology_type" I guess I could be convinced, but would be curious how people react to this. (Especially in relation to UGRID.)
>
>
> 2) This extension is more appropriate as an extension to the concept of cell bounds than the addition of a complex time-invariant type of discrete sampling geometry.
>
>
> Having just re-read the cell bounds chapter, I think it would overcomplicate the cell bounds to include this material. My basic issue here is that these geometries do not necessarily have a reference location. They are, rather, first order entities that need to be treated as such. That said, it makes sense that these geometries are not necessarily a good fit for the original intent of Discrete Sampling Geometries. Jonathan suggested they may belong in their own chapter, which may be a good alternative... My suggested CDL below might lead us in the direction of this being a special type of auxiliary coordinate variable.
>
>
>
>
> This alternative starts to look like the CDL pasted below.
>
>
>
>
> Note that the issue of coordinates is sticking out like a sore thumb. Below, I've attempted to reconcile Jonathan's ideas regarding coordinates with my thoughts about how these geometries are "first order entities" that don't have a single representative x and y. The spatial coordinates can be said to reside in the system of geometries described in the "sf" container variable... I realize this goes against the idea of coordinates a bit, but I think it is holding with the spirit of the attribute...
>
>
>
>
> Finally, I'm glad to continue answering questions and debating things via the list to a point, but I think it would be in our interest to arrange a telecon to discuss this stuff further with a list of interested parties. Feel free to follow up on list, but for decision making, let's not let this rabbit hole go too deep. I'll plan on letting this and the other recent action on this proposal settle with people for a week or two, then start to bring together a conference call (or calls, depending on time zones). Please respond to me off list if you are interested in being part of a call to discuss.
>
>
>
>
> Regards,
>
>
>
>
> - Dave
>
>
>
>
> netcdf multipolygon_example {
>
> dimensions:
>
> node = 47 ;
>
> part = 9 ;
>
> instance = 3 ;
>
> time = 5 ;
>
> strlen = 5 ;
>
> variables:
>
> char instance_name(instance, strlen) ;
>
> instance_name:cf_role = "timeseries_id" ;
>
> double someVariable(instance) ;
>
> someVariable:long_name = "a variable describing a single-valued attribute of a polygon" ;
>
> someVariable:coordinates = "sf" ; // or "instance_name"?
>
> int time(time) ;
>
> time:units = "days since 2000-01-01" ;
>
> double someData(instance, time) ;
>
> someData:coordinates = "time sf" ; // or "time instance_name"?
>
> someData:featureType = "timeSeries" ;
>
> someData:geometry="sf";
>
> int sf; // containing variable -- datatype irrelevant because no data
>
> sf:geom_type = "multipolygon" ; // could be node_topology_type?
>
> sf:node_count_variable="node_count";
>
> sf:node_coordinates = "x y" ;
>
> sf:part_count = "part_node_count" ;
>
> sf:part_type = "part_type" ; // Not required unless polygons with holes present.
>
> sf:outer_ring_order = "anticlockwise" ; // not required if written in spec?
>
> sf:closure_convention = "last_node_equals_first" ; // not required if written in spec?
>
> sf:outer_type_code = 0 ; // not required if written in spec?
>
> sf:inner_type_code = 1 ; // not required if written in spec?
>
> int node_count(instance);
>
> node_count:long_name = "count of coordinates in each instance geometry" ;
>
> int part_node_count(part) ;
>
> part_node_count:long_name = "count of coordinates in each geometry part" ;
>
> int part_type(part) ;
>
> part_type:long_name = "type of each geometry part" ;
>
> double x(node) ;
>
> x:units = "degrees_east" ;
>
> x:standard_name = "longitude" ; // or projection_x_coordinate
>
> x:cf_role = "geometry_x_node" ;
>
> double y(node) ;
>
> y:units = "degrees_north" ;
>
> y:standard_name = "latitude" ; // or projection_y_coordinate
>
> y:cf_role = "geometry_y_node" ;
>
> // global attributes:
>
> :Conventions = "CF-1.8" ;
>
>
> data:
>
>
> instance_name =
>
> "flash",
>
> "bang",
>
> "pow" ;
>
>
> someVariable = 1, 2, 3 ;
>
>
> time = 1, 2, 3, 4, 5 ;
>
>
> someData =
>
> 1, 2, 3, 4, 5,
>
> 1, 2, 3, 4, 5,
>
> 1, 2, 3, 4, 5 ;
>
>
> node_count = 25, 15, 7 ;
>
>
> part_node_count = 5, 4, 4, 4, 4, 8, 6, 8, 4 ;
>
>
> part_type = 0, 1, 1, 1, 0, 0, 0, 1, 0 ;
>
>
> x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
>
> 5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45, -20, -30, -20, -20, -30, 30,
>
> 45, 10, 30, 25, 50, 30, 25 ;
>
>
> y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29,
>
> 25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20, -35, -20, -15, -25, -20, 20,
>
> 40, 40, 20, 5, 10, 15, 5 ;
>
> }
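
As a rough illustration of how a reader might unpack the count-based layout in the CDL above, here is a minimal Python sketch. It assumes the attribute meanings given in the CDL comments (part type 0 = outer ring, 1 = hole) and uses small made-up arrays rather than the values in the listing; the function names are hypothetical and are not part of any proposal or library.

```python
from itertools import accumulate


def split_by_counts(values, counts):
    """Split a flat sequence into consecutive chunks of the given lengths."""
    offsets = [0, *accumulate(counts)]
    if offsets[-1] != len(values):
        raise ValueError("counts do not sum to the number of values")
    return [values[a:b] for a, b in zip(offsets, offsets[1:])]


def decode_multipolygons(x, y, node_count, part_node_count, part_type,
                         outer_code=0, inner_code=1):
    """Return one entry per instance: a list of polygons, each polygon a
    list of rings (outer ring first, then holes), each ring a list of
    (x, y) tuples."""
    rings = split_by_counts(list(zip(x, y)), part_node_count)
    geometries = []
    part = 0
    for n_nodes in node_count:
        polygons, used = [], 0
        # Consume parts until this instance's node count is reached.
        while used < n_nodes:
            if part >= len(rings):
                raise ValueError("node_count exceeds the available parts")
            ring = rings[part]
            if part_type[part] == outer_code:
                polygons.append([ring])
            elif part_type[part] == inner_code:
                if not polygons:
                    raise ValueError("hole found before any outer ring")
                polygons[-1].append(ring)
            else:
                raise ValueError("unknown part type code")
            used += len(ring)
            part += 1
        if used != n_nodes:
            raise ValueError("part counts do not align with node_count")
        geometries.append(polygons)
    return geometries


# Hypothetical data: instance 0 is a square with one hole, instance 1 a triangle.
x = [0, 4, 4, 0, 0, 1, 1, 2, 1, 10, 12, 11, 10]
y = [0, 0, 4, 4, 0, 1, 2, 1, 1, 10, 10, 12, 10]
geoms = decode_multipolygons(x, y,
                             node_count=[9, 4],
                             part_node_count=[5, 4, 4],
                             part_type=[0, 1, 0])
```

The decoder walks the part arrays once, so no break values or start pointers are needed; the cost is that the per-instance and per-part counts must agree with each other.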
>
>
>
>
>
>
>
> On Feb 4, 2017, at 8:07 AM, David Blodgett <dblodgett at usgs.gov> wrote:
>
>
>
>
> Dear Chris,
>
>
>
>
> Thanks for your thorough treatment of these issues. We have gone through a similar thought process to arrive at the proposal we came up with. I'll answer as briefly as I can.
>
>
>
>
> 1) how would you translate between netcdf geometries and, say geo JSON?
>
>
>
>
> The thinking is that node coordinate sharing is optional. If the writer wants to check or already knows that nodes share coordinates, then it's possible. Otherwise, it doesn't have to be used. I've always felt that this was important, but maybe not critical for a core NetCDF-CF data model. Some offline conversation has led to an example that does not use it that may be a good alternative; more on that later.
>
>
>
>
> 2) Break Values
>
>
>
>
> You really do have to hold your nose on the break values. The issue is that you have to store that information somehow and it is almost worse to create new variables to store the multi-part and hole/not hole information. The alternative approach that's forming up as mentioned above does break the information out into additional variables but simplifies things otherwise. In that case it doesn't feel overly complex to me... so stay tuned for more on this front.
>
>
>
>
> 3) Ragged Indexing
>
>
>
>
> Your thought process follows ours exactly. The key is that you either have to create the "pointer" array as a first order of business or loop over the counts ad nauseam. I'm actually leaning toward the counts for two reasons. First, the counts approach is already in CF so is a natural fit and will be familiar to developers in this space. Second, the issue of 0 vs 1 indexing is annoying. In our proposal, we settled on 0 indexing because it aligns with the idea of an offset, but it is still annoying and some applications would always have to adjust that pointer array as a first order of business.
>
>
>
>
> On to Bob's comments.
>
>
>
>
> Regarding aligning with other data models / encodings, I guess this needs to be unpacked a bit.
>
>
>
>
> 1) In this setting, simple features is a data model, not an encoding. An encoding can implement part or all of a data model as is needed by the use case(s) at hand. There is no problem with partial implementations; you still get interoperability for the intended use cases.
>
>
> 2) Attempting to align with other encoding standards (UGRID and NetCDF-CF are the primary ones here) is simply to keep the implementation patterns similar and familiar. This may be a fool's errand, but is presumably good for adoptability and consistency.
>
>
> So, I don't see a problem with implementing important simple features types in a way that aligns with the way the existing community standards work.
>
>
>
>
> I don't see this as ignoring existing standards at all. There is no open community standard for binary encoding of geometries and related data that passes the CF requirements of human readability and self-description. We are adopting the appropriate data model and suggesting a new encoding that will solve a lot of problems in the environmental modeling space.
>
>
>
>
> As we've discussed before, your "different approach" sounds great, but seems like an exercise for a future effort that doesn't attempt to align with CF 1.7. Maybe what you suggest is a path forward for variable length arrays in the CF 2.0 "vision in the mist", but I don't see it as a tenable solution for CF 1.*.
>
>
>
>
> Best Regards,
>
>
>
>
> - Dave
>
>
>
>
>
>
> On Feb 3, 2017, at 3:31 PM, Chris Barker <chris.barker at noaa.gov> wrote:
>
>
>
>
> a few thoughts. First, I think there are three core "issues" that need to be resolved:
>
>
>
>
> 1) Coordinate indexing (indirection)
>
>
> The question of whether you have an array of "vertices" that the geometry types index into to get their data:
>
>
>
>
> Advantages:
>
>
> - if a number of geometries share a lot of vertices, it can be more efficient
>
>
> - the relationship between geometries that share vertices (i.e. polygons that share a boundary) etc. is well defined. You don't need to check for closeness, and maybe have a tolerance, etc.
>
>
>
>
> These were absolutely critical for UGRID for example -- a UGRID mesh is a "single thing", NOT a collection of polygons that happen to share some vertices.
>
>
>
>
> Disadvantages:
>
>
> - if the geometries do not share many vertices, it is less efficient.
>
>
> - there are additional code complications in "getting" the vertices of the given geometry
>
>
> - it does not match the OGC data model.
>
>
>
>
> My 0.02 -- given my use cases, I tend to want the advantages -- but I don't know that that's a typical use case. And I think it's a really good idea to keep with the OGC data model where possible -- i.e. be able to translate from netcdf to, say, geoJSON as losslessly as possible. Given that, I think it's probably a better idea not to have the indirection.
>
>
>
>
> However (to equivocate) perhaps the types of information people are likely to want to store in netcdf are a subset of what the OGC standards are designed for -- and for those use-cases, maybe shared vertices are critical.
>
>
>
>
> One way to think about it -- how would you translate between netcdf geometries and, say geo JSON:
>
>
> - nc => geojson would lose the shared index info.
>
>
> - geojson => nc -- would you try to reconstruct the shared vertices?? I'm thinking that would be a bit dangerous in the general case, because you are adding information that you don't know is true -- are these a shared vertex or two that just happen to be at the same location?
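
To make the nc => geojson direction concrete, here is a small Python sketch under stated assumptions: a hypothetical shared-vertex layout (a `vertices` table plus per-polygon index rings, not taken from any proposal text) is expanded into a GeoJSON FeatureCollection. The two squares share an edge, but after conversion each feature simply carries its own copy of the coordinates.

```python
import json

# Hypothetical shared-vertex layout: two unit squares sharing the edge 1-2.
vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0)]
polygon_nodes = [
    [0, 1, 2, 3, 0],  # left square, closed ring of vertex indices
    [1, 4, 5, 2, 1],  # right square, reuses vertices 1 and 2
]


def to_geojson(vertices, polygon_nodes):
    """Expand index rings into explicit coordinates, one Polygon per ring."""
    features = []
    for ring in polygon_nodes:
        coords = [list(vertices[i]) for i in ring]
        features.append({
            "type": "Feature",
            "properties": {},
            "geometry": {"type": "Polygon", "coordinates": [coords]},
        })
    return {"type": "FeatureCollection", "features": features}


collection = to_geojson(vertices, polygon_nodes)
text = json.dumps(collection)
```

Going the other way would mean matching coordinates by value, which is exactly the guesswork about "shared vertex or coincident location" raised above.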
>
>
>
>
> 2) Break values
>
>
>
>
> I don't really like break values as an approach, but with netcdf any option will be ugly one way or another. So keeping with the WKT approach makes sense to me. Either way you'll need custom code to unpack it. (BTW -- what does WellKnownBinary do?)
>
>
>
>
> 3) Ragged indexing
>
>
>
>
> There are two "natural" ways to represent a ragged array:
>
>
>
>
> (a) store the length of each "row"
>
>
> (b) store the index to the beginning (or end) of each "row"
>
>
>
>
> CF already uses (a). However, working with it, I'm pretty convinced that it's the "wrong" choice:
>
>
>
>
> If you want to know how long a given row is, that is really easy with (a), and almost as easy with (b) (involves two indexes and a subtraction)
>
>
>
>
> However, if you want to extract a particular row: (b) makes this really easy -- you simply access the slice of the array you want. With (a) you need to loop through the entire "length_of_rows" array (up to the row of interest) and add up the values to find the slice you need. Not a huge issue, but it is an issue. In fact, in my code to read ragged arrays in netcdf, the first thing I do is pre-compute the index-to-each-row, so I can then use that to access individual rows for future access -- if you are accessing via OpenDAP -- that's particularly helpful.
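
The pre-computation described above can be sketched in a few lines of Python. This is only an illustration with made-up data and hypothetical helper names: the stored counts (option a) are turned into 0-based start offsets (option b) with a cumulative sum, after which any row is a single slice.

```python
from itertools import accumulate


def counts_to_offsets(counts):
    """Convert per-row lengths into 0-based start offsets.

    The result has one extra trailing entry holding the total length,
    so row i spans offsets[i]:offsets[i + 1].
    """
    return [0, *accumulate(counts)]


def get_row(values, offsets, i):
    """Extract row i from the flat array with one slice."""
    return values[offsets[i]:offsets[i + 1]]


# Made-up ragged array with three rows of lengths 3, 1 and 4.
counts = [3, 1, 4]
values = [10, 11, 12, 20, 30, 31, 32, 33]

offsets = counts_to_offsets(counts)
third_row = get_row(values, offsets, 2)
# A row length is still recoverable from offsets: two lookups and a subtraction.
second_row_length = offsets[2] - offsets[1]
```

Either representation can be derived from the other in one pass, so the choice mostly affects what a reader has to do before the first random access.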
>
>
>
>
> So -- (b) is clearly (to me) the "best" way to do it -- but is it worth introducing a second way to handle ragged arrays in CF? I would think yes, but that would be offset if:
>
>
>
>
> - There is a bunch of existing library code that transparently handles ragged arrays in netcdf (does netcdfJava have something? I'm pretty sure Python doesn't -- certainly not in netCDF4)
>
>
>
>
> - That existing lib code would be advantageous to leverage for code reading features: I suspect that there will have to be enough custom code that the ragged array bits are going to be the least of it.
>
>
>
>
> So I'm for the "new" way of representing ragged arrays
>
>
>
>
> -CHB
>
>
>
>
>
>
> On Fri, Feb 3, 2017 at 11:41 AM, Bob Simons - NOAA Federal <bob.simons at noaa.gov> wrote:
>
>
> Then, isn't this proposal just the first step in the creation of a new model and a new encoding of Simple Features, one that is "align[ed] ... with as many other encoding standards in this space as is practical"? In other words, yet another standard for Simple Features?
>
>
>
>
> If so, it seems risky to me to take just the first (easy?) step "to support the use cases that have a compelling need today" and not solve the entire problem. I know the CF way is to just solve real, current needs, but in this case it seems to risk a head slap moment in the future when we realize that, in order to deal with some new simple feature variant, we should have done things differently from the beginning?
>
>
>
>
> And it seems odd to reject existing standards that have been so painstakingly hammered out, in favor of starting the process all over again. We follow existing standards for other things (e.g., IEEE-754 for representing floating point numbers in binary files), why can't we follow an existing Simple Features standard?
>
>
>
>
> ---
>
>
> Rather than just be a naysayer, let me suggest a very different alternative:
>
>
>
>
> There are several projects in the CF realm (e.g., this Simple Features project, Discrete Sampling Geometry (DSG), true variable-length Strings, ugrid(?)) which share a common underlying problem: how to deal with variable-length multidimensional arrays: a[b][c], where the length of the c dimension may be different for different b indices.
>
>
> DSG solved this (5 different ways!), but only for DSG.
>
>
> The Simple Features proposal seeks to solve the problem for Simple Features.
>
>
> We still have no support for Unicode variable-length Strings.
>
>
>
>
> Instead of continuing to solve the variable-length problem a different way every time we confront it, shouldn't we solve it once, with one small addition to the standard, and then use that solution repeatedly?
>
>
> The solution could be a simple variant of one of the DSG solutions, but generalized so that it could be used in different situations.
>
>
> An encoding standard and built-in support for variable-length data arrays in netcdf-java/c would solve a lot of problems, now and in the future.
>
>
> Some work on this is already done: I think the netcdf-java API already supports variable-length arrays when reading netcdf-4 files.
>
>
> For Simple Features, the problem would reduce to: store the feature (using some specified existing standard like WKT or WKB) in a variable-length array.
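
A minimal Python sketch of that idea, assuming a generic "flat data plus per-item counts" layout (one of several possible generalizations, not a defined CF or netCDF feature): WKT strings are stored as UTF-8 bytes in one flat buffer with a count per feature. The helper names are hypothetical.

```python
def pack_strings(strings):
    """Pack strings into one flat UTF-8 byte buffer plus per-item byte counts."""
    encoded = [s.encode("utf-8") for s in strings]
    counts = [len(b) for b in encoded]
    return b"".join(encoded), counts


def unpack_strings(data, counts):
    """Recover the original strings by walking the counts."""
    out, start = [], 0
    for n in counts:
        out.append(data[start:start + n].decode("utf-8"))
        start += n
    return out


wkt = [
    "POINT (30 10)",
    "LINESTRING (30 10, 10 30, 40 40)",
    "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))",
]
data, counts = pack_strings(wkt)
restored = unpack_strings(data, counts)
```

Counting bytes rather than characters keeps the slicing correct for non-ASCII text, which matters for the Unicode string case mentioned above.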
>
>
>
>
>
>
>
>
>
>
>
>
> On Fri, Feb 3, 2017 at 9:07 AM, <cf-metadata-request at cgd.ucar.edu> wrote:
>
>
> Date: Fri, 3 Feb 2017 11:07:00 -0600
> From: David Blodgett <dblodgett at usgs.gov>
> To: Bob Simons - NOAA Federal <bob.simons at noaa.gov>
> Cc: CF Metadata <cf-metadata at cgd.ucar.edu>
> Subject: Re: [CF-metadata] Extension of Discrete Sampling Geometries for Simple Features
> Message-ID: <8EE85E65-2815-4720-90FC-13C72D3C7952 at usgs.gov>
> Content-Type: text/plain; charset="utf-8"
>
> Dear Bob,
>
> I'll just take these in line.
>
> 1) Noted. We have been trying to figure out what to do with the point featureType and I think leaving it more or less alone is a viable path forward.
>
> 2) This is not an exact replica of WKT, but rather a similar approach to WKT. As I stated, we have followed the ISO simple features data model and well known text feature types in concept, but have not used the same standardization formalisms. We aren't advocating for supporting "all of" any standard but are rather attempting to support the use cases that have a compelling need today while aligning this with as many other encoding standards in this space as is practical. Hopefully that answers your question; sorry if it's vague.
>
> 3) The google doc linked in my response contains the encoding we are proposing as a starting point for conversation: http://goo.gl/Kq9ASq I want to stress, as a starting point for discussion. I expect that this proposal will change drastically before we're done.
>
> 4) We absolutely envision tools doing what you say: converting between standard spatial formats and NetCDF-CF geometries. We intend to introduce an R and a Python implementation that does exactly that, along with whatever form this standard takes in the end. R and Python were chosen because the team that brought this together is familiar with those two languages; additional implementations would be more than welcome.
>
> 5) We do include a "geometry" featureType similar to the "point" featureType. Thus our difficulty with what to do with the "point" featureType. You are correct, there are lots of non timeSeries applications to be solved and this proposal does intend to support them (within the existing DSG constructs).
>
> Thanks for your questions, hopefully my answers close some gaps for you.
>
> - Dave
>
> > On Feb 3, 2017, at 10:47 AM, Bob Simons - NOAA Federal <bob.simons at noaa.gov> wrote:
> >
> > 1) There is a vague comment in the proposal about possibly changing the point featureType. Please don't, unless the changes don't affect current uses of Point. There are already 1000's of files that use it. If this new system offers an alternative, then fine, it's an alternative. One of the most important and useful features of a good standard is backwards compatibility.
> >
> > 2) You advocate "Implement the WKT approach using a NetCDF binary array." Is this system then an exact encoding of WKT, neither a subset nor a superset? "Simple Features" are often not simple.
> > If it is WKT (or something else), what is the standard you are following to describe the Simple Features (e.g., ISO/IEC 13249-3:2016 and ISO 19162:2015)?
> > Does your proposal deviate in any way from the standard's capabilities?
> > Do you advocate following the entire WKT standard, e.g., supporting all the feature types that WKT supports?
> >
> > 3) Since you are not using the WKT encoding, but creating your own, where is the definition of the encoding system you are using?
> >
> > 4) This is a little out of CF scope, but:
> > Do you envision tools, notably, netcdf-c/java, having a writer function that takes in WKT and encodes the information in a file, and having a reader function that reads the file and returns WKT? Or is it your plan that the encoding/ decoding is left to the user?
> >
> > 5) This proposal is for "Simple Features plus Time Series" (my phrase not yours). But aren't there lots of other uses of Simple Features? Will there be other proposals in the future for "Simple Features plus X" and "Simple Features plus Y"? If so, will CF eventually become a massive document where Simple Features are defined over and over again, but in different contexts? If so, wouldn't a better solution be to deal with Simple Features separately (as Postgres does by making a geometric data type?), and then add "Simple Features plus Time Series" as the first use of it?
> >
> > Thanks for answering these questions.
> > Please forgive me if I missed parts of your proposal that answer these questions.
> >
> >
> > On Thu, Feb 2, 2017 at 5:57 AM, <cf-metadata-request at cgd.ucar.edu> wrote:
> > Date: Thu, 2 Feb 2017 07:57:36 -0600
> > From: David Blodgett <dblodgett at usgs.gov>
> > To: <cf-metadata at cgd.ucar.edu>
> > Subject: [CF-metadata] Extension of Discrete Sampling Geometries for Simple Features
> > Message-ID: <224C2828-7212-449F-8C2C-97D903F6BE1E at usgs.gov>
> > Content-Type: text/plain; charset="utf-8"
> >
> > Dear CF Community,
> >
> > We are pleased to submit this proposal for your consideration and review. The cover letter we've prepared below provides some background and explanation for the proposed approach. The google doc here <http://goo.gl/Kq9ASq> is an excerpt of the CF specification with track changes turned on. Permissions for the document allow any google user to comment, so feel free to comment and ask questions in line.
> >
> > Note that I'm sharing this with you with one issue unresolved: what to do with the point featureType. Our draft suggests that it is part of a new geometry featureType, but it could be that we leave it alone and introduce a geometry featureType. This may be a minor point of discussion, but we need to be clear that this is an issue that still needs to be resolved in the proposal.
> >
> > Thank you for your time and consideration.
> >
> > Best Regards,
> >
> > David Blodgett, Tim Whiteaker, and Ben Koziol
> >
> > Proposed Extension to NetCDF-CF for Simple Geometries
> >
> > Preface
> >
> > The proposed addition to NetCDF-CF introduced below is inspired by a pre-existing data model governed by OGC and ISO as ISO 19125-1. More information on Simple Features may be found here: <https://en.wikipedia.org/wiki/Simple_Features>. To the knowledge of the authors, it is consistent with ISO 19125-1 but has not been specified using the formalisms of OGC or ISO. Language used attempts to hold true to NetCDF-CF semantics while not conflicting with the existing standards baseline. While this proposal does not support the entire scope of the simple features ecosystem, it does support the core data types in most common use around the community.
> >
> > The other existing standard to mention is the UGRID convention <http://ugrid-conventions.github.io/ugrid-conventions/>. The authors have experience reading and writing UGRID and have designed the proposed structure in a way that is inspired by and consistent with it.
> >
> > Terms and Definitions
> >
> > (Taken from OGC 06-103r4 OpenGIS Implementation Specification for Geographic information - Simple feature access - Part 1: Common architecture <http://www.opengeospatial.org/standards/sfa>.)
> >
> > Feature: Abstraction of real world phenomena - typically a geospatial abstraction with associated descriptive attributes.
> > Simple Feature: A feature with all geometric attributes described piecewise by straight line or planar interpolation between point sets.
> > Geometry (geometric complex): A set of disjoint geometric primitives - one or more points, lines, or polygons that form the spatial representation of a feature.
> >
> > Introduction
> >
> > Discrete Sampling Geometries (DSGs) handle data from one (or a collection of) timeSeries (point), Trajectory, Profile, TrajectoryProfile or timeSeriesProfile geometries. Measurements are from a point (timeSeries and Profile) or points along a trajectory. In this proposal, we reuse the core DSG timeSeries type, which provides support for basic time series use cases, e.g., a timeSeries which is measured (or modeled) at a given point.
> >
> > Changes to Existing CF Specification
> >
> > In NetCDF-CF 1.7, Discrete Sampling Geometries separate dimensions and variables into two types: instance and element <http://cfconventions.org/cf-conventions/cf-conventions.html#_collections_instances_and_elements>. Instance refers to individual points, trajectories, profiles, etc. These would sometimes be referred to as features given that they are identified entities that can have associated attributes and be related to other entities. Element dimensions describe temporal or other dimensions to describe data on a per-instance basis. This proposal extends the DSG timeSeries featureType <http://cfconventions.org/cf-conventions/cf-conventions.html#_features_and_feature_types> such that the geospatial coordinates of the instances can be point, multi-point, line, multi-line, polygon, or multi-polygon geometries. Rather than overload the DSG contiguous ragged array encoding, designed with timeseries in mind, a geometry ragged array encoding is introduced in a new section 9.3.5. See this google doc for specific proposed changes. <http://goo.gl/Kq9ASq>
> >
> > Motivation
> >
> > DSGs have no system to define a geometry (polyline, polygon, etc., other than point) and an association with a time series that applies over that entire geometry, e.g., the expected rainfall in this watershed polygon for some period of time is 10 mm. As suggested in the last paragraph of section 9.1, current practice is to assign a representative point or just use an ID and forgo spatial information within a NetCDF-CF file. In order to satisfy a number of environmental modeling use cases, we need a way to encode a geometry (point, line, polygon, multi-point, multi-line, or multi-polygon) that is the static spatial feature representation to which one or more timeSeries can be associated. In this proposal, we provide an encoding to define collections of simple feature geometries. It interfaces cleanly with the existing DSG specification, enabling DSGs and Simple Geometries to be used concurrently.
> >
> > Looking Forward
> >
> > This proposal is a compromise solution that attempts to stay consistent with CF ideals and fit within the structure of the existing specification with minimal disruption. Line and polygon data types often require variable length arrays. Development of this proposal has brought to light the need for a general abstraction for variable length arrays in NetCDF-CF. Such a general abstraction would necessarily be reusable for character arrays, ragged arrays of time series, and ragged arrays of geometry nodes, as well as any other ragged data structures that may come up in the future. This proposal does not introduce such a general ragged array abstraction but does not preclude such a development in the future.
> >
> > Three Alternative Approaches
> >
> > Respecting the human readability ideal of NetCDF-CF, the development of this proposal started from a human readable format for geometries known as Well Known Text <https://en.wikipedia.org/wiki/Well-known_text>. We considered three high level design approaches while developing this proposal.
> >
> > 1. Direct use of Well-Known Text (WKT). In this approach, well known text strings would be encoded using character arrays following a contiguous ragged array approach to index the character array by geometry (or instance in DSG parlance).
> > 2. Implement the WKT approach using a NetCDF binary array. In this approach, well known text separators (brackets, commas and spaces) for multipoint, multiline, multipolygon, and polygon holes, would be encoded as break type separator values like -1 for multiparts and -2 for holes.
> > 3. Implement the fundamental dimensions of geometry data in NetCDF. In this approach, additional dimensions and variables along those dimensions would be introduced to represent geometries, geometry parts, geometry nodes, and unique (potentially shared) coordinate locations for nodes to reference.
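
To make the second approach concrete, here is a minimal Python sketch of how one multipolygon might be flattened into a break-valued node index list. The function name `encode_multipolygon` and its input layout (a list of polygons, each given as a list of ring sizes) are assumptions made for illustration only; the -1 and -2 break values are the ones mentioned in the text.

```python
# Illustrative sketch only: encode_multipolygon and its input layout are
# hypothetical, not part of the proposal. Break values follow the text:
# -1 separates the parts of a multi-part geometry, -2 introduces a hole.
MULTIPART_BREAK = -1
HOLE_BREAK = -2


def encode_multipolygon(parts, first_node=0):
    """Flatten one multipolygon into a break-valued list of node indices.

    parts: list of polygons; each polygon is a list of ring sizes
           (exterior ring first, then any holes), counted in nodes.
    first_node: index of the first node this geometry uses.
    Returns (indices, next_free_node).
    """
    indices = []
    node = first_node
    for part_number, ring_sizes in enumerate(parts):
        if part_number > 0:
            indices.append(MULTIPART_BREAK)
        for ring_number, n_nodes in enumerate(ring_sizes):
            if ring_number > 0:
                indices.append(HOLE_BREAK)
            indices.extend(range(node, node + n_nodes))
            node += n_nodes
    return indices, node


# A polygon with a 5-node exterior and one 4-node hole,
# followed by a second, simple 4-node polygon.
indices, next_node = encode_multipolygon([[5, 4], [4]])
print(indices)
```

The separators play the role that brackets and commas play in a WKT string, which is why the flat array can still be read back part by part.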
> > Selected Approach
> >
> > The first approach was seen as too opaque to stay true to the CF ideal of complete self-description. The third approach seemed needlessly verbose and difficult to implement. The second approach was selected for the following reasons:
> >
> > The second approach is at least as human-readable as the third.
> > Use of break values keeps geometries relatively atomic.
> > It will be familiar to developers who know the WKT geometry format.
> > Character arrays, which are needed for options one and three, are cumbersome to use in some programming languages in common use with NetCDF.
> > Break values replace the need for extraneous variables related to multi-part geometries and polygon holes (interiors). Multi-part geometries are generally an exception and excessive instrumentation to support them should be discounted.
> > Example: Representation of WKT-Style Polygons in a NetCDF-3 timeSeries featureType
> >
> > Below is sample CDL demonstrating how polygons are encoded in NetCDF-3 using a contiguous ragged array-like encoding. There are three details to note in the example below.
> >
> > The attribute contiguous_ragged_dimension with value of a dimension in the file.
> > The geom_coordinates attribute with a value containing a space separated string of variable names.
> > The cf_role geometry_x_node and geometry_y_node.
> > These three attributes form a system to fully describe collections of multi-polygon feature geometries. Any variable that has the contiguous_ragged_dimension attribute contains, for each instance, the 0-indexed starting position of that instance's geometry along the referenced dimension. Any variable that uses the dimension referenced in the contiguous_ragged_dimension attribute can be interpreted using the values in the variable containing the contiguous_ragged_dimension attribute. The variables referenced in the geom_coordinates attribute describe spatial coordinates of geometries. These variables can also be identified by the cf_roles geometry_x_node and geometry_y_node. Note that the example below also includes a mechanism to handle multi-polygon features that also contain holes.
> >
> > netcdf multipolygon_example {
> > dimensions:
> > node = 47 ;
> > indices = 55 ;
> > instance = 3 ;
> > time = 5 ;
> > strlen = 5 ;
> > variables:
> > char instance_name(instance, strlen) ;
> > instance_name:cf_role = "timeseries_id" ;
> > int coordinate_index(indices) ;
> > coordinate_index:geom_type = "multipolygon" ;
> > coordinate_index:geom_coordinates = "x y" ;
> > coordinate_index:multipart_break_value = -1 ;
> > coordinate_index:hole_break_value = -2 ;
> > coordinate_index:outer_ring_order = "anticlockwise" ;
> > coordinate_index:closure_convention = "last_node_equals_first" ;
> > int coordinate_index_start(instance) ;
> > coordinate_index_start:long_name = "index of first coordinate in each instance geometry" ;
> > coordinate_index_start:contiguous_ragged_dimension = "indices" ;
> > double x(node) ;
> > x:units = "degrees_east" ;
> > x:standard_name = "longitude" ; // or projection_x_coordinate
> > x:cf_role = "geometry_x_node" ;
> > double y(node) ;
> > y:units = "degrees_north" ;
> > y:standard_name = "latitude" ; // or projection_y_coordinate
> > y:cf_role = "geometry_y_node" ;
> > double someVariable(instance) ;
> > someVariable:long_name = "a variable describing a single-valued attribute of a polygon" ;
> > int time(time) ;
> > time:units = "days since 2000-01-01" ;
> > double someData(instance, time) ;
> > someData:coordinates = "time x y" ;
> > someData:featureType = "timeSeries" ;
> > // global attributes:
> > :Conventions = "CF-1.8" ;
> >
> > data:
> >
> > instance_name =
> > "flash",
> > "bang",
> > "pow" ;
> >
> > coordinate_index = 0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 13, 14, 15, 16,
> > -1, 17, 18, 19, 20, -1, 21, 22, 23, 24, 25, 26, 27, 28, -1, 29, 30, 31, 32, 33,
> > 34, -2, 35, 36, 37, 38, 39, 40, 41, 42, -1, 43, 44, 45, 46 ;
> >
> > coordinate_index_start = 0, 30, 46 ;
> >
> > x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
> > 5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45, -20, -30, -20, -20, -30, 30,
> > 45, 10, 30, 25, 50, 30, 25 ;
> >
> > y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29,
> > 25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20, -35, -20, -15, -25, -20, 20,
> > 40, 40, 20, 5, 10, 15, 5 ;
> >
> > someVariable = 1, 2, 3 ;
> >
> > time = 1, 2, 3, 4, 5 ;
> >
> > someData =
> > 1, 2, 3, 4, 5,
> > 1, 2, 3, 4, 5,
> > 1, 2, 3, 4, 5 ;
> > }
> > How To Interpret
> >
> > Starting from the timeSeries variables:
> >
> > 1. See CF-1.8 conventions.
> > 2. See the timeSeries featureType.
> > 3. Find the timeseries_id cf_role.
> > 4. Find the coordinates attribute of data variables.
> > 5. See that the variables indicated by the coordinates attribute have the cf_roles geometry_x_node and geometry_y_node, which identifies them as geometries according to this new specification.
> > 6. Find the coordinate index variable with geom_coordinates that point to the nodes.
> > 7. Find the variable with contiguous_ragged_dimension pointing to the dimension of the coordinate index variable to determine how to index into the coordinate index.
> > 8. Iterate over polygons, parsing out geometries using the contiguous ragged start variable and coordinate index variable to interpret the coordinate data variables.
> > Or, without reference to timeSeries:
> >
> > 1. See CF-1.8 conventions.
> > 2. See the geom_type of multipolygon.
> > 3. Find the variable with a contiguous_ragged_dimension matching the coordinate index variable's dimension.
> > 4. See the geom_coordinates of x y.
> > 5. Using the contiguous ragged start variable found in step 3 and the coordinate index variable found in step 2, geometries can be parsed out of the coordinate index variable using the hole and multipart break values in it.
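
The interpretation steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the arrays are copied by hand from the example CDL rather than read from a file, and the helper names `decode_instance` and `decode_all` are hypothetical, not part of the proposal or of any NetCDF library.

```python
# Illustrative sketch: data copied from the example CDL; the helper names
# decode_instance and decode_all are hypothetical.
MULTIPART_BREAK = -1  # coordinate_index:multipart_break_value
HOLE_BREAK = -2       # coordinate_index:hole_break_value

coordinate_index = [
    0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 13, 14, 15, 16,
    -1, 17, 18, 19, 20, -1, 21, 22, 23, 24, 25, 26, 27, 28, -1, 29, 30, 31,
    32, 33, 34, -2, 35, 36, 37, 38, 39, 40, 41, 42, -1, 43, 44, 45, 46,
]
coordinate_index_start = [0, 30, 46]
x = [0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
     5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45, -20,
     -30, -20, -20, -30, 30, 45, 10, 30, 25, 50, 30, 25]
y = [0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25,
     29, 25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20,
     -35, -20, -15, -25, -20, 20, 40, 40, 20, 5, 10, 15, 5]


def decode_instance(indices):
    """Split one instance's run of indices into polygons -> rings -> nodes."""
    polygons = [[[]]]
    for i in indices:
        if i == MULTIPART_BREAK:
            polygons.append([[]])     # start a new polygon part
        elif i == HOLE_BREAK:
            polygons[-1].append([])   # start a new hole in the current part
        else:
            polygons[-1][-1].append(i)
    return polygons


def decode_all(index, starts):
    """Slice the index variable per instance using the start pointers."""
    ends = list(starts[1:]) + [len(index)]
    return [decode_instance(index[s:e]) for s, e in zip(starts, ends)]


geometries = decode_all(coordinate_index, coordinate_index_start)
for name, polygons in zip(["flash", "bang", "pow"], geometries):
    for rings in polygons:
        exterior = [(x[i], y[i]) for i in rings[0]]
        print(name, "exterior:", exterior, "holes:", len(rings) - 1)
```

Note that the boundary between instances comes from the start variable alone; break values appear only inside an instance, which is why the index run 21 through 28 in the data spans the end of the first instance and the start of the second.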
> >
> > -------------- next part --------------
> > An HTML attachment was scrubbed...
> > URL: <http://mailman.cgd.ucar.edu/pipermail/cf-metadata/attachments/20170202/4ce5b42f/attachment.html>
> >
> > ------------------------------
> >
> > Subject: Digest Footer
> >
> > _______________________________________________
> > CF-metadata mailing list
> > CF-metadata at cgd.ucar.edu
> > http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
> >
> >
> > ------------------------------
> >
> > End of CF-metadata Digest, Vol 166, Issue 3
> > *******************************************
> >
> >
> >
> > --
> > Sincerely,
> >
> > Bob Simons
> > IT Specialist
> > Environmental Research Division
> > NOAA Southwest Fisheries Science Center
> > 99 Pacific St., Suite 255A (New!)
> > Monterey, CA 93940 (New!)
> > Phone: (831)333-9878 (New!)
> > Fax: (831)648-8440
> > Email: bob.simons at noaa.gov
> >
> > The contents of this message are mine personally and
> > do not necessarily reflect any position of the
> > Government or the National Oceanic and Atmospheric Administration.
> > <>< <>< <>< <>< <>< <>< <>< <>< <><
> >
> --
>
>
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R (206) 526-6959 voice
> 7600 Sand Point Way NE (206) 526-6329 fax
> Seattle, WA 98115 (206) 526-6317 main reception
>
> Chris.Barker at noaa.gov <mailto:Chris.Barker at noaa.gov>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.cgd.ucar.edu/pipermail/cf-metadata/attachments/20170222/f0172e9f/attachment.html>
Received on Wed Feb 22 2017 - 08:23:53 GMT

This archive was generated by hypermail 2.3.0 : Tue Sep 13 2022 - 23:02:42 BST
