
[CF-metadata] Extension of Discrete Sampling Geometries for Simple Features

From: Blodgett, David <dblodgett>
Date: Sat, 18 Feb 2017 16:17:44 -0600

 Let's try this again. Sorry to spam everyone.

http://doodle.com/poll/yaherucx2w3cd9y6

On February 17, 2017 8:20:11 PM CST, "Whiteaker, Timothy L" <
whiteaker at utexas.edu> wrote:
>
> Hi Dave,
>
> I get "This poll does not exist anymore" for both polls you sent.
>
>
>
> Tim Whiteaker
>
> Research Scientist
>
> The University of Texas at Austin
>
>
>
> From: CF-metadata [mailto:cf-metadata-bounces at cgd.ucar.edu] On Behalf
> Of David Blodgett
> Sent: Friday, February 17, 2017 2:08 PM
> To: CF Metadata
> Subject: Re: [CF-metadata] Extension of Discrete Sampling Geometries
> for Simple Features
>
>
>
> My apologies, I forgot to turn on time zone support in the poll below.
> Please use this one instead. http://doodle.com/poll/eikarnt35tdm7igd
>
>
>
> On Feb 17, 2017, at 1:22 PM, David Blodgett <dblodgett at usgs.gov> wrote:
>
>
>
> All,
>
>
>
> I haven't heard much follow up, but here's a doodle to coordinate a phone
> conversation about this. I think we have west-coast US participants and EU
> participants, so I chose times mid to late morning for me (midwest US).
>
>
>
> http://doodle.com/poll/eikarnt35tdm7igd
>
>
>
> I'll make a call once a few people have expressed interest and we have a
> clear day/time.
>
>
>
> Regards,
>
>
>
> - Dave
>
>
>
> On Feb 6, 2017, at 11:29 AM, David Blodgett <dblodgett at usgs.gov> wrote:
>
>
>
> Dear CF,
>
>
>
> I want to follow up on the conversation here with an alternative approach
> suggested off list, primarily between Jonathan and me. For this, I'm going to
> focus on the use cases satisfied and the simplification of the proposal allowed by
> not supporting those use cases. The changes below are largely driven by a
> desire to better align this proposal with the technical details of the
> prior art that is CF.
>
>
>
> If we:
>
> 1) don't support node sharing, we can remove the complication of node-to-
> coordinate indexing / indirection, simplifying the proposal pretty
> significantly.
>
> 2) don't use "break values" to indicate the separation between multi-part
> geometries and polygon holes, we end up with a data model with an extra
> dimension, but the NetCDF dimensions align with the natural dimensions of
> the data.
>
> 3) use "count" instead of a "start pointer" approach, we are better
> aligned with the existing DSG contiguous ragged array approach.
>
>
>
> Coming back to the three directions we could take this proposal, from my
> cover letter on February 2nd:
>
> 1. Direct use of Well-Known Text (WKT). In this approach, well known
> text strings would be encoded using character arrays following a contiguous
> ragged array approach to index the character array by geometry (or instance
> in DSG parlance).
>
> 2. Implement the WKT approach using a NetCDF binary array. In this
> approach, well known text separators (brackets, commas and spaces) for
> multipoint, multiline, multipolygon, and polygon holes, would be encoded as
> break type separator values like -1 for multiparts and -2 for holes.
>
> 3. Implement the fundamental dimensions of geometry data in NetCDF.
> In this approach, additional dimensions and variables along those
> dimensions would be introduced to represent geometries, geometry parts,
> geometry nodes, and unique (potentially shared) coordinate locations for
> nodes to reference.
>
> The alternative I?m outlining here moves in the direction of 3. We had
> originally discounted it because it becomes very verbose and seems overly
> complicated if support for coordinate sharing is a requirement. If the
> three simplifications described above are used, then the third approach
> seems more tenable.
>
>
>
> Jonathan has also suggested that: (these are in reaction to the CDL in my
> letter from February 2nd)
>
> 1) Rename geom_coordinates as node_coordinates, for consistency with UGRID.
>
> 2) Omit node_dimension. This is redundant, since the dimension can be
> found by
>
> examining the node coordinate variables.
>
> 3) Prescribe numerous "codes" and assumptions in the specification instead
> of letting them be described with attribute values.
>
> 4) It would be more consistent with CF and UGRID to use a single container
> variable to hang all the topology/geometry information from.
>
>
>
> All of which I, personally, am happy to accept if others don't object.
>
>
>
> A couple other suggestions from Jonathan I want to discuss a bit more:
>
> 1) Rename geometry as topology and geom_type as topology_type.
>
> While I'd be open to something other than geom, topology is
> odd. If this is really "node_collection_topology_type" I guess I could be
> convinced, but I'd be curious how people react to this. (Especially in
> relation to UGRID.)
>
> 2) This extension is more appropriate as an extension to the concept of
> cell bounds than as the addition of a complex, time-invariant type of discrete
> sampling geometry.
>
> Having just re-read the cell bounds chapter, I think it would
> overcomplicate the cell bounds material to include this. My basic issue
> here is that these geometries do not necessarily have a reference location.
> They are, rather, first-order entities that need to be treated as such.
> That said, it makes sense that these geometries are not necessarily a good
> fit for the original intent of Discrete Sampling Geometries. Jonathan
> suggested they may belong in their own chapter, which may be a good
> alternative. My suggested CDL below might lead us in the direction of this
> being a special type of auxiliary coordinate variable.
>
>
>
> This alternative starts to look like the CDL pasted below.
>
>
>
> Note that the issue of coordinates is sticking out like a sore thumb.
> Below, I've attempted to reconcile Jonathan's ideas regarding coordinates
> with my thoughts about how these geometries are "first-order entities" that
> don't have a single representative x and y. The spatial coordinates can be
> said to reside in the system of geometries described in the "sf" container
> variable. I realize this goes against the idea of coordinates a bit, but I
> think it holds with the spirit of the attribute.
>
>
>
> Finally, I'm glad to continue answering questions and debating things via
> the list to a point, but I think it would be in our interest to arrange a
> telecon to discuss this stuff further with a list of interested parties.
> Feel free to follow up on list, but for decision making, let's not let this
> rabbit hole go too deep. I'll plan on letting this and the other recent
> action on this proposal settle with people for a week or two, then start to
> bring together a conference call (or calls, depending on time zones). Please
> respond to me off list if you are interested in being part of a call to
> discuss.
>
>
>
> Regards,
>
>
>
> - Dave
>
>
>
> netcdf multipolygon_example {
>
> dimensions:
>
> node = 47 ;
>
> part = 9 ;
>
> instance = 3 ;
>
> time = 5 ;
>
> strlen = 5 ;
>
> variables:
>
> char instance_name(instance, strlen) ;
>
> instance_name:cf_role = "timeseries_id" ;
>
> double someVariable(instance) ;
>
> someVariable:long_name = "a variable describing a single-valued attribute of a polygon" ;
>
> someVariable:coordinates = "sf" ; // or "instance_name"?
>
> int time(time) ;
>
> time:units = "days since 2000-01-01" ;
>
> double someData(instance, time) ;
>
> someData:coordinates = "time sf" ; // or "time instance_name"?
>
> someData:featureType = "timeSeries" ;
>
> someData:geometry="sf";
>
> int sf; // containing variable -- datatype irrelevant because no data
>
> sf:geom_type = "multipolygon" ; // could be node_topology_type?
>
> sf:node_count_variable="node_count";
>
> sf:node_coordinates = "x y" ;
>
> sf:part_count = "part_node_count" ;
>
> sf:part_type = "part_type" ; // Not required unless polygons with holes are present.
>
> sf:outer_ring_order = "anticlockwise" ; // not required if written in spec?
>
> sf:closure_convention = "last_node_equals_first" ; // not required if written in spec?
>
> sf:outer_type_code = 0 ; // not required if written in spec?
>
> sf:inner_type_code = 1 ; // not required if written in spec?
>
> int node_count(instance);
>
> node_count:long_name = "count of coordinates in each instance geometry" ;
>
> int part_node_count(part) ;
>
> part_node_count:long_name = "count of coordinates in each geometry part" ;
>
> int part_type(part) ;
>
> part_type:long_name = "type of each geometry part" ;
>
> double x(node) ;
>
> x:units = "degrees_east" ;
>
> x:standard_name = "longitude" ; // or projection_x_coordinate
>
> x:cf_role = "geometry_x_node" ;
>
> double y(node) ;
>
> y:units = "degrees_north" ;
>
> y:standard_name = "latitude" ; // or projection_y_coordinate
>
> y:cf_role = "geometry_y_node" ;
>
> // global attributes:
>
> :Conventions = "CF-1.8" ;
>
>
>
> data:
>
>
>
> instance_name =
>
> "flash",
>
> "bang",
>
> "pow" ;
>
>
>
> someVariable = 1, 2, 3 ;
>
>
>
> time = 1, 2, 3, 4, 5 ;
>
>
>
> someData =
>
> 1, 2, 3, 4, 5,
>
> 1, 2, 3, 4, 5,
>
> 1, 2, 3, 4, 5 ;
>
>
>
> node_count = 25, 15, 7 ;
>
>
>
> part_node_count = 5, 4, 4, 4, 4, 8, 6, 8, 4 ;
>
>
>
> part_type = 0, 1, 1, 1, 0, 0, 0, 1, 0 ;
>
>
>
> x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
>
> 5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45, -20, -30, -20, -20, -30, 30,
>
> 45, 10, 30, 25, 50, 30, 25 ;
>
>
>
> y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29,
>
> 25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20, -35, -20, -15, -25, -20, 20,
>
> 40, 40, 20, 5, 10, 15, 5 ;
>
> }
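>
> For illustration only, here is a minimal Python sketch of how a reader could
> walk this counts-based encoding. Only part_node_count and part_type from the
> CDL above are used; the loop and names are illustrative, not part of the
> proposal:
>
> part_node_count = [5, 4, 4, 4, 4, 8, 6, 8, 4]  # nodes in each geometry part
> part_type = [0, 1, 1, 1, 0, 0, 0, 1, 0]        # 0 = outer ring, 1 = hole
>
> offset = 0
> for length, kind in zip(part_node_count, part_type):
>     ring = "hole" if kind == 1 else "outer ring"
>     # x[offset:offset + length] and y[offset:offset + length] hold this ring
>     print(f"{ring}: nodes {offset}..{offset + length - 1}")
>     offset += length
>
> # node_count partitions the same node dimension by instance, so the same
> # running-offset walk groups whole parts into instance geometries.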
>
>
>
>
>
>
>
> On Feb 4, 2017, at 8:07 AM, David Blodgett <dblodgett at usgs.gov> wrote:
>
>
>
> Dear Chris,
>
>
>
> Thanks for your thorough treatment of these issues. We have gone through a
> similar thought process to arrive at the proposal we came up with. I'll
> answer as briefly as I can.
>
>
>
> 1) how would you translate between netcdf geometries and, say geo JSON?
>
>
>
> The thinking is that node coordinate sharing is optional. If the writer
> wants to check, or already knows, that nodes share coordinates, then it's
> possible. Otherwise, it doesn't have to be used. I've always felt that this
> was important, but maybe not critical for a core NetCDF-CF data model. Some
> offline conversation has led to an example that does not use it that may be
> a good alternative; more on that later.
>
>
>
> 2) Break Values
>
>
>
> You really do have to hold your nose on the break values. The issue is
> that you have to store that information somehow, and it is almost worse to
> create new variables to store the multi-part and hole/not-hole information.
> The alternative approach that's forming up as mentioned above does break
> the information out into additional variables but simplifies things
> otherwise. In that case it doesn't feel overly complex to me... so stay tuned
> for more on this front.
>
>
>
> 3) Ragged Indexing
>
>
>
> Your thought process follows ours exactly. The key is that you either have
> to create the "pointer" array as a first order of business or loop over the
> counts ad nauseam. I'm actually leaning toward the counts for two reasons.
> First, the counts approach is already in CF, so it is a natural fit and will be
> familiar to developers in this space. Second, the issue of 0 vs 1 indexing
> is annoying. In our proposal, we settled on 0 indexing because it aligns
> with the idea of an offset, but it is still annoying and some applications
> would always have to adjust that pointer array as a first order of
> business.
>
>
>
> On to Bob's comments.
>
>
>
> Regarding aligning with other data models / encodings, I guess this needs
> to be unpacked a bit.
>
>
>
> 1) In this setting, simple features is a data model, not an encoding. An
> encoding can implement part or all of a data model as is needed by the use
> case(s) at hand. There is no problem with partial implementations; you still
> get interoperability for the intended use cases.
>
> 2) Attempting to align with other encoding standards (UGRID and NetCDF-CF
> are the primary ones here) is simply to keep the implementation patterns
> similar and familiar. This may be a fool's errand, but it is presumably good
> for adoptability and consistency.
>
> So, I don't see a problem with implementing important simple features
> types in a way that aligns with the way the existing community standards
> work.
>
>
>
> I don't see this as ignoring existing standards at all. There is no open
> community standard for binary encoding of geometries and related data that
> passes the CF requirements of human readability and self-description. We
> are adopting the appropriate data model and suggesting a new encoding that
> will solve a lot of problems in the environmental modeling space.
>
>
>
> As we've discussed before, your "different approach" sounds great, but
> seems like an exercise for a future effort that doesn't attempt to align
> with CF 1.7. Maybe what you suggest is a path forward for variable-length
> arrays in the CF 2.0 "vision in the mist", but I don't see it as a tenable
> solution for CF 1.*.
>
>
>
> Best Regards,
>
>
>
> - Dave
>
>
>
>
>
> On Feb 3, 2017, at 3:31 PM, Chris Barker <chris.barker at noaa.gov> wrote:
>
>
>
> A few thoughts. First, I think there are three core "issues" that need to
> be resolved:
>
>
>
> 1) Coordinate indexing (indirection)
>
> The question of whether you have an array of "vertices" that the geometry
> types index into to get their data:
>
>
>
> Advantages:
>
> - if a number of geometries share a lot of vertices, it can be more
> efficient
>
> - the relationship between geometries that share vertices (i.e. polygons
> that share a boundary), etc., is well defined. You don't need to check for
> closeness, and maybe have a tolerance, etc.
>
>
>
> These were absolutely critical for UGRID, for example -- a UGRID mesh is a
> "single thing", NOT a collection of polygons that happen to share some
> vertices.
>
>
>
> Disadvantages:
>
> - if the geometries do not share many vertices, it is less efficient.
>
> - there are additional code complications in "getting" the vertices of
> the given geometry
>
> - it does not match the OGC data model.
>
>
>
> My 0.02 -- given my use cases, I tend to want the advantages -- but I
> don't know that that's a typical use case. And I think it's a really good
> idea to keep with the OGC data model where possible -- i.e. be able to
> translate from netcdf to, say, geoJSON as losslessly as possible. Given
> that, I think it's probably a better idea not to have the indirection.
>
>
>
> However (to equivocate) perhaps the types of information people are likely
> to want to store in netcdf are a subset of what the OGC standards are
> designed for -- and for those use-cases, maybe shared vertices are critical.
>
>
>
> One way to think about it -- how would you translate between netcdf
> geometries and, say geo JSON:
>
> - nc => geojson would lose the shared index info.
>
> - geojson => nc -- would you try to reconstruct the shared vertices?
> I'm thinking that would be a bit dangerous in the general case, because you
> are adding information that you don't know is true -- are these a shared
> vertex, or two that just happen to be at the same location?
>
>
>
> > > Break values
>
>
>
> I don't really like break values as an approach, but with netcdf any
> option will be ugly one way or another. So keeping with the WKT approach
> makes sense to me. Either way you'll need custom code to unpack it. (BTW --
> what does WellKnownBinary do?)
>
>
>
> > > Ragged indexing
>
>
>
> There are two "natural" ways to represent a ragged array:
>
>
>
> (a) store the length of each "row"
>
> (b) store the index to the beginning (or end) of each "row"
>
>
>
> CF already uses (a). However, working with it, I'm pretty convinced that
> it's the "wrong" choice:
>
>
>
> If you want to know how long a given row is, that is really easy with (a),
> and almost as easy with (b) (it involves two indexes and a subtraction).
>
>
>
> However, if you want to extract a particular row, (b) makes this really
> easy -- you simply access the slice of the array you want. With (a) you
> need to loop through the entire "length_of_rows" array (up to the row of
> interest) and add up the values to find the slice you need. Not a huge
> issue, but it is an issue. In fact, in my code to read ragged arrays in
> netcdf, the first thing I do is pre-compute the index to each row, so I can
> then use that to access individual rows later -- if you are
> accessing via OpenDAP, that's particularly helpful.
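>
> For concreteness, a tiny illustrative sketch of that precomputation (made-up
> data, names not from any real file): derive the row starts once, then any row
> is a single slice.
>
> import numpy as np
>
> values = np.arange(12)             # the flattened ragged data
> row_lengths = np.array([5, 4, 3])  # option (a): length of each row
>
> # option (b), derived once: index to the beginning of each row
> row_starts = np.concatenate(([0], np.cumsum(row_lengths)))
>
> def get_row(i):
>     # slice row i directly, no summing of lengths on every access
>     return values[row_starts[i]:row_starts[i + 1]]
>
> print(get_row(1))  # -> [5 6 7 8]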
>
>
>
> So -- (b) is clearly (to me) the "best" way to do it -- but is it worth
> introducing a second way to handle ragged arrays in CF? I would think yes,
> but that would be offset if:
>
>
>
> - There is a bunch of existing library code that transparently handles
> ragged arrays in netcdf (does netcdfJava have something? I'm pretty sure
> Python doesn't -- certainly not in netCDF4)
>
>
>
> - That that existing lib code would be advantageous to leverage for code
> reading features: I suspect that there will have to be enough custom code
> that the ragged array bits are going to be the least of it.
>
>
>
> So I'm for the "new" way of representing ragged arrays.
>
>
>
> -CHB
>
>
>
>
>
> On Fri, Feb 3, 2017 at 11:41 AM, Bob Simons - NOAA Federal <
> bob.simons at noaa.gov> wrote:
>
> Then, isn't this proposal just the first step in the creation of a new
> model and a new encoding of Simple Features, one that is "align[ed] ...
> with as many other encoding standards in this space as is practical"? In
> other words, yet another standard for Simple Features?
>
>
>
> If so, it seems risky to me to take just the first (easy?) step "to
> support the use cases that have a compelling need today" and not solve the
> entire problem. I know the CF way is to just solve real, current needs, but
> in this case it seems to risk a head slap moment in the future when we
> realize that, in order to deal with some new simple feature variant, we
> should have done things differently from the beginning?
>
>
>
> And it seems odd to reject existing standards that have been so
> painstakingly hammered out, in favor of starting the process all over
> again. We follow existing standards for other things (e.g., IEEE-754 for
> representing floating point numbers in binary files), why can't we follow
> an existing Simple Features standard?
>
>
>
> ---
>
> Rather than just be a naysayer, let me suggest a very different
> alternative:
>
>
>
> There are several projects in the CF realm (e.g., this Simple Features
> project, Discrete Sampling Geometry (DSG), true variable-length Strings,
> ugrid(?)) which share a common underlying problem: how to deal with
> variable-length multidimensional arrays: a[b][c], where the length of the c
> dimension may be different for different b indices.
>
> DSG solved this (5 different ways!), but only for DSG.
>
> The Simple Features proposal seeks to solve the problem for Simple
> Features.
>
> We still have no support for Unicode variable-length Strings.
>
>
>
> Instead of continuing to solve the variable-length problem a different way
> every time we confront it, shouldn't we solve it once, with one small
> addition to the standard, and then use that solution repeatedly?
>
> The solution could be a simple variant of one of the DSG solutions, but
> generalized so that it could be used in different situations.
>
> An encoding standard and built-in support for variable-length data arrays
> in netcdf-java/c would solve a lot of problems, now and in the future.
>
> Some work on this is already done: I think the netcdf-java API already
> supports variable-length arrays when reading netcdf-4 files.
>
> For Simple Features, the problem would reduce to: store the feature (using
> some specified existing standard like WKT or WKB) in a variable-length
> array.
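>
> For illustration, a rough sketch of that reduction using the netCDF4 Python
> library's variable-length strings; the file and variable names are made up
> for the example, not proposed here.
>
> import netCDF4
>
> ds = netCDF4.Dataset("features.nc", "w", format="NETCDF4")
> ds.createDimension("instance", 3)
> geom = ds.createVariable("geometry_wkt", str, ("instance",))  # VLEN string
> geom[0] = "POINT (30 10)"
> geom[1] = "LINESTRING (30 10, 10 30, 40 40)"
> geom[2] = "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))"
> ds.close()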
>
>
>
>
>
>
>
>
>
>
>
> On Fri, Feb 3, 2017 at 9:07 AM, <cf-metadata-request at cgd.ucar.edu> wrote:
>
> Date: Fri, 3 Feb 2017 11:07:00 -0600
> From: David Blodgett <dblodgett at usgs.gov>
> To: Bob Simons - NOAA Federal <bob.simons at noaa.gov>
> Cc: CF Metadata <cf-metadata at cgd.ucar.edu>
> Subject: Re: [CF-metadata] Extension of Discrete Sampling Geometries
> for Simple Features
>
> Dear Bob,
>
> I'll just take these in line.
>
> 1) Noted. We have been trying to figure out what to do with the point
> featureType, and I think leaving it more or less alone is a viable path
> forward.
>
> 2) This is not an exact replica of WKT, but rather a similar approach to
> WKT. As I stated, we have followed the ISO simple features data model and
> well known text feature types in concept, but have not used the same
> standardization formalisms. We aren't advocating for supporting "all of"
> any standard, but are rather attempting to support the use cases that have a
> compelling need today while aligning this with as many other encoding
> standards in this space as is practical. Hopefully that answers your
> question; sorry if it's vague.
>
> 3) The google doc linked in my response contains the encoding we are
> proposing as a starting point for conversation: http://goo.gl/Kq9ASq
> I want to stress that this is a starting point for
> discussion. I expect that this proposal will change drastically before
> we're done.
>
> 4) We absolutely envision tools doing what you say: converting to/from standard
> spatial formats and NetCDF-CF geometries. We intend to introduce an R and a
> Python implementation that do exactly as you say, along with whatever form
> this standard takes in the end. R and Python were chosen because the team that
> brought this together is familiar with those two languages; additional
> implementations would be more than welcome.
>
> 5) We do include a "geometry" featureType similar to the "point"
> featureType; hence our difficulty with what to do with the "point"
> featureType. You are correct, there are lots of non-timeSeries applications
> to be solved, and this proposal does intend to support them (within the
> existing DSG constructs).
>
> Thanks for your questions, hopefully my answers close some gaps for you.
>
> - Dave
>
> > On Feb 3, 2017, at 10:47 AM, Bob Simons - NOAA Federal <
> bob.simons at noaa.gov> wrote:
> >
> > 1) There is a vague comment in the proposal about possibly changing the
> point featureType. Please don't, unless the changes don't affect current
> uses of Point. There are already 1000's of files that use it. If this new
> system offers an alternative, then fine, it's an alternative. One of the
> most important and useful features of a good standard is backwards
> compatibility.
> >
> > 2) You advocate "Implement the WKT approach using a NetCDF binary
> array." Is this system then an exact encoding of WKT, neither a subset nor
> a superset? "Simple Features" are often not simple.
> > If it is WKT (or something else), what is the standard you are following
> to describe the Simple Features (e.g., ISO/IEC 13249-3:2016 and ISO
> 19162:2015)?
> > Does your proposal deviate in any way from the standard's capabilities?
> > Do you advocate following the entire WKT standard, e.g., supporting all
> the feature types that WKT supports?
> >
> > 3) Since you are not using the WKT encoding, but creating your own,
> where is the definition of the encoding system you are using?
> >
> > 4) This is a little out of CF scope, but:
> > Do you envision tools, notably, netcdf-c/java, having a writer function
> that takes in WKT and encodes the information in a file, and having a
> reader function that reads the file and returns WKT? Or is it your plan
> that the encoding/ decoding is left to the user?
> >
> > 5) This proposal is for "Simple Features plus Time Series" (my phrase
> not yours). But aren't there lots of other uses of Simple Features? Will
> there be other proposals in the future for "Simple Features plus X" and
> "Simple Features plus Y"? If so, will CF eventually become a massive
> document where Simple Features are defined over and over again, but in
> different contexts? If so, wouldn't a better solution be to deal with
> Simple Features separately (as Postgres does by making a geometric data
> type?), and then add "Simple Features plus Time Series" as the first use of
> it?
> >
> > Thanks for answering these questions.
> > Please forgive me if I missed parts of your proposal that answer these
> questions.
> >
> >
> > On Thu, Feb 2, 2017 at 5:57 AM, <cf-metadata-request at cgd.ucar.edu> wrote:
> > Date: Thu, 2 Feb 2017 07:57:36 -0600
> > From: David Blodgett <dblodgett at usgs.gov>
> > To: <cf-metadata at cgd.ucar.edu>
> > Subject: [CF-metadata] Extension of Discrete Sampling Geometries for
> > Simple Features
> >
> > Dear CF Community,
> >
> > We are pleased to submit this proposal for your consideration and
> review. The cover letter we've prepared below provides some background and
> > explanation for the proposed approach. The google doc here
> > <http://goo.gl/Kq9ASq> is an excerpt of the CF
> specification with track changes turned on. Permissions for the document
> allow any google user to comment, so feel free to comment and ask questions
> in line.
> >
> > Note that I'm sharing this with you with one issue unresolved: what to
> do with the point featureType? Our draft suggests that it is part of a new
> geometry featureType, but it could be that we leave it alone and introduce
> a geometry featureType. This may be a minor point of discussion, but we
> need to be clear that this is an issue that still needs to be resolved in
> the proposal.
> >
> > Thank you for your time and consideration.
> >
> > Best Regards,
> >
> > David Blodgett, Tim Whiteaker, and Ben Koziol
> >
> > Proposed Extension to NetCDF-CF for Simple Geometries
> >
> > Preface
> >
> > The proposed addition to NetCDF-CF introduced below is inspired by a
> pre-existing data model governed by OGC and ISO as ISO 19125-1. More
> > information on Simple Features may be found here:
> > <https://en.wikipedia.org/wiki/Simple_Features>. To the knowledge of the authors, it is consistent
> > with ISO 19125-1 but has not been specified using the formalisms of OGC or
> > ISO. The language used attempts to hold true to NetCDF-CF semantics while not
> > conflicting with the existing standards baseline. While this proposal does
> > not support the entire scope of the simple features ecosystem, it does
> > support the core data types in most common use around the community.
> >
> > The other existing standard to mention is the UGRID convention
> > <http://ugrid-conventions.github.io/ugrid-conventions/>. The authors have
> experience reading and writing UGRID and have designed the proposed
> structure in a way that is inspired by and consistent with it.
> >
> > Terms and Definitions
> >
> > (Taken from OGC 06-103r4 OpenGIS Implementation Specification for
> Geographic information - Simple feature access - Part 1: Common
> > architecture <http://www.opengeospatial.org/standards/sfa>.)
> >
> > Feature: Abstraction of real world phenomena - typically a geospatial
> abstraction with associated descriptive attributes.
> > Simple Feature: A feature with all geometric attributes described
> piecewise by straight line or planar interpolation between point sets.
> > Geometry (geometric complex): A set of disjoint geometric primitives -
> one or more points, lines, or polygons that form the spatial representation
> of a feature.
> > Introduction
> >
> > Discrete Sampling Geometries (DSGs) handle data from one (or a
> > collection of) timeSeries (point), trajectory, profile, trajectoryProfile
> > or timeSeriesProfile geometries. Measurements are from a point (timeSeries
> > and profile) or points along a trajectory. In this proposal, we reuse the
> > core DSG timeSeries type, which provides support for basic time series use
> > cases, e.g., a timeSeries which is measured (or modeled) at a given point.
> >
> > Changes to Existing CF Specification
> >
> > In NetCDF-CF 1.7, Discrete Sampling Geometries separate dimensions and
> > variables into two types - instance and element
> > <http://cfconventions.org/cf-conventions/cf-conventions.html#_collections_instances_and_elements>.
> > Instance refers to individual points, trajectories, profiles, etc. These
> > would sometimes be referred to as features, given that they are identified
> > entities that can have associated attributes and be related to other
> > entities. Element dimensions describe temporal or other dimensions used to
> > describe data on a per-instance basis. This proposal extends the DSG
> > timeSeries featureType
> > <http://cfconventions.org/cf-conventions/cf-conventions.html#_features_and_feature_types>
> > such that the geospatial coordinates of the instances can be point,
> > multi-point, line, multi-line, polygon, or multi-polygon geometries. Rather
> > than overload the DSG contiguous ragged array encoding, designed with
> > timeseries in mind, a geometry ragged array encoding is introduced in a new
> > section 9.3.5. See this google doc for specific proposed changes:
> > <http://goo.gl/Kq9ASq>.
> > Motivation
> >
> > DSGs have no system to define a geometry (polyline, polygon, etc., other
> than point) and an association with a time series that applies over that
> > entire geometry, e.g., the expected rainfall in this watershed polygon for
> some period of time is 10 mm. As suggested in the last paragraph of section
> 9.1, current practice is to assign a representative point or just use an ID
> and forgo spatial information within a NetCDF-CF file. In order to satisfy
> a number of environmental modeling use cases, we need a way to encode a
> geometry (point, line, polygon, multi-point, multi-line, or multi-polygon)
> that is the static spatial feature representation to which one or more
> timeSeries can be associated. In this proposal, we provide an encoding to
> define collections of simple feature geometries. It interfaces cleanly with
> the existing DSG specification, enabling DSGs and Simple Geometries to be
> used concurrently.
> >
> > Looking Forward
> >
> > This proposal is a compromise solution that attempts to stay consistent
> > with CF ideals and fit within the structure of the existing specification
> with minimal disruption. Line and polygon data types often require variable
> length arrays. Development of this proposal has brought to light the need
> for a general abstraction for variable length arrays in NetCDF-CF. Such a
> general abstraction would necessarily be reusable for character arrays,
> ragged arrays of time series, and ragged arrays of geometry nodes, as well
> as any other ragged data structures that may come up in the future. This
> proposal does not introduce such a general ragged array abstraction but
> does not preclude such a development in the future.
> >
> > Three Alternative Approaches
> >
> > Respecting the human readability ideal of NetCDF-CF, the development of
> this proposal started from a human readable format for geometries known as
> > Well Known Text <https://en.wikipedia.org/wiki/Well-known_text>. We considered three high
> level design approaches while developing this proposal.
> >
> > 1. Direct use of Well-Known Text (WKT). In this approach, well known text
> > strings would be encoded using character arrays following a contiguous
> > ragged array approach to index the character array by geometry (or instance
> > in DSG parlance).
> > 2. Implement the WKT approach using a NetCDF binary array. In this
> > approach, well known text separators (brackets, commas and spaces) for
> > multipoint, multiline, multipolygon, and polygon holes, would be encoded as
> > break-type separator values like -1 for multiparts and -2 for holes.
> > 3. Implement the fundamental dimensions of geometry data in NetCDF. In this
> > approach, additional dimensions and variables along those dimensions would
> > be introduced to represent geometries, geometry parts, geometry nodes, and
> > unique (potentially shared) coordinate locations for nodes to reference.
> > Selected Approach
> >
> > The first approach was seen as too opaque to stay true to the CF ideal
> of complete self-description. The third approach seemed needlessly verbose
> and difficult to implement. The second approach was selected for the
> following reasons:
> >
> > - The second approach is just as or more human-readable than the third.
> > - Use of break values keeps geometries relatively atomic.
> > - It will be familiar to developers who are familiar with the WKT geometry
> > format.
> > - Character arrays, which are needed for options one and three, are
> > cumbersome to use in some programming languages in common use with NetCDF.
> > - Break values replace the need for extraneous variables related to
> > multi-part geometries and polygon holes (interiors). Multi-part geometries are
> > generally an exception, and excessive instrumentation to support them should
> > be discounted.
> > Example: Representation of WKT-Style Polygons in a NetCDF-3
> > timeSeries featureType
> >
> > Below is sample CDL demonstrating how polygons are encoded in NetCDF-3
> > using a contiguous ragged array-like encoding. There are three details to
> > note in the example below.
> >
> > 1. The attribute contiguous_ragged_dimension, whose value is a dimension in
> > the file.
> > 2. The geom_coordinates attribute, with a value containing a space-separated
> > string of variable names.
> > 3. The cf_role values geometry_x_node and geometry_y_node.
> > These three attributes form a system to fully describe collections of
> > multi-polygon feature geometries. Any variable that has the
> > contiguous_ragged_dimension attribute is stored along the instance dimension
> > and contains integers that indicate the 0-indexed starting position of each
> > geometry in the referenced dimension. Any variable that uses the dimension
> > referenced in the contiguous_ragged_dimension attribute can be interpreted
> > using the values in the variable containing the contiguous_ragged_dimension
> > attribute. The variables referenced in the geom_coordinates attribute
> > describe the spatial coordinates of geometries. These variables can also be
> > identified by the cf_roles geometry_x_node and geometry_y_node. Note that
> > the example below also includes a mechanism to handle multi-polygon features
> > that also contain holes.
> >
> > netcdf multipolygon_example {
> > dimensions:
> > node = 47 ;
> > indices = 55 ;
> > instance = 3 ;
> > time = 5 ;
> > strlen = 5 ;
> > variables:
> > char instance_name(instance, strlen) ;
> > instance_name:cf_role = "timeseries_id" ;
> > int coordinate_index(indices) ;
> > coordinate_index:geom_type = "multipolygon" ;
> > coordinate_index:geom_coordinates = "x y" ;
> > coordinate_index:multipart_break_value = -1 ;
> > coordinate_index:hole_break_value = -2 ;
> > coordinate_index:outer_ring_order = "anticlockwise" ;
> > coordinate_index:closure_convention = "last_node_equals_first" ;
> > int coordinate_index_start(instance) ;
> > coordinate_index_start:long_name = "index of first coordinate in
> each instance geometry" ;
> > coordinate_index_start:contiguous_ragged_dimension = "indices" ;
> > double x(node) ;
> > x:units = "degrees_east" ;
> > x:standard_name = "longitude" ; // or projection_x_coordinate
> > x:cf_role = "geometry_x_node" ;
> > double y(node) ;
> > y:units = "degrees_north" ;
> > y:standard_name = "latitude" ; // or projection_y_coordinate
> > y:cf_role = "geometry_y_node" ;
> > double someVariable(instance) ;
> > someVariable:long_name = "a variable describing a single-valued
> attribute of a polygon" ;
> > int time(time) ;
> > time:units = "days since 2000-01-01" ;
> > double someData(instance, time) ;
> > someData:coordinates = "time x y" ;
> > someData:featureType = "timeSeries" ;
> > // global attributes:
> > :Conventions = "CF-1.8" ;
> >
> > data:
> >
> > instance_name =
> > "flash",
> > "bang",
> > "pow" ;
> >
> > coordinate_index = 0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12,
> -2, 13, 14, 15, 16,
> > -1, 17, 18, 19, 20, -1, 21, 22, 23, 24, 25, 26, 27, 28, -1, 29, 30,
> 31, 32, 33,
> > 34, -2, 35, 36, 37, 38, 39, 40, 41, 42, -1, 43, 44, 45, 46 ;
> >
> > coordinate_index_start = 0, 30, 46 ;
> >
> > x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
> > 5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45, -20,
> -30, -20, -20, -30, 30,
> > 45, 10, 30, 25, 50, 30, 25 ;
> >
> > y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25,
> 25, 29,
> > 25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20, -35,
> -20, -15, -25, -20, 20,
> > 40, 40, 20, 5, 10, 15, 5 ;
> >
> > someVariable = 1, 2, 3 ;
> >
> > time = 1, 2, 3, 4, 5 ;
> >
> > someData =
> > 1, 2, 3, 4, 5,
> > 1, 2, 3, 4, 5,
> > 1, 2, 3, 4, 5 ;
> > }
> > How To Interpret
> >
> > Starting from the timeSeries variables:
> >
> > 1. See CF-1.8 conventions.
> > 2. See the timeSeries featureType.
> > 3. Find the timeseries_id cf_role.
> > 4. Find the coordinates attribute of data variables.
> > 5. See that the variables indicated by the coordinates attribute have a
> > cf_role of geometry_x_node and geometry_y_node to determine that these are
> > geometries according to this new specification.
> > 6. Find the coordinate index variable with geom_coordinates that point to
> > the nodes.
> > 7. Find the variable with contiguous_ragged_dimension pointing to the
> > dimension of the coordinate index variable to determine how to index into
> > the coordinate index.
> > 8. Iterate over polygons, parsing out geometries using the contiguous
> > ragged start variable and coordinate index variable to interpret the
> > coordinate data variables.
> > Or, without reference to timeSeries:
> >
> > 1. See CF-1.8 conventions.
> > 2. See the geom_type of multipolygon.
> > 3. Find the variable with a contiguous_ragged_dimension matching the
> > coordinate index variable's dimension.
> > 4. See the geom_coordinates of x y.
> > 5. Using the contiguous ragged start variable found in 3 and the coordinate
> > index variable found in 2, geometries can be parsed out of the coordinate
> > index variable and interpreted using the hole and multipart break values in
> > it (a rough sketch of this step follows).
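> >
> > For illustration only, a rough Python sketch of that final parsing step; the
> > logic and names are illustrative, not part of the proposal:
> >
> > MULTIPART_BREAK = -1
> > HOLE_BREAK = -2
> >
> > def parse_instance(coordinate_index, start, stop):
> >     # Split one instance's slice of coordinate_index into parts, each a
> >     # list of node indices plus a flag saying whether the part is a hole.
> >     parts, current, is_hole = [], [], False
> >     for value in coordinate_index[start:stop]:
> >         if value in (MULTIPART_BREAK, HOLE_BREAK):
> >             parts.append((current, is_hole))
> >             current, is_hole = [], value == HOLE_BREAK
> >         else:
> >             current.append(value)
> >     parts.append((current, is_hole))
> >     return parts
> >
> > def parse_all(coordinate_index, coordinate_index_start):
> >     # coordinate_index_start gives the 0-indexed first entry per instance.
> >     starts = list(coordinate_index_start) + [len(coordinate_index)]
> >     return [parse_instance(coordinate_index, s, e)
> >             for s, e in zip(starts[:-1], starts[1:])]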
> >
> >
> >
> >
> > --
> > Sincerely,
> >
> > Bob Simons
> > IT Specialist
> > Environmental Research Division
> > NOAA Southwest Fisheries Science Center
> > 99 Pacific St., Suite 255A (New!)
> > Monterey, CA 93940 (New!)
> > Phone: (831)333-9878 (New!)
> > Fax: (831)648-8440
> > Email: bob.simons at noaa.gov
> >
> > The contents of this message are mine personally and
> > do not necessarily reflect any position of the
> > Government or the National Oceanic and Atmospheric Administration.
> > <>< <>< <>< <>< <>< <>< <>< <>< <><
> >
>
>
>
>
>
>
> --
>
> Sincerely,
>
> Bob Simons
> IT Specialist
> Environmental Research Division
> NOAA Southwest Fisheries Science Center
> 99 Pacific St., Suite 255A (New!)
> Monterey, CA 93940 (New!)
> Phone: (831)333-9878 <(831)%20333-9878> (New!)
>
> Fax: (831)648-8440 <(831)%20648-8440>
> Email: bob.simons at noaa.gov
>
> The contents of this message are mine personally and
> do not necessarily reflect any position of the
> Government or the National Oceanic and Atmospheric Administration.
> <>< <>< <>< <>< <>< <>< <>< <>< <><
>
>
>
>
>
>
>
> --
>
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R (206) 526-6959 voice
> 7600 Sand Point Way NE (206) 526-6329 fax
> Seattle, WA 98115 (206) 526-6317 main reception
>
> Chris.Barker at noaa.gov
>
>
>
>
>
>
>
>
>
>