
[CF-metadata] Extension of Discrete Sampling Geometries for Simple Features

From: Chris Barker <chris.barker>
Date: Wed, 22 Feb 2017 09:26:47 -0800

On Wed, Feb 22, 2017 at 7:23 AM, David Blodgett <dblodgett at usgs.gov> wrote:

> We will meet on google hangouts at 8am CT on March 7th. If you'd like to be
> added to the calendar invite, please let me know.
>

Please invite me -- though it's pretty early for me -- aren't timezones
fun! Darn round earth....

And you do mean 8am UTC-6, (USA Central Standard Time), yes?

-Chris



> *NetCDF - Simple Geometries Discussion*
> Scheduled: Mar 7, 2017, 8:00 AM to 9:00 AM CT
> Hopefully google hangouts will work. Please use this url:
> https://plus.google.com/hangouts/_/calendar/ZGJsb2RnZXR0Lmgyb0BnbWFpbC5jb2
> 0.c192p3bskticchduvubks6ug44
>
> - Dave
>
> On Feb 18, 2017, at 4:17 PM, Blodgett, David <dblodgett at usgs.gov> wrote:
>
> Let's try this again. Sorry to spam everyone.
>
> http://doodle.com/poll/yaherucx2w3cd9y6
>
>>
>>
>>
>> On Feb 6, 2017, at 11:29 AM, David Blodgett <dblodgett at usgs.gov> wrote:
>>
>>
>>
>>
>> Dear CF,
>>
>>
>>
>>
>> I want to follow up on the conversation here with an alternative approach
>> suggested off list, primarily between Jonathan and me. For this, I'm going to
>> focus on use cases satisfied and simplification of the proposal allowed by
>> not supporting those use cases. The changes below are largely driven by a
>> desire to better align this proposal with the technical details of the
>> prior art that is CF.
>>
>>
>>
>>
>> If we:
>>
>> 1) don't support node sharing, we can remove the complication of
>> node-coordinate indexing / indirection, simplifying the proposal pretty
>> significantly.
>>
>> 2) don't use "break values" to indicate the separation between multi-part
>> geometries and polygon holes, we end up with a data model with an extra
>> dimension, but the NetCDF dimensions align with the natural dimensions of
>> the data.
>>
>> 3) use "count" instead of a "start pointer" approach, we are better
>> aligned with the existing DSG contiguous ragged array approach.
>>
>>
>>
>>
>> Coming back to the three directions we could take this proposal from my
>> cover letter on February 2nd.
>>
>> 1. Direct use of Well-Known Text (WKT). In this approach, well known
>> text strings would be encoded using character arrays following a contiguous
>> ragged array approach to index the character array by geometry (or instance
>> in DSG parlance).
>>
>>
>> 2. Implement the WKT approach using a NetCDF binary array. In this
>> approach, well known text separators (brackets, commas and spaces) for
>> multipoint, multiline, multipolygon, and polygon holes, would be encoded as
>> break type separator values like -1 for multiparts and -2 for holes.
>>
>>
>> 3. Implement the fundamental dimensions of geometry data in NetCDF.
>> In this approach, additional dimensions and variables along those
>> dimensions would be introduced to represent geometries, geometry parts,
>> geometry nodes, and unique (potentially shared) coordinate locations for
>> nodes to reference.
>>
>>
>> The alternative I'm outlining here moves in the direction of 3. We had
>> originally discounted it because it becomes very verbose and seems overly
>> complicated if support for coordinate sharing is a requirement. If the
>> three simplifications described above are used, then the third approach
>> seems more tenable.
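
For comparison with the count-based alternative, the break-value encoding of approach 2 can be unpacked in a single pass. The following is a minimal sketch, not the proposal's reference code. It assumes the break values sit in an array of non-negative node indices (so negative sentinels cannot collide with data), that -1 starts a new polygon, and that -2 starts a hole in the current polygon. The function name and constants are hypothetical.

```python
PART_BREAK = -1  # assumed: separates the polygons of a multipolygon
HOLE_BREAK = -2  # assumed: the ring that follows is a hole in the current polygon

def split_on_breaks(values):
    """Split a flat index sequence into polygons, each a list of rings
    (outer ring first, then any holes)."""
    polygons = [[[]]]  # start with one polygon holding one empty ring
    for v in values:
        if v == PART_BREAK:
            polygons.append([[]])    # begin a new polygon
        elif v == HOLE_BREAK:
            polygons[-1].append([])  # begin a hole ring in the current polygon
        else:
            polygons[-1][-1].append(v)
    return polygons

# One polygon with a hole, followed by a second polygon without holes.
encoded = [0, 1, 2, 3, 0, HOLE_BREAK, 4, 5, 6, 4, PART_BREAK, 7, 8, 9, 7]
decoded = split_on_breaks(encoded)
```

Every reader needs this kind of custom unpacking step, which is the trade-off against the extra variables required by the count-based layout.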
>>
>>
>>
>>
>> Jonathan has also suggested that: (these are in reaction to the CDL in my
>> letter from February 2nd)
>>
>> 1) Rename geom_coordinates as node_coordinates, for consistency with
>> UGRID.
>>
>> 2) Omit node_dimension. This is redundant, since the dimension can be
>> found by examining the node coordinate variables.
>>
>> 3) Prescribe numerous "codes" and assumptions in the specification
>> instead of letting them be described with attribute values.
>>
>> 4) It would be more consistent with CF and UGRID to use a single
>> container variable to hang all the topology/geometry information from.
>>
>>
>>
>>
>> Which I, personally, am happy to accept if others don't object.
>>
>>
>>
>>
>> A couple other suggestions from Jonathan I want to discuss a bit more:
>>
>> 1) Rename geometry as topology and geom_type as topology_type.
>>
>> While I'd be open to something other than geom, topology is
>> odd. If this is really "node_collection_topology_type" I guess I could
>> be convinced, but would be curious how people react to this. (Especially in
>> relation to UGRID)
>>
>> 2) This extension is more appropriate as an extension to the concept of
>> cell bounds than the addition of a complex time-invariant type of discrete
>> sampling geometry.
>>
>> Having just re-read the cell bounds chapter, I think it
>> would overcomplicate the cell bounds to include this material. My basic
>> issue here is that these geometries do not necessarily have a reference
>> location. They are, rather, first order entities that need to be treated as
>> such. That said, it makes sense that these geometries are not necessarily a
>> good fit for the original intent of Discrete Sampling Geometries. Jonathan
>> suggested they may belong in their own chapter, which may be a good
>> alternative? My suggested CDL below might lead us in the direction of this
>> being a special type of auxiliary coordinate variable.
>>
>>
>>
>>
>> This alternative starts to look like the CDL pasted below.
>>
>>
>>
>>
>> Note that the issue of coordinates is sticking out like a sore thumb.
>> Below, I've attempted to reconcile Jonathan's ideas regarding coordinates
>> with my thoughts about how these geometries are "first order entities" that
>> don't have a single representative x and y. The spatial coordinates can be
>> said to reside in the system of geometries described in the "sf" container
>> variable? I realize this goes against the idea of coordinates a bit, but I
>> think it is holding with the spirit of the attribute?
>>
>>
>>
>>
>> Finally, I'm glad to continue answering questions and debating things via
>> the list to a point, but I think it would be in our interest to arrange a
>> teleconference to discuss this stuff further with a list of interested parties.
>> Feel free to follow up on list, but for decision making, let's not let this
>> rabbit hole go too deep. I'll plan on letting this and the other recent
>> action on this proposal settle with people for a week or two then start to
>> bring together a conference call (or calls depending on time zones). Please
>> respond to me off list if you are interested in being part of a call to
>> discuss.
>>
>>
>>
>>
>> Regards,
>>
>>
>>
>>
>> - Dave
>>
>>
>>
>>
>> netcdf multipolygon_example {
>>
>> dimensions:
>>
>> node = 47 ;
>>
>> part = 9 ;
>>
>> instance = 3 ;
>>
>> time = 5 ;
>>
>> strlen = 5 ;
>>
>> variables:
>>
>> char instance_name(instance, strlen) ;
>>
>> instance_name:cf_role = "timeseries_id" ;
>>
>> double someVariable(instance) ;
>>
>> someVariable:long_name = "a variable describing a single-valued attribute of a polygon" ;
>>
>> someVariable:coordinates = "sf" ; // or "instance_name"?
>>
>> int time(time) ;
>>
>> time:units = "days since 2000-01-01" ;
>>
>> double someData(instance, time) ;
>>
>> someData:coordinates = "time sf" ; // or "time instance_name"?
>>
>> someData:featureType = "timeSeries" ;
>>
>> someData:geometry="sf";
>>
>> int sf; // containing variable -- datatype irrelevant because no data
>>
>> sf:geom_type = "multipolygon" ; // could be node_topology_type?
>>
>> sf:node_count_variable="node_count";
>>
>> sf:node_coordinates = "x y" ;
>>
>> sf:part_count = "part_node_count" ;
>>
>> sf:part_type = "part_type" ; // Not required unless polygons with holes present.
>>
>> sf:outer_ring_order = "anticlockwise" ; // not required if written in spec?
>>
>> sf:closure_convention = "last_node_equals_first" ; // not required if written in spec?
>>
>> sf:outer_type_code = 0 ; // not required if written in spec?
>>
>> sf:inner_type_code = 1 ; // not required if written in spec?
>>
>> int node_count(instance);
>>
>> node_count:long_name = "count of coordinates in each instance geometry" ;
>>
>> int part_node_count(part) ;
>>
>> part_node_count:long_name = "count of coordinates in each geometry part" ;
>>
>> int part_type(part) ;
>>
>> part_type:long_name = "type of each geometry part" ;
>>
>> double x(node) ;
>>
>> x:units = "degrees_east" ;
>>
>> x:standard_name = "longitude" ; // or projection_x_coordinate
>>
>> x:cf_role = "geometry_x_node" ;
>>
>> double y(node) ;
>>
>> y:units = "degrees_north" ;
>>
>> y:standard_name = "latitude" ; // or projection_y_coordinate
>>
>> y:cf_role = "geometry_y_node" ;
>>
>> // global attributes:
>>
>> :Conventions = "CF-1.8" ;
>>
>>
>>
>> data:
>>
>>
>>
>> instance_name =
>>
>> "flash",
>>
>> "bang",
>>
>> "pow" ;
>>
>>
>>
>> someVariable = 1, 2, 3 ;
>>
>>
>>
>> time = 1, 2, 3, 4, 5 ;
>>
>>
>>
>> someData =
>>
>> 1, 2, 3, 4, 5,
>>
>> 1, 2, 3, 4, 5,
>>
>> 1, 2, 3, 4, 5 ;
>>
>>
>>
>> node_count = 25, 15, 7 ;
>>
>>
>>
>> part_node_count = 5, 4, 4, 4, 4, 8, 6, 8, 4 ;
>>
>>
>>
>> part_type = 0, 1, 1, 1, 0, 0, 0, 1, 0 ;
>>
>>
>>
>> x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
>>
>> 5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45, -20, -30, -20, -20, -30, 30,
>>
>> 45, 10, 30, 25, 50, 30, 25 ;
>>
>>
>>
>> y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29,
>>
>> 25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20, -35, -20, -15, -25, -20, 20,
>>
>> 40, 40, 20, 5, 10, 15, 5 ;
>>
>> }
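
To make the count-based layout in the CDL concrete, here is a minimal reader sketch. It is an illustration under stated assumptions, not the proposal's implementation: the function name is hypothetical, the type codes follow the draft attributes (0 for an outer ring, 1 for a hole), and the arrays are small made-up values rather than the data in the CDL above.

```python
from itertools import accumulate

def decode_multipolygons(node_count, part_node_count, part_type, x, y, outer_code=0):
    """Return one list of polygons per instance. Each polygon is a list of
    rings (outer ring first, then holes); each ring is a list of (x, y)."""
    # Start offset of every part in the node arrays: [0, c0, c0 + c1, ...]
    part_start = [0] + list(accumulate(part_node_count))
    instances = []
    part = 0
    for n_nodes in node_count:
        polygons = []
        consumed = 0
        # Walk parts until this instance's node budget is used up.
        while consumed < n_nodes and part < len(part_node_count):
            lo, hi = part_start[part], part_start[part + 1]
            ring = list(zip(x[lo:hi], y[lo:hi]))
            if part_type[part] == outer_code:
                polygons.append([ring])      # an outer ring opens a new polygon
            elif polygons:
                polygons[-1].append(ring)    # a hole belongs to the last polygon
            else:
                raise ValueError("hole ring found before any outer ring")
            consumed += hi - lo
            part += 1
        if consumed != n_nodes:
            raise ValueError("part_node_count does not add up to node_count")
        instances.append(polygons)
    return instances

# Hypothetical data: instance 0 is a square with one hole, instance 1 a triangle.
node_count = [9, 4]
part_node_count = [5, 4, 4]
part_type = [0, 1, 0]
x = [0, 20, 20, 0, 0, 1, 10, 19, 1, 30, 45, 10, 30]
y = [0, 0, 20, 20, 0, 1, 5, 1, 1, 20, 40, 40, 20]
instances = decode_multipolygons(node_count, part_node_count, part_type, x, y)
```

The consistency check between the two count variables is worth keeping in any reader, since the layout stores the same total twice.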
>>
>>
>>
>>
>>
>>
>>
>>
>> On Feb 4, 2017, at 8:07 AM, David Blodgett <dblodgett at usgs.gov> wrote:
>>
>>
>>
>>
>> Dear Chris,
>>
>>
>>
>>
>> Thanks for your thorough treatment of these issues. We have gone through
>> a similar thought process to arrive at the proposal we came up with. I'll
>> answer as briefly as I can.
>>
>>
>>
>>
>> 1) how would you translate between netcdf geometries and, say geo JSON?
>>
>>
>>
>>
>> The thinking is that node coordinate sharing is optional. If the writer
>> wants to check or already knows that nodes share coordinates, then it's
>> possible. Otherwise, it doesn't have to be used. I've always felt that this
>> was important, but maybe not critical for a core NetCDF-CF data model. Some
>> offline conversation has led to an example that does not use it that may be
>> a good alternative, more on that later.
>>
>>
>>
>>
>> 2) Break Values
>>
>>
>>
>>
>> You really do have to hold your nose on the break values. The issue is
>> that you have to store that information somehow and it is almost worse to
>> create new variables to store the multi-part and hole/not hole information.
>> The alternative approach that's forming up as mentioned above does break
>> the information out into additional variables but simplifies things
>> otherwise. In that case it doesn't feel overly complex to me... so stay tuned
>> for more on this front.
>>
>>
>>
>>
>> 3) Ragged Indexing
>>
>>
>>
>>
>> Your thought process follows ours exactly. The key is that you either
>> have to create the "pointer" array as a first order of business or loop
>> over the counts ad nauseam. I'm actually leaning toward the counts for two
>> reasons. First, the counts approach is already in CF so is a natural fit
>> and will be familiar to developers in this space. Second, the issue of 0 vs
>> 1 indexing is annoying. In our proposal, we settled on 0 indexing because
>> it aligns with the idea of an offset, but it is still annoying and some
>> applications would always have to adjust that pointer array as a first
>> order of business.
>>
>>
>>
>>
>> On to Bob's comments.
>>
>>
>>
>>
>> Regarding aligning with other data models / encodings, I guess this needs
>> to be unpacked a bit.
>>
>>
>>
>>
>> 1) In this setting, simple features is a data model, not an encoding. An
>> encoding can implement part or all of a data model as is needed by the use
>> case(s) at hand. There is no problem with partial implementations; you still
>> get interoperability for the intended use cases.
>>
>> 2) Attempting to align with other encoding standards (UGRID and NetCDF-CF
>> are the primary ones here) is simply to keep the implementation patterns
>> similar and familiar. This may be a fool's errand, but is presumably good
>> for adoptability and consistency.
>>
>> So, I don't see a problem with implementing important simple features
>> types in a way that aligns with the way the existing community standards
>> work.
>>
>>
>>
>>
>> I don't see this as ignoring existing standards at all. There is no open
>> community standard for binary encoding of geometries and related data that
>> passes the CF requirements of human readability and self-description. We
>> are adopting the appropriate data model and suggesting a new encoding that
>> will solve a lot of problems in the environmental modeling space.
>>
>>
>>
>>
>> As we've discussed before, your "different approach" sounds great, but
>> seems like an exercise for a future effort that doesn't attempt to align
>> with CF 1.7. Maybe what you suggest is a path forward for variable length
>> arrays in the CF 2.0 "vision in the mist", but I don't see it as a tenable
>> solution for CF 1.*.
>>
>>
>>
>>
>> Best Regards,
>>
>>
>>
>>
>> - Dave
>>
>>
>>
>>
>>
>>
>> On Feb 3, 2017, at 3:31 PM, Chris Barker <chris.barker at noaa.gov> wrote:
>>
>>
>>
>>
>> a few thoughts. First, I think there are three core "issues" that need to
>> be resolved:
>>
>>
>>
>>
>> 1) Coordinate indexing (indirection)
>>
>> the question of whether you have an array of "vertices" that the geometry
>> types index into to get their data:
>>
>>
>>
>>
>> Advantages:
>>
>> - if a number of geometries share a lot of vertices, it can be more
>> efficient
>>
>> - the relationship between geometries that share vertices (i.e. polygons
>> that share a boundary) etc. is well defined. You don't need to check for
>> closeness, and maybe have a tolerance, etc.
>>
>>
>>
>>
>> These were absolutely critical for UGRID for example -- a UGRID mesh is a
>> "single thing", NOT a collection of polygons that happen to share some
>> vertices.
>>
>>
>>
>>
>> Disadvantages:
>>
>> - if the geometries do not share many vertices, it is less efficient.
>>
>> - there are additional code complications in "getting" the vertices of
>> the given geometry
>>
>> - it does not match the OGC data model.
>>
>>
>>
>>
>> My 0.02 -- given my use cases, I tend to want the advantages -- but I
>> don't know that that's a typical use case. And I think it's a really good
>> idea to keep with the OGC data model where possible -- i.e. be able to
>> translate from netcdf to, say, geoJSON as losslessly as possible. Given
>> that I think it's probably a better idea not to have the indirection.
>>
>>
>>
>>
>> However (to equivocate) perhaps the types of information people are
>> likely to want to store in netcdf are a subset of what the OGC
>> standards are designed for -- and for those use-cases, maybe shared
>> vertices are critical.
>>
>>
>>
>>
>> One way to think about it -- how would you translate between netcdf
>> geometries and, say geo JSON:
>>
>> - nc => geojson would lose the shared index info.
>>
>> - geojson => nc -- would you try to reconstruct the shared vertices??
>> I'm thinking that would be a bit dangerous in the general case, because you
>> are adding information that you don't know is true -- are these a shared
>> vertex or two that just happen to be at the same location?
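
The asymmetry described above can be shown with a small sketch. It assumes a hypothetical shared vertex table and per-polygon index lists (not any layout from the proposal), and builds plain GeoJSON-style dictionaries by hand. Going from indices to explicit coordinates is trivial, but afterwards only coincidence of coordinates remains, not the fact that the nodes were shared.

```python
# Shared vertex table: each vertex is stored once (hypothetical values).
vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0)]

# Two squares that share the edge between vertices 1 and 2.
polygon_nodes = [
    [0, 1, 2, 3, 0],
    [1, 4, 5, 2, 1],
]

def to_geojson_polygon(node_indices, vertex_table):
    """Resolve the indirection and emit a GeoJSON-style Polygon geometry."""
    ring = [list(vertex_table[i]) for i in node_indices]
    return {"type": "Polygon", "coordinates": [ring]}

features = [to_geojson_polygon(p, vertices) for p in polygon_nodes]

# With indirection, sharing is explicit in the indices.
shared = set(polygon_nodes[0]) & set(polygon_nodes[1])

# After conversion, only coordinate equality is left to inspect.
coords0 = {tuple(c) for c in features[0]["coordinates"][0]}
coords1 = {tuple(c) for c in features[1]["coordinates"][0]}
coincident = coords0 & coords1
```

Rebuilding `shared` from `coincident` would require assuming that equal coordinates mean the same node, which is exactly the inference the text warns about.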
>>
>>
>>
>>
>> > > Break values
>>
>>
>>
>>
>> I don't really like break values as an approach, but with netcdf any
>> option will be ugly one way or another. So keeping with the WKT approach
>> makes sense to me. Either way you'll need custom code to unpack it. (BTW --
>> what does WellKnownBinary do?)
>>
>>
>>
>>
>> > > Ragged indexing
>>
>>
>>
>>
>> There are two "natural" ways to represent a ragged array:
>>
>>
>>
>>
>> (a) store the length of each "row"
>>
>> (b) store the index to the beginning (or end) of each "row"
>>
>>
>>
>>
>> CF already uses (a). However, working with it, I'm pretty convinced that
>> it's the "wrong" choice:
>>
>>
>>
>>
>> If you want to know how long a given row is, that is really easy with
>> (a), and almost as easy with (b) (involves two indexes and a subtraction)
>>
>>
>>
>>
>> However, if you want to extract a particular row: (b) makes this really
>> easy -- you simply access the slice of the array you want. with (a) you
>> need to loop through the entire "length_of_rows" array (up to the row of
>> interest) and add up the values to find the slice you need. not a huge
>> issue, but it is an issue. In fact, in my code to read ragged arrays in
>> netcdf, the first thing I do is pre-compute the index-to-each-row, so I can
>> then use that to access individual rows for future access -- if you are
>> accessing via OpenDAP -- that's particularly helpful.
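
The pre-computation described above amounts to a running sum over the stored lengths. A minimal sketch, with hypothetical function names and made-up data, using only the Python standard library:

```python
from itertools import accumulate

def starts_from_counts(counts):
    """Convert per-row lengths (option a) into start offsets (option b).
    A final sentinel equal to the total length is appended so that every
    row i is the slice starts[i]:starts[i + 1]."""
    return [0] + list(accumulate(counts))

def get_row(data, starts, i):
    """Extract row i with a single slice, no loop over earlier rows."""
    return data[starts[i]:starts[i + 1]]

counts = [3, 1, 4]                           # what CF stores today (option a)
data = [10, 11, 12, 20, 30, 31, 32, 33]      # flat, contiguous ragged array
starts = starts_from_counts(counts)          # zero-based offsets (option b)
third_row = get_row(data, starts, 2)

# Row lengths are recovered from offsets with one subtraction each.
lengths = [starts[i + 1] - starts[i] for i in range(len(counts))]
```

The conversion is a single linear pass, so a reader can build the offsets once and then use slices for all later row access, whichever form is stored in the file.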
>>
>>
>>
>>
>> So -- (b) is clearly (to me) the "best" way to do it -- but is it worth
>> introducing a second way to handle ragged arrays in CF? I would think yes,
>> but that would be offset if:
>>
>>
>>
>>
>> - There is a bunch of existing library code that transparently handles
>> ragged arrays in netcdf (does netcdfJava have something? I'm pretty sure
>> Python doesn't -- certainly not in netCDF4)
>>
>>
>>
>>
>> - That existing lib code would be advantageous to leverage for code
>> reading features: I suspect that there will have to be enough custom code
>> that the ragged array bits are going to be the least of it.
>>
>>
>>
>>
>> So I'm for the "new" way of representing ragged arrays
>>
>>
>>
>>
>> -CHB
>>
>>
>>
>>
>>
>>
>> On Fri, Feb 3, 2017 at 11:41 AM, Bob Simons - NOAA Federal <
>> bob.simons at noaa.gov> wrote:
>>
>> Then, isn't this proposal just the first step in the creation of a new
>> model and a new encoding of Simple Features, one that is "align[ed] ...
>> with as many other encoding standards in this space as is practical"? In
>> other words, yet another standard for Simple Features?
>>
>>
>>
>>
>> If so, it seems risky to me to take just the first (easy?) step "to
>> support the use cases that have a compelling need today" and not solve the
>> entire problem. I know the CF way is to just solve real, current needs, but
>> in this case it seems to risk a head slap moment in the future when we
>> realize that, in order to deal with some new simple feature variant, we
>> should have done things differently from the beginning?
>>
>>
>>
>>
>> And it seems odd to reject existing standards that have been so
>> painstakingly hammered out, in favor of starting the process all over
>> again. We follow existing standards for other things (e.g., IEEE-754 for
>> representing floating point numbers in binary files), why can't we follow
>> an existing Simple Features standard?
>>
>>
>>
>>
>> ---
>>
>> Rather than just be a naysayer, let me suggest a very different
>> alternative:
>>
>>
>>
>>
>> There are several projects in the CF realm (e.g., this Simple Features
>> project, Discrete Sampling Geometry (DSG), true variable-length Strings,
>> ugrid(?)) which share a common underlying problem: how to deal with
>> variable-length multidimensional arrays: a[b][c], where the length of the c
>> dimension may be different for different b indices.
>>
>> DSG solved this (5 different ways!), but only for DSG.
>>
>> The Simple Features proposal seeks to solve the problem for Simple
>> Features.
>>
>> We still have no support for Unicode variable-length Strings.
>>
>>
>>
>>
>> Instead of continuing to solve the variable-length problem a different
>> way every time we confront it, shouldn't we solve it once, with one small
>> addition to the standard, and then use that solution repeatedly?
>>
>> The solution could be a simple variant of one of the DSG solutions, but
>> generalized so that it could be used in different situations.
>>
>> An encoding standard and built-in support for variable-length data arrays
>> in netcdf-java/c would solve a lot of problems, now and in the future.
>>
>> Some work on this is already done: I think the netcdf-java API already
>> supports variable-length arrays when reading netcdf-4 files.
>>
>> For Simple Features, the problem would reduce to: store the feature
>> (using some specified existing standard like WKT or WKB) in a
>> variable-length array.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Fri, Feb 3, 2017 at 9:07 AM, <cf-metadata-request at cgd.ucar.edu> wrote:
>>
>> Date: Fri, 3 Feb 2017 11:07:00 -0600
>> From: David Blodgett <dblodgett at usgs.gov>
>> To: Bob Simons - NOAA Federal <bob.simons at noaa.gov>
>> Cc: CF Metadata <cf-metadata at cgd.ucar.edu>
>> Subject: Re: [CF-metadata] Extension of Discrete Sampling Geometries
>> for Simple Features
>> Message-ID: <8EE85E65-2815-4720-90FC-13C72D3C7952 at usgs.gov>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Dear Bob,
>>
>> I'll just take these in line.
>>
>> 1) noted. We have been trying to figure out what to do with the point
>> featureType and I think leaving it more or less alone is a viable path
>> forward.
>>
>> 2) This is not an exact replica of WKT, but rather a similar approach to
>> WKT. As I stated, we have followed the ISO simple features data model and
>> well known text feature types in concept, but have not used the same
>> standardization formalisms. We aren't advocating for supporting "all of"
>> any standard but are rather attempting to support the use cases that have a
>> compelling need today while aligning this with as many other encoding
>> standards in this space as is practical. Hopefully that answers your
>> question, sorry if it's vague.
>>
>> 3) The google doc linked in my response contains the encoding we are
>> proposing as a starting point for conversation: http://goo.gl/Kq9ASq
>> I want to stress: this is a starting point for discussion. I expect that
>> this proposal will change drastically before we're done.
>>
>> 4) Absolutely envision tools doing what you say, convert to/from standard
>> spatial formats and NetCDF-CF geometries. We intend to introduce an R and a
>> Python implementation that does exactly as you say along with whatever form
>> this standard takes in the end. R and Python were chosen as the team that
>> brought this together are familiar with those two languages; additional
>> implementations would be more than welcome.
>>
>> 5) We do include a "geometry" featureType similar to the "point"
>> featureType. Thus our difficulty with what to do with the "point"
>> featureType. You are correct, there are lots of non timeSeries applications
>> to be solved and this proposal does intend to support them (within the
>> existing DSG constructs).
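
Item 4 above mentions converting between NetCDF-CF geometries and standard spatial formats. As a rough illustration of the writing direction only, the sketch below serializes an already-decoded multipolygon to Well-Known Text. The function names and input structure (a list of polygons, each a list of rings of (x, y) pairs, outer ring first) are assumptions for this example and are not taken from the planned R or Python implementations.

```python
def ring_to_wkt(ring):
    """Format one ring as '(x y, x y, ...)'."""
    return "(" + ", ".join(f"{x:g} {y:g}" for x, y in ring) + ")"

def multipolygon_to_wkt(polygons):
    """Format a list of polygons (each a list of rings) as a WKT MULTIPOLYGON."""
    body = ", ".join(
        "(" + ", ".join(ring_to_wkt(ring) for ring in rings) + ")"
        for rings in polygons
    )
    return f"MULTIPOLYGON ({body})"

# Hypothetical input: a square with one hole, plus a separate triangle.
polygons = [
    [
        [(0, 0), (20, 0), (20, 20), (0, 20), (0, 0)],
        [(1, 1), (10, 5), (19, 1), (1, 1)],
    ],
    [
        [(30, 20), (45, 40), (10, 40), (30, 20)],
    ],
]
wkt = multipolygon_to_wkt(polygons)
```

A production converter would more likely hand the rings to an established geometry library than format strings directly; the point here is only that the nested ring structure maps one-to-one onto WKT brackets.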
>>
>> Thanks for your questions, hopefully my answers close some gaps for you.
>>
>> - Dave
>>
>> > On Feb 3, 2017, at 10:47 AM, Bob Simons - NOAA Federal <
>> bob.simons at noaa.gov> wrote:
>> >
>> > 1) There is a vague comment in the proposal about possibly changing the
>> point featureType. Please don't, unless the changes don't affect current
>> uses of Point. There are already 1000's of files that use it. If this new
>> system offers an alternative, then fine, it's an alternative. One of the
>> most important and useful features of a good standard is backwards
>> compatibility.
>> >
>> > 2) You advocate "Implement the WKT approach using a NetCDF binary
>> array." Is this system then an exact encoding of WKT, neither a subset nor
>> a superset? "Simple Features" are often not simple.
>> > If it is WKT (or something else), what is the standard you are
>> following to describe the Simple Features (e.g., ISO/IEC 13249-3:2016 and
>> ISO 19162:2015)?
>> > Does your proposal deviate in any way from the standard's capabilities?
>> > Do you advocate following the entire WKT standard, e.g., supporting all
>> the feature types that WKT supports?
>> >
>> > 3) Since you are not using the WKT encoding, but creating your own,
>> where is the definition of the encoding system you are using?
>> >
>> > 4) This is a little out of CF scope, but:
>> > Do you envision tools, notably, netcdf-c/java, having a writer function
>> that takes in WKT and encodes the information in a file, and having a
>> reader function that reads the file and returns WKT? Or is it your plan
>> that the encoding/ decoding is left to the user?
>> >
>> > 5) This proposal is for "Simple Features plus Time Series" (my phrase
>> not yours). But aren't there lots of other uses of Simple Features? Will
>> there be other proposals in the future for "Simple Features plus X" and
>> "Simple Features plus Y"? If so, will CF eventually become a massive
>> document where Simple Features are defined over and over again, but in
>> different contexts? If so, wouldn't a better solution be to deal with
>> Simple Features separately (as Postgres does by making a geometric data
>> type?), and then add "Simple Features plus Time Series" as the first use of
>> it?
>> >
>> > Thanks for answering these questions.
>> > Please forgive me if I missed parts of your proposal that answer these
>> questions.
>> >
>> >
>> > On Thu, Feb 2, 2017 at 5:57 AM, <cf-metadata-request at cgd.ucar.edu> wrote:
>> > Date: Thu, 2 Feb 2017 07:57:36 -0600
>> > From: David Blodgett <dblodgett at usgs.gov>
>> > To: <cf-metadata at cgd.ucar.edu>
>> > Subject: [CF-metadata] Extension of Discrete Sampling Geometries for
>> > Simple Features
>> > Message-ID: <224C2828-7212-449F-8C2C-97D903F6BE1E at usgs.gov>
>> > Content-Type: text/plain; charset="utf-8"
>> >
>> > Dear CF Community,
>> >
>> > We are pleased to submit this proposal for your consideration and
>> review. The cover letter we've prepared below provides some background and
>> explanation for the proposed approach. The google doc here
>> <http://goo.gl/Kq9ASq> is an excerpt of the CF
>> specification with track changes turned on. Permissions for the document
>> allow any google user to comment, so feel free to comment and ask questions
>> in line.
>> >
>> > Note that I'm sharing this with you with one issue unresolved. What to
>> do with the point featureType? Our draft suggests that it is part of a new
>> geometry featureType, but it could be that we leave it alone and introduce
>> a geometry featureType. This may be a minor point of discussion, but we
>> need to be clear that this is an issue that still needs to be resolved in
>> the proposal.
>> >
>> > Thank you for your time and consideration.
>> >
>> > Best Regards,
>> >
>> > David Blodgett, Tim Whiteaker, and Ben Koziol
>> >
>> > Proposed Extension to NetCDF-CF for Simple Geometries
>> >
>> > Preface
>> >
>> > The proposed addition to NetCDF-CF introduced below is inspired by a
>> pre-existing data model governed by OGC and ISO as ISO 19125-1. More
>> information on Simple Features may be found here:
>> <https://en.wikipedia.org/wiki/Simple_Features>. To the knowledge of the
>> authors, it is consistent with ISO 19125-1 but has not been specified using
>> the formalisms of OGC or ISO. Language used attempts to hold true to
>> NetCDF-CF semantics while not conflicting with the existing standards
>> baseline. While this proposal does not support the entire scope of the
>> simple features ecosystem, it does support the core data types in most
>> common use around the community.
>> >
>> > The other existing standard to mention is the UGRID convention
>> <http://ugrid-conventions.github.io/ugrid-conventions/>. The authors
>> have experience reading and writing UGRID and have designed the proposed
>> structure in a way that is inspired by and consistent with it.
>> >
>> > Terms and Definitions
>> >
>> > (Taken from OGC 06-103r4 OpenGIS Implementation Specification for
>> Geographic information - Simple feature access - Part 1: Common
>> architecture <http://www.opengeospatial.org/standards/sfa>.)
>> >
>> > Feature: Abstraction of real world phenomena - typically a geospatial
>> abstraction with associated descriptive attributes.
>> > Simple Feature: A feature with all geometric attributes described
>> piecewise by straight line or planar interpolation between point sets.
>> > Geometry (geometric complex): A set of disjoint geometric primitives -
>> one or more points, lines, or polygons that form the spatial representation
>> of a feature.
>> > Introduction
>> >
>> > Discrete Sampling Geometries (DSGs) handle data from one (or a
>> collection of) timeSeries (point), Trajectory, Profile, TrajectoryProfile
>> or timeSeriesProfile geometries. Measurements are from a point (timeSeries
>> and Profile) or points along a trajectory. In this proposal, we reuse the
>> core DSG timeSeries type which provides support for basic time series use
>> cases e.g., a timeSeries which is measured (or modeled) at a given point.
>> >
>> > Changes to Existing CF Specification
>> >
>> > In NetCDF-CF 1.7, Discrete Sampling Geometries separate dimensions and
>> variables into two types: instance and element
>> <http://cfconventions.org/cf-conventions/cf-conventions.html#_collections_instances_and_elements>.
>> Instance refers to individual points, trajectories, profiles, etc. These
>> would sometimes be referred to as features given that they are identified
>> entities that can have associated attributes and be related to other
>> entities. Element dimensions describe temporal or other dimensions to
>> describe data on a per-instance basis. This proposal extends the DSG
>> timeSeries featureType
>> <http://cfconventions.org/cf-conventions/cf-conventions.html#_features_and_feature_types>
>> such that the geospatial coordinates of
>> the instances can be point, multi-point, line, multi-line, polygon, or
>> multi-polygon geometries. Rather than overload the DSG contiguous ragged array
>> encoding, designed with timeseries in mind, a geometry ragged array
>> encoding is introduced in a new section 9.3.5. See this google doc for
>> specific proposed changes: <http://goo.gl/Kq9ASq>
>> > Motivation
>> >
>> > DSGs have no system to define a geometry (polyline, polygon, etc.,
>> other than point) and an association with a time series that applies over
>> that entire geometry e.g., The expected rainfall in this watershed polygon
>> for some period of time is 10 mm. As suggested in the last paragraph of
>> section 9.1, current practice is to assign a representative point or just
>> use an ID and forgo spatial information within a NetCDF-CF file. In order
>> to satisfy a number of environmental modeling use cases, we need a way to
>> encode a geometry (point, line, polygon, multi-point, multi-line, or
>> multi-polygon) that is the static spatial feature representation to which
>> one or more timeSeries can be associated. In this proposal, we provide an
>> encoding to define collections of simple feature geometries. It interfaces
>> cleanly with the existing DSG specification, enabling DSGs and Simple
>> Geometries to be used concurrently.
>> >
>> > Looking Forward
>> >
>> > This proposal is a compromise solution that attempts to stay consistent
>> with CF ideals and fit within the structure of the existing specification
>> with minimal disruption. Line and polygon data types often require variable
>> length arrays. Development of this proposal has brought to light the need
>> for a general abstraction for variable length arrays in NetCDF-CF. Such a
>> general abstraction would necessarily be reusable for character arrays,
>> ragged arrays of time series, and ragged arrays of geometry nodes, as well
>> as any other ragged data structures that may come up in the future. This
>> proposal does not introduce such a general ragged array abstraction but
>> does not preclude such a development in the future.
>> >
>> > Three Alternative Approaches
>> >
>> > Respecting the human readability ideal of NetCDF-CF, the development of
>> this proposal started from a human readable format for geometries known as
>> Well Known Text <https://en.wikipedia.org/wiki/Well-known_text>. We considered three
>> high level design approaches while developing this proposal.
>> >
>> > 1) Direct use of Well-Known Text (WKT). In this approach, well known text
>> strings would be encoded using character arrays following a contiguous
>> ragged array approach to index the character array by geometry (or instance
>> in DSG parlance).
>> > 2) Implement the WKT approach using a NetCDF binary array. In this
>> approach, well known text separators (brackets, commas and spaces) for
>> multipoint, multiline, multipolygon, and polygon holes, would be encoded as
>> break type separator values like -1 for multiparts and -2 for holes.
>> > 3) Implement the fundamental dimensions of geometry data in NetCDF. In
>> this approach, additional dimensions and variables along those dimensions
>> would be introduced to represent geometries, geometry parts, geometry
>> nodes, and unique (potentially shared) coordinate locations for nodes to
>> reference.
>> >
>> > Selected Approach
>> >
>> > The first approach was seen as too opaque to stay true to the CF ideal
>> of complete self-description. The third approach seemed needlessly verbose
>> and difficult to implement. The second approach was selected for the
>> following reasons:
>> >
>> > - The second approach is at least as human-readable as the third.
>> > - Use of break values keeps geometries relatively atomic.
>> > - The encoding will be familiar to developers who know the WKT geometry
>> format.
>> > - Character arrays, which are needed for options one and three, are
>> cumbersome to use in some programming languages in common use with NetCDF.
>> > - Break values replace the need for extraneous variables related to
>> multi-part geometries and polygon holes (interiors). Multi-part geometries are
>> generally an exception and excessive instrumentation to support them should
>> be discounted.
>> >
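
The break-value idea described above can be illustrated with a minimal Python sketch. It assumes the -1 and -2 values used in the proposal's example; the function and type names (encode_geometry, encode_collection, Ring, Polygon, MultiPolygon) are hypothetical and exist only for illustration, not as part of the proposal.

```python
from typing import List, Sequence, Tuple

MULTIPART_BREAK = -1  # same value as multipart_break_value in the proposal's example
HOLE_BREAK = -2       # same value as hole_break_value in the proposal's example

Ring = Sequence[int]              # node indices of one closed ring
Polygon = Sequence[Ring]          # [outer ring, hole, hole, ...]
MultiPolygon = Sequence[Polygon]  # one or more polygon parts


def encode_geometry(parts: MultiPolygon) -> List[int]:
    """Flatten one multi-polygon into node indices separated by break values."""
    flat: List[int] = []
    for part_number, polygon in enumerate(parts):
        if part_number > 0:
            flat.append(MULTIPART_BREAK)  # separates polygon parts
        for ring_number, ring in enumerate(polygon):
            if ring_number > 0:
                flat.append(HOLE_BREAK)   # separates a hole from the ring before it
            flat.extend(ring)
    return flat


def encode_collection(
    geometries: Sequence[MultiPolygon],
) -> Tuple[List[int], List[int]]:
    """Concatenate geometries and record the 0-based start of each one."""
    index: List[int] = []
    starts: List[int] = []
    for parts in geometries:
        starts.append(len(index))
        index.extend(encode_geometry(parts))
    return index, starts


# A square (nodes 0-4) with one triangular hole (nodes 5-8), followed by a
# second geometry made of two triangles (nodes 9-12 and 13-16).
geometries = [
    [[[0, 1, 2, 3, 4], [5, 6, 7, 8]]],
    [[[9, 10, 11, 12]], [[13, 14, 15, 16]]],
]
index, starts = encode_collection(geometries)
print(index)
print(starts)
```

Note that no break value marks the boundary between two geometries; in this sketch that boundary is carried only by the start offsets.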
>> > Example: Representation of WKT-Style Polygons in a NetCDF-3
>> timeSeries featureType
>> >
>> > Below is sample CDL demonstrating how polygons are encoded in NetCDF-3
>> using a contiguous ragged array-like encoding. There are three details to
>> note in the example below.
>> >
>> > 1) The attribute contiguous_ragged_dimension with value of a dimension in
>> the file.
>> > 2) The geom_coordinates attribute with a value containing a space
>> separated string of variable names.
>> > 3) The cf_role geometry_x_node and geometry_y_node.
>> >
>> > These three attributes form a system to fully describe collections of
>> multi-polygon feature geometries. Any variable that has the
>> contiguous_ragged_dimension attribute contains one integer per instance
>> giving the 0-indexed starting position of that instance's geometry along
>> the referenced dimension.
>> Any variable that uses the dimension referenced in the
>> contiguous_ragged_dimension attribute can be interpreted using the values
>> in the variable containing the contiguous_ragged_dimension attribute. The
>> variables referenced in the geom_coordinates attribute describe spatial
>> coordinates of geometries. These variables can also be identified by the
>> cf_roles geometry_x_node and geometry_y_node. Note that the example below
>> also includes a mechanism to handle multi-polygon features that also
>> contain holes.
>> >
>> > netcdf multipolygon_example {
>> > dimensions:
>> > node = 47 ;
>> > indices = 55 ;
>> > instance = 3 ;
>> > time = 5 ;
>> > strlen = 5 ;
>> > variables:
>> > char instance_name(instance, strlen) ;
>> > instance_name:cf_role = "timeseries_id" ;
>> > int coordinate_index(indices) ;
>> > coordinate_index:geom_type = "multipolygon" ;
>> > coordinate_index:geom_coordinates = "x y" ;
>> > coordinate_index:multipart_break_value = -1 ;
>> > coordinate_index:hole_break_value = -2 ;
>> > coordinate_index:outer_ring_order = "anticlockwise" ;
>> > coordinate_index:closure_convention = "last_node_equals_first" ;
>> > int coordinate_index_start(instance) ;
>> > coordinate_index_start:long_name = "index of first coordinate in each instance geometry" ;
>> > coordinate_index_start:contiguous_ragged_dimension = "indices" ;
>> > double x(node) ;
>> > x:units = "degrees_east" ;
>> > x:standard_name = "longitude" ; // or projection_x_coordinate
>> > x:cf_role = "geometry_x_node" ;
>> > double y(node) ;
>> > y:units = "degrees_north" ;
>> > y:standard_name = "latitude" ; // or projection_y_coordinate
>> > y:cf_role = "geometry_y_node" ;
>> > double someVariable(instance) ;
>> > someVariable:long_name = "a variable describing a single-valued attribute of a polygon" ;
>> > int time(time) ;
>> > time:units = "days since 2000-01-01" ;
>> > double someData(instance, time) ;
>> > someData:coordinates = "time x y" ;
>> > someData:featureType = "timeSeries" ;
>> > // global attributes:
>> > :Conventions = "CF-1.8" ;
>> >
>> > data:
>> >
>> > instance_name =
>> > "flash",
>> > "bang",
>> > "pow" ;
>> >
>> > coordinate_index = 0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12,
>> -2, 13, 14, 15, 16,
>> > -1, 17, 18, 19, 20, -1, 21, 22, 23, 24, 25, 26, 27, 28, -1, 29, 30,
>> 31, 32, 33,
>> > 34, -2, 35, 36, 37, 38, 39, 40, 41, 42, -1, 43, 44, 45, 46 ;
>> >
>> > coordinate_index_start = 0, 30, 46 ;
>> >
>> > x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
>> > 5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45,
>> -20, -30, -20, -20, -30, 30,
>> > 45, 10, 30, 25, 50, 30, 25 ;
>> >
>> > y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25,
>> 25, 29,
>> > 25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20,
>> -35, -20, -15, -25, -20, 20,
>> > 40, 40, 20, 5, 10, 15, 5 ;
>> >
>> > someVariable = 1, 2, 3 ;
>> >
>> > time = 1, 2, 3, 4, 5 ;
>> >
>> > someData =
>> > 1, 2, 3, 4, 5,
>> > 1, 2, 3, 4, 5,
>> > 1, 2, 3, 4, 5 ;
>> > }
>> > How To Interpret
>> >
>> > Starting from the timeSeries variables:
>> >
>> > 1) See CF-1.8 conventions.
>> > 2) See the timeSeries featureType.
>> > 3) Find the timeseries_id cf_role.
>> > 4) Find the coordinates attribute of data variables.
>> > 5) See that the variables indicated by the coordinates attribute have a
>> cf_role geometry_x_node and geometry_y_node to determine that these are
>> geometries according to this new specification.
>> > 6) Find the coordinate index variable with geom_coordinates that points to
>> the nodes.
>> > 7) Find the variable with contiguous_ragged_dimension pointing to the
>> dimension of the coordinate index variable to determine how to index into
>> the coordinate index.
>> > 8) Iterate over polygons, parsing out geometries using the contiguous
>> ragged start variable and coordinate index variable to interpret the
>> coordinate data variables.
>> >
>> > Or, without reference to timeSeries:
>> >
>> > 1) See CF-1.8 conventions.
>> > 2) See the geom_type of multipolygon.
>> > 3) Find the variable with a contiguous_ragged_dimension matching the
>> coordinate index variable's dimension.
>> > 4) See the geom_coordinates of x y.
>> > 5) Using the contiguous ragged start variable found in 3 and the
>> coordinate index variable found in 2, geometries can be sliced out of the
>> coordinate index variable and split into parts and holes using the break
>> values in it.
>> >
>> > ------------------------------
>> >
>> > Subject: Digest Footer
>> >
>> > _______________________________________________
>> > CF-metadata mailing list
>> > CF-metadata at cgd.ucar.edu <mailto:CF-metadata at cgd.ucar.edu>
>> > http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata <
>> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata>
>> >
>> >
>> > ------------------------------
>> >
>> > End of CF-metadata Digest, Vol 166, Issue 3
>> > *******************************************
>> >
>> >
>> >
>> > --
>> > Sincerely,
>> >
>> > Bob Simons
>> > IT Specialist
>> > Environmental Research Division
>> > NOAA Southwest Fisheries Science Center
>> > 99 Pacific St., Suite 255A (New!)
>> > Monterey, CA 93940 (New!)
>> > Phone: (831)333-9878 (New!)
>> > Fax: (831)648-8440
>> > Email: bob.simons at noaa.gov <mailto:bob.simons at noaa.gov>
>> >
>> > The contents of this message are mine personally and
>> > do not necessarily reflect any position of the
>> > Government or the National Oceanic and Atmospheric Administration.
>> > <>< <>< <>< <>< <>< <>< <>< <>< <><
>> --
>>
>>
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R (206) 526-6959 voice
>> 7600 Sand Point Way NE (206) 526-6329 fax
>> Seattle, WA 98115 (206) 526-6317 main reception
>>
>> Chris.Barker at noaa.gov


-- 
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker at noaa.gov
Received on Wed Feb 22 2017 - 10:26:47 GMT

This archive was generated by hypermail 2.3.0 : Tue Sep 13 2022 - 23:02:42 BST
