Opened 10 years ago

Closed 5 years ago

#69 closed enhancement (fixed)

Specification of Coordinate Reference System properties in Well-Known Text format

Reported by: pbentley
Owned by: cf-conventions@…
Priority: medium
Milestone:
Component: cf-conventions
Version:
Keywords:
Cc: pbentley

Description

1. Title

Specification of Coordinate Reference System properties in Well-Known Text format

2. Moderator

Mark Hedley

3. Requirement

3.1. Data producers and software application developers desire a mechanism for specifying coordinate reference system (CRS) properties that are not covered by the existing set of CF grid mapping attributes.

3.2. Because the conceptual model for coordinate reference systems is both large and complex it is considered impractical to devise CF attributes for all of the potential CRS properties which might need to be encoded as metadata attributes in netCDF files. Consequently there is a requirement for such CRS properties to be specified in a compact notational format, preferably a format that is already in widespread use, either as a de facto or de jure standard.

3.3. The proposed method for specifying additional CRS properties should be optional and should act as an adjunct, or supplement, to the existing grid mapping attributes defined in the latest CF conventions.

3.4. The proposed method, if applied to existing CF-compliant netCDF files, should not render such files non-compliant.

3.5. If a given CRS property is defined both by the proposed method and also by one of the existing CF grid mapping attributes, and the property values differ, then the latter value shall take precedence.

4. Initial Statement of Technical Proposal

4.1 Background (Informative)

The current proposal entails the addition of an optional new grid mapping attribute to the CF conventions. The proposed name of the attribute is crs_wkt, signifying coordinate reference system well-known text (commonly abbreviated to CRS WKT). The CRS WKT format is widely recognised and used within the geoscience software community. As such it represents the obvious candidate for encoding information about a variety of coordinate reference system parameters.

The value of the crs_wkt attribute must be a text string, i.e. netCDF data type NC_CHAR or, as of netCDF-4, NC_STRING. The text content of the new attribute must adhere to the CRS WKT syntax, as specified in [1] (and prior to that [2]). The crs_wkt attribute should only be attached to grid mapping variables.

The purpose of the new crs_wkt attribute is to enable multiple CRS properties to be specified in compact form in a single netCDF attribute. Importantly, the crs_wkt attribute is intended to act as a supplement to existing CF grid mapping attributes - it is not intended to replace them. This is in order to maintain backwards-compatibility with existing netCDF software clients, which naturally will have no knowledge of the proposed new attribute and will expect to find the familiar grid mapping attributes.

The idea of using WKT syntax to encode CRS properties within netCDF files has previously been discussed under CF proposals #9 and #18. Although use of the WKT method wasn't endorsed at the time of those proposals, recently there have been increased expressions of interest on the CF mailing list to re-evaluate this method as an efficient - and non-disruptive - mechanism for specifying CRSs in more detail.

The origin of the WKT specification within the oil and gas exploration domain means that it is primarily, though by no means exclusively, focussed on the specification of earth-centric horizontal coordinate systems. Consequently the WKT format may not provide the expressive power required, for example, to define some of the more specialised vertical coordinate systems utilised within the metocean domain. Nonetheless, for its principal domain of application - horizontal coordinate systems - it is believed to represent the best available encoding mechanism.

It is acknowledged that there will be occasions when CRS property values will be duplicated in both the new compound crs_wkt attribute and in single-property CF grid mapping attributes. In such cases the onus is on data producers, or software tools, to ensure that the property values are internally consistent. However, in situations where two values of a given property are different, then the value specified by the single-property CF attribute will take precedence.

For example, if the semi-major axis length of the ellipsoid is defined by the CF grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[...] element), then the former, being the more specific attribute, takes precedence. Naturally if the two values are equal then no ambiguity arises.
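The precedence rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the grid mapping attributes are assumed to be available as a plain dictionary, and the regular expression handles only a simple SPHEROID element like the one used in this proposal.

```python
import re

def spheroid_semi_major_axis(crs_wkt):
    """Return the semi-major axis from the first SPHEROID[...] element, or None."""
    match = re.search(r'SPHEROID\s*\[\s*"[^"]*"\s*,\s*([-+0-9.eE]+)', crs_wkt)
    return float(match.group(1)) if match else None

def resolve_semi_major_axis(cf_attrs, crs_wkt):
    """Apply the precedence rule: the single-property CF attribute wins."""
    if "semi_major_axis" in cf_attrs:
        return float(cf_attrs["semi_major_axis"])
    return spheroid_semi_major_axis(crs_wkt)

wkt = ('GEOGCS ["OSGB 1936", DATUM ["OSGB 1936", '
       'SPHEROID ["Airy 1830", 6377563.396, 299.3249646]]]')

# Both sources present and different: the CF attribute value is used.
print(resolve_semi_major_axis({"semi_major_axis": 6377563.0}, wkt))
# Only the WKT defines the property: the SPHEROID value is used.
print(resolve_semi_major_axis({}, wkt))
```

A real client would more likely hand the WKT string to an existing CRS library than parse it with a regular expression; the point here is only the order in which the two sources are consulted.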

While this potential for information duplication is undesirable, the alternative - defining a multitude of CF grid mapping attributes for all potential CRS properties - is considered to represent an even less attractive solution.

Example of CRS WKT Usage

The simplistic CDL example below illustrates how the crs_wkt attribute could be used to define the CRS for a dataset based upon the British National Grid (which is a flavour of transverse mercator projection). In this example line breaks have been inserted into the WKT value to aid legibility: in real-world netCDF files the value of the crs_wkt attribute may well be a single unbroken line of text.

Additional, possibly more realistic WKT examples are included in the document resources cited in the References section.

dimensions:
  x = 800 ;
  y = 600 ;
  time = 30 ;

variables:
  double x(x) ;
    x:standard_name = "projection_x_coordinate" ;
    x:long_name = "British National Grid eastings" ;
    x:units = "m" ;
  double y(y) ;
    y:standard_name = "projection_y_coordinate" ;
    y:long_name = "British National Grid northings" ;
    y:units = "m" ;
  double time(time) ;
    ...
  double lat(y, x) ;
    ...
  double lon(y, x) ;
    ...

  // a data variable whose CRS definition is provided by the 'bng_crs' grid mapping variable
  float precip(time, y, x) ;
    precip:standard_name = "rainfall_amount" ;
    precip:coordinates = "lat lon" ;
    precip:grid_mapping = "bng_crs" ;
    ...

  // grid mapping variable containing a WKT definition of the British National Grid
  int bng_crs ;
    bng_crs:grid_mapping_name = "transverse_mercator" ;
    bng_crs:crs_wkt = "PROJCS ["OSGB 1936 / British National Grid",
      GEOGCS ["OSGB 1936",
        DATUM ["OSGB 1936", SPHEROID ["Airy 1830", 6377563.396, 299.3249646]],
        PRIMEM ["Greenwich", 0],
        UNIT ["degree", 0.0174532925199433]],
      PROJECTION ["Transverse Mercator"],
      PARAMETER ["False easting", 400000],
      PARAMETER ["False northing", -100000],
      PARAMETER ["Longitude of natural origin", -2.0],
      PARAMETER ["Latitude of natural origin", 49.0],
      PARAMETER ["Scale factor at natural origin", 0.9996012717],
      UNIT ["metre", 1.0]]" ;
    ...

Note: names of projection PARAMETERs in the example above follow the spelling of Coordinate Operation Parameters defined in the EPSG geodetic parameter registry. Different client applications might well use different parameter names: the current proposal does not seek to prescribe the content of the crs_wkt attribute for any particular application context.
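Since the line breaks in the example are there for legibility only, a producer might collapse the value to a single line before writing it. The following Python sketch shows one way to do so; the function name is hypothetical, and it assumes that no line break occurs inside a quoted name.

```python
import re

def flatten_wkt(multiline_wkt):
    """Join a WKT string that was wrapped for legibility into one line."""
    # Replace each line break, plus the indentation around it, with one space.
    return re.sub(r"\s*\n\s*", " ", multiline_wkt.strip())

pretty = '''PROJCS ["OSGB 1936 / British National Grid",
      GEOGCS ["OSGB 1936",
        PRIMEM ["Greenwich", 0]],
      UNIT ["metre", 1.0]]'''

print(flatten_wkt(pretty))
```

Spaces inside quoted names such as "OSGB 1936" are untouched because only whitespace adjacent to a line break is replaced.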

4.2 Proposed Changes to the CF Conventions Document (Normative)

This proposal would require the following changes to the CF conventions document.

Addition of a new subsection 5.6.1 after Example 5.10

5.6.1. Use of the CRS Well-known Text Format

An optional grid mapping attribute called crs_wkt may be used to specify multiple coordinate system properties in so-called well-known text format (usually abbreviated to CRS WKT or OGC WKT). The CRS WKT format is widely recognised and used within the geoscience software community. As such it represents a versatile mechanism for encoding information about a variety of coordinate reference system parameters in a highly compact notational form.

The crs_wkt attribute should comprise a text string that conforms to the WKT syntax as specified in reference [OGC_CTS]. If desired the text string may contain embedded newline characters to aid human readability. However, any such characters are purely cosmetic and do not alter the meaning of the attribute value. It is envisaged that the value of the crs_wkt attribute typically will be a single line of text, one intended primarily for machine processing. Other than the requirement to be a valid WKT string, the CF convention does not prescribe the content of the crs_wkt attribute since it will necessarily be context-dependent.

The crs_wkt attribute is intended to act as a supplement to other single-property CF grid mapping attributes (as described in Appendix F); it is not intended to replace those attributes. Data producers (and software tools) could, in theory, omit the single-property grid mapping attributes solely in favour of the compound crs_wkt attribute. However, such practice could result in client software conforming to earlier versions of the CF conventions being unable to read a netCDF file configured in this way. Therefore, until client software evolves to support both methods, it is strongly recommended that the crs_wkt attribute be used as an adjunct to the existing set of grid mapping attributes.

There will be occasions when a given CRS property value is duplicated in both a single-property grid mapping attribute and the crs_wkt attribute. In such cases the onus is on data producers to ensure that the property values are consistent. However, in those situations where two values of a given property are different, then the value specified by the single-property attribute shall take precedence. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[...] element) then the former, being the more specific attribute, takes precedence. Naturally if the two values are equal then no ambiguity arises.

Likewise, in those cases where the value of a CRS WKT element should be used consistently across the CF-netCDF community (names of projections and projection parameters, for example) then, in the absence of an overriding CF-maintained list, the OGP/EPSG registry of geodetic parameters [OGP/EPSG] is considered to represent the definitive authority as regards CRS property names and values (it is noted that some examples in the published literature do not always adhere to the OGP/EPSG values).

Example 5.11 illustrates how the coordinate system properties specified via the crs grid mapping variable in Example 5.10 might be expressed using a crs_wkt attribute (it also represents a slightly modified version of the WKT example shown in section 7.4 of [OGC_CTS]). For brevity only the grid mapping variable is included in this example; all other elements are as per the earlier example. Names of projection PARAMETERs follow the spellings used in the EPSG geodetic parameter registry. Example 5.11 illustrates how certain WKT elements - all of which are optional - can be used to specify CRS properties not covered by existing CF grid mapping attributes, including:

  • use of the TOWGS84 element to specify horizontal datum transformation parameters (to WGS 1984 datum)
  • use of the VERT_DATUM element to specify vertical datum information
  • use of additional PARAMETER elements (albeit not essential ones in this example) to define the location of the false origin of the projection
  • use of AUTHORITY elements to specify object identifier codes assigned by an external authority, OGP/EPSG in this instance

Example 5.11. British National Grid + Newlyn Datum in CRS WKT format

  ...
  int crs ;
    crs:grid_mapping_name = "transverse_mercator" ;
    crs:crs_wkt = "COMPD_CS ["OSGB 1936 / British National Grid + ODN",
      PROJCS ["OSGB 1936 / British National Grid",
        GEOGCS ["OSGB 1936",
          DATUM ["OSGB 1936",
            SPHEROID ["Airy 1830", 6377563.396, 299.3249646],
            TOWGS84[375, -111, 431, 0, 0, 0, 0]
          ],
          PRIMEM ["Greenwich", 0],
          UNIT ["degree", 0.0174532925199433]
        ],
        PROJECTION ["Transverse Mercator"],
        PARAMETER ["False easting", 400000],
        PARAMETER ["False northing", -100000],
        PARAMETER ["Longitude of natural origin", -2.0],
        PARAMETER ["Latitude of natural origin", 49.0],
        PARAMETER ["Longitude of false origin", -7.556],
        PARAMETER ["Latitude of false origin", 49.766],
        PARAMETER ["Scale factor at natural origin", 0.9996012717],
        UNIT ["metre", 1.0],
        AUTHORITY ["EPSG", "27700"]
      ],
      VERT_CS ["Newlyn",
        VERT_DATUM ["Ordnance Datum Newlyn", 2005],
        UNIT ["metre", 1.0],
        AXIS ["Gravity-related height", UP],
        AUTHORITY ["EPSG", "5701"]
      ]]" ;
  ...

Note: To enhance readability the WKT value has been split across multiple lines and embedded quotation marks (") left unescaped - in real netCDF files such characters would need to be escaped. The WKT specification in [OGC_CTS] appears to be silent as regards which character(s) may be used to delimit text-valued properties; however, since all the examples in that specification use quotation marks, the use of that particular delimiting character is mandated by the CF convention.
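As an illustration of the escaping mentioned in the note, the Python sketch below formats a text attribute as a CDL line. The function name is hypothetical, and the sketch assumes that CDL accepts C-style backslash escapes inside quoted strings.

```python
def to_cdl_attribute(var_name, attr_name, value):
    """Format a text attribute as a CDL line, escaping embedded quotes."""
    # Escape backslashes first so that the escapes added for quotes survive.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{var_name}:{attr_name} = "{escaped}" ;'

print(to_cdl_attribute("crs", "crs_wkt", 'UNIT ["metre", 1.0]'))
```

Software that writes the attribute through a netCDF library rather than via CDL would not need this step, since the string is then stored as-is.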

Insertion of an additional row into Table F.1

In Appendix F the following additional line entry should be inserted into Table F.1 at row 1.

Attribute: crs_wkt
Type: S
Description: This optional attribute may be used to specify multiple coordinate system properties in well-known text (WKT) format. The syntax must conform to the WKT format as specified in reference [OGC_CTS]. Use of the crs_wkt attribute is described in section 5.6.1.

Additions to Bibliography

[OGC_CTS] OpenGIS Coordinate Transformation Service Implementation Specification. OGC document 01-009. 12 January 2001 (URL: http://www.opengeospatial.org/standards/ct)

Errata

  1. In Example 5.10, the final attribute should more accurately be scale_factor_at_central_meridian rather than scale_factor_at_projection_origin, since the scale factor applies to the entirety of the central meridian employed in the projection, not just the point of origin.
  2. The entry for grid_mapping_name in Table F.1 needs to be amended. The data type column should read 'S' not 'N'.

Changes to Conformance Document

The following conformance statement needs to be appended to the Requirements heading of section 5.6:

"If present, the crs_wkt attribute must be a text string conforming to the CRS WKT specification described in reference [OGC_CTS]."
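A conformance checker would need some way of testing this requirement. The Python sketch below is a deliberately cheap sanity check, not a WKT validator: it only confirms that the value is non-empty, that square brackets balance and that quotation marks are closed. The function name is hypothetical, and full conformance to [OGC_CTS] would require a proper parser.

```python
def looks_like_wkt(text):
    """Cheap sanity check: non-empty, balanced square brackets, closed quotes."""
    depth = 0
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            # Brackets inside quoted names are ignored.
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth -= 1
                if depth < 0:
                    return False
    return bool(text.strip()) and depth == 0 and not in_quotes

print(looks_like_wkt('VERT_CS ["Newlyn", UNIT ["metre", 1.0]]'))
print(looks_like_wkt('VERT_CS ["Newlyn", UNIT ["metre", 1.0]"]'))
```

The second call contains a stray quotation mark and is rejected, which is the kind of slip that is easy to make when WKT is edited by hand.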

5. Benefits

5.1. The proposed new crs_wkt attribute will enable data producers and software developers to encode a richer set of CRS properties within netCDF files.

5.2. The crs_wkt attribute is entirely optional; its presence (or absence) will not render existing CF-compliant netCDF files non-compliant. Similarly, the presence of the crs_wkt attribute should not cause existing netCDF clients to fail - it should simply be ignored as with any other unrecognised attribute.

5.3. The CRS WKT format represents a compact notational format, one that has been in widespread use for many years within the geoscience software community. It is currently used by a number of commercial and open source software packages (e.g. GDAL, GeoAPI).

5.4. The CRS WKT format is both human-readable and amenable to machine-processing (its primary intent, of course). As such it complies with the customary requirement for CF metadata properties to be self-describing.

5.5. Use of the crs_wkt attribute obviates the need to devise numerous discrete CF attributes such as would be required to realise the complex conceptual model underpinning the CRS domain.

5.6. The CRS WKT format [1] is maintained by the Open Geospatial Consortium (OGC), the world's leading independent body for commissioning and publishing geospatial standards (as such the notation is sometimes referred to as the OGC WKT format). The CRS WKT specification appears to be stable.

6. Status Quo

Data producers could - and anecdotal evidence suggests that they do - encode additional CRS properties in locally-devised netCDF attributes. Clearly this is undesirable as, over time, it will lead to a proliferation of parochial and incompatible implementations.

The author is aware of a number of data producers and software developers (both commercial and open source) that are currently using a similar approach to the one described in this proposal. Consequently, a chief aim of this proposal is to formalise this practice as part of the CF conventions.

7. References

[1] OpenGIS Coordinate Transformation Service Implementation Specification. OGC document 01-009 (link)

[2] OpenGIS Implementation Standard for Geographic Information - Simple feature access - Part 1: Common architecture. OGC document 06-103r4 (link)

[3] Well-known Text Wikipedia entry (link)

[4] GeoAPI description of WKT format (link)

Change History (58)

comment:1 follow-ups: Changed 10 years ago by jonblower

Two questions about this:

  1. For horizontal CRSs, how can we define how the individual axes of the WKT definition map to the CF coordinate variables? It's not always obvious: polar projections don't have a clear mapping from x/y to easting/northing. CRS:84 and EPSG:4326 are the same CRS with the axes defined in a different order.
  2. Can the use of WKT apply to vertical CRSs too, and if so should we provide an example?

comment:2 in reply to: ↑ 1 Changed 10 years ago by pbentley

  • Cc pbentley added

Hi Jon,


> 1. For horizontal CRSs, how can we define how the individual axes of the WKT definition map to the CF coordinate variables? It's not always obvious: polar projections don't have a clear mapping from x/y to easting/northing. CRS:84 and EPSG:4326 are the same CRS with the axes defined in a different order.

If I understand the question correctly (and I'm not sure that I do), one could achieve this by specifying the WKT AXIS elements in the desired order, e.g.

... AXIS["Lon", EAST], AXIS["Lat", NORTH], ...

for, say, CRS:84. Whereas in the WKT for EPSG:4326 they'd presumably be in the familiar lat-followed-by-lon order. Likewise, if the sense of the X and Y axes were transposed one might use:

... AXIS["Y", EAST], AXIS["X", NORTH], ....

Personally, I hadn't envisaged any kind of mapping between the WKT string and CF coord variables. That's not to say it might not be feasible.
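For what it's worth, the axis order declared in a WKT string is straightforward to recover. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes simple AXIS elements of the form shown above, with a quoted name followed by a direction keyword.

```python
import re

def axis_order(crs_wkt):
    """Return [(axis name, direction), ...] in the order the AXIS elements appear."""
    return re.findall(r'AXIS\s*\[\s*"([^"]*)"\s*,\s*(\w+)\s*\]', crs_wkt)

print(axis_order('AXIS["Lon", EAST], AXIS["Lat", NORTH]'))
print(axis_order('AXIS["Y", EAST], AXIS["X", NORTH]'))
```

This yields the declared order only; relating those axes to particular CF coordinate variables would still need some additional convention.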


> 2. Can the use of WKT apply to vertical CRSs too, and if so should we provide an example?

Indeed. Example 5.11 includes a pared-down vertical CRS for the Newlyn vertical datum at the end of the WKT string. I'm no vertical CRS expert - if someone is able to provide additional and better examples then we can include those. I was endeavouring to keep my examples fairly succinct since they take up a lot of page space.

Phil

comment:3 Changed 10 years ago by jonblower

Hi Phil,

> Personally, I hadn't envisaged any kind of mapping between the WKT string and CF coord variables.

I think this might be necessary to allow coordinates to be converted successfully to other CRSs, amongst other uses. One would need to know whether to take coordinate tuples in the order y,x or x,y from the coordinate variables, otherwise the WKT definition is not very helpful. There may need to be a CF attribute that defines the order of the coordinate variables that matches the WKT definition. (e.g. axis_mapping = "y x" or axis_mapping = "lon lat"). I haven't thought through the syntax!
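To make the idea concrete, here is a rough Python sketch of how a client might use such an attribute. Everything in it is hypothetical: axis_mapping is not an existing CF attribute, the function name is invented, and the coordinate variables are assumed to have been read into a dictionary of equal-length lists.

```python
def order_tuples(coords, axis_mapping):
    """Return coordinate tuples in the order named by a hypothetical axis_mapping string."""
    names = axis_mapping.split()
    # Pair up the i-th value of each named coordinate variable.
    return list(zip(*(coords[name] for name in names)))

coords = {"x": [400000.0, 401000.0], "y": [100000.0, 100000.0]}

print(order_tuples(coords, "y x"))
print(order_tuples(coords, "x y"))
```

The tuples could then be passed to a transformation routine in whichever order the WKT definition expects.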

Jon

comment:4 follow-up: Changed 10 years ago by jonathan

Dear Phil

Thanks for taking the time to formulate this. Though I respect the effort invested, I have serious reservations about the proposal, as will probably not surprise you, though I won't be surprised if I am in a minority. I recognise that it is sensible to use other conventions when we can rather than inventing our own in CF, and that the CRS WKT convention is popular, but I'm not convinced it is sensible to import it wholesale. Doing so may solve some practical problems, because existing software could be used to interpret CRS WKT, but I fear that it would create other problems of inconsistency and lack of clarity.

As examples of the potential for inconsistency from your example:

  • The transverse Mercator projection parameters can be specified by the existing CF grid_mapping attributes.
  • Likewise for the spheroid parameters.
  • The CRS WKT axis keyword could conflict with the CF axis and positive attributes (as you and Jon have discussed).

Both CF and CRS WKT specifications might be present and could be inconsistent. To detect such inconsistency would require software that can interpret both conventions, following rules that describe the correspondence between them, but even if that is done, inconsistent files will probably be created.

If we support CRS WKT, will we require grid_mapping too? If so, that increases the work by data-writers and the requirement to include both will probably not be followed. If not, some files will have grid_mapping and some will have CRS WKT, so all CF-aware software will need to understand both.

Regarding lack of clarity, I would agree that the CRS WKT format is concise and human-readable, but some of it is not as self-explanatory as CF should be, in particular the use of "positional parameters", such as SPHEROID ["Airy 1830", 6377563.396, 299.3249646] and TOWGS84[375, -111, 431, 0, 0, 0, 0]. You could not know what these mean unless you look it up or have memorised the format. I don't think it is self-describing metadata. A further kind of lack of clarity and self-description would be the use of EPSG codes, which are opaque. Can they be translated automatically into parameters? This is another source of potential inconsistency as well.
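To illustrate what is hidden in those positional parameters, the Python sketch below attaches names to them. The field names follow my reading of the OGC 01-009 grammar (semi-major axis and inverse flattening for SPHEROID; dx, dy, dz, ex, ey, ez, ppm for TOWGS84). The function names are hypothetical, and the regular expressions handle only simple elements with no nested AUTHORITY.

```python
import re

SPHEROID_FIELDS = ("semi_major_axis", "inverse_flattening")
TOWGS84_FIELDS = ("dx", "dy", "dz", "ex", "ey", "ez", "ppm")

def _numbers(text):
    """Parse a comma-separated run of numbers."""
    return [float(tok) for tok in text.split(",") if tok.strip()]

def describe_positional(crs_wkt):
    """Attach names to the positional values of SPHEROID and TOWGS84."""
    result = {}
    spheroid = re.search(r'SPHEROID\s*\[\s*"([^"]*)"\s*,([^\[\]]*)\]', crs_wkt)
    if spheroid:
        values = dict(zip(SPHEROID_FIELDS, _numbers(spheroid.group(2))))
        result["spheroid"] = {"name": spheroid.group(1), **values}
    towgs84 = re.search(r'TOWGS84\s*\[([^\[\]]*)\]', crs_wkt)
    if towgs84:
        result["towgs84"] = dict(zip(TOWGS84_FIELDS, _numbers(towgs84.group(1))))
    return result

datum = ('DATUM ["OSGB 1936", SPHEROID ["Airy 1830", 6377563.396, 299.3249646], '
         'TOWGS84[375, -111, 431, 0, 0, 0, 0]]')
print(describe_positional(datum))
```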

My proposal would therefore be not to introduce a single crs_wkt attribute, but to add many new attributes to Appendix F based on or inspired by CRS WKT one by one, as required by use cases. This is our usual practice, to extend CF as and when needed, but not in advance or en bloc. Doing it step by step will allow us to consider each specific need carefully, to see whether CF already has an equivalent feature, and to ensure we understand the implications. As a benefit of your proposal, you mention that it obviates the need to do just what I am proposing! However, if the CRS WKT concepts are already carefully thought-out and well-defined, it should not be much work to add equivalent attributes to CF. When it turns out not to be trivial, it's probably a sign that there is some difficulty we should pay attention to.

I realise that means software which can interpret CRS WKT won't immediately be able to use CF attributes, and that is a drawback, but I would propose that at the same time we build up a table of correspondences (in Appendix F or a new appendix) between the two systems. That table can then be used to write software to translate CF grid_mapping to CRS WKT on the fly. In my opinion, we need to have this table anyway, if we accept your proposal for crs_wkt, because the CF checker at least would need to be able to do this translation in order to detect inconsistency.
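A minimal Python sketch of such a translation, for the transverse Mercator case only, might look like this. The correspondence table is an illustrative assumption based on the EPSG parameter names used in the proposal's examples, not an agreed CF table, and the function name is hypothetical.

```python
# Illustrative correspondences for grid_mapping_name = "transverse_mercator".
CF_TO_WKT_PARAMETER = {
    "false_easting": "False easting",
    "false_northing": "False northing",
    "longitude_of_central_meridian": "Longitude of natural origin",
    "latitude_of_projection_origin": "Latitude of natural origin",
    "scale_factor_at_central_meridian": "Scale factor at natural origin",
}

def cf_to_wkt_parameters(cf_attrs):
    """Render the CF attributes that are present as WKT PARAMETER elements."""
    parts = []
    for cf_name, wkt_name in CF_TO_WKT_PARAMETER.items():
        if cf_name in cf_attrs:
            parts.append(f'PARAMETER ["{wkt_name}", {cf_attrs[cf_name]}]')
    return ", ".join(parts)

attrs = {"false_easting": 400000, "false_northing": -100000}
print(cf_to_wkt_parameters(attrs))
```

Run in the other direction, the same table would let a checker compare a crs_wkt string against the grid_mapping attributes.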

Best wishes

Jonathan

comment:5 in reply to: ↑ 4 ; follow-up: Changed 10 years ago by pbentley

Hi Jonathan,

You raise many points here, as expected. Some responses below in roughly the same order.

  1. Importing of CRS WKT convention.

Nowhere in the proposal is it suggested that the CRS WKT convention be imported "wholesale" into CF. All that is being proposed is the inclusion of an optional attribute whose value is a text string which conforms to the CRS WKT syntax. In the same sentence you state that "it is sensible to use other conventions...rather than inventing our own...", yet not sensible to use this particular one! If the CF community is unwilling to countenance the use of a de facto standard as succinct, well-known and widely-used as CRS WKT then, off the top of my head, I cannot think of a single standard (pertinent to our domain of interest) that might satisfy your acceptance criteria.

  2. Potential inconsistency between CRS WKT and other grid mapping attributes

The proposal is explicit on this issue: if there is inconsistency between the crs_wkt attribute and other attributes, then the latter take precedence. Let's not forget that there are other parts of CF-netCDF that are also susceptible to inconsistent usage (the use of valid_min, valid_max & valid_range springs to mind - the mailing list archive will throw up the relevant posts). I don't think it's reasonable to reject a particular technique on the basis that someone might misuse it.

  3. Use of CRS WKT and grid_mapping

Again, the proposal is explicit in this regard. The crs_wkt attribute is an optional attribute which may be attached to a grid mapping variable. If you don't have the latter, then you can't have the former. If CF compliance demands the presence of certain (existing) grid mapping attributes, then they should be there. But inclusion of the crs_wkt is optional and, evidently, context-dependent.

  4. CRS WKT not self-explanatory.

IMHO this is highly subjective. Personally I find the CRS WKT syntax to be perfectly self-explanatory and self-describing. Anecdotally that also seems to be the majority view of folks on the CF mailing list -- though someone with spare cycles might wish to tot up the yays and the nays to prove me wrong! As for myself, I find the syntax of the cell_methods attribute (especially for climatological statistics) to be more abstruse and less self-explanatory than CRS WKT. I have to look up the meaning every time I need to read or write it, simply because I don't use it very often. Practice makes perfect! At the end of the day, people (and software clients) that need to use the WKT format will know how to do so. As the proposal clearly states: the WKT string is human-readable but primarily intended for machine processing.

  5. Use of EPSG codes.

The proposal, by intention, does not specifically recommend the use of OGP/EPSG codes. The CRS WKT syntax allows such things to be specified via optional AUTHORITY elements (as Example 5.11 shows), but they cannot be specified alone; you have to define all the other mandatory elements in order for it to be a valid WKT string. If an EPSG code is specified, and a software client happens to know what to do with it, all well and good. If not, then the core CRS properties are there as the primary means of specification. A WKT string composed solely of EPSG codes would be nonsensical.
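By way of illustration, a client could pull out any AUTHORITY codes and treat them as optional hints, roughly as in the Python sketch below. The function name is hypothetical, and the regular expression assumes that both the authority name and the code are quoted, as they are in Example 5.11.

```python
import re

def authority_codes(crs_wkt):
    """Return (authority, code) pairs in the order they appear in the WKT."""
    pattern = r'AUTHORITY\s*\[\s*"([^"]*)"\s*,\s*"([^"]*)"\s*\]'
    return re.findall(pattern, crs_wkt)

wkt = ('COMPD_CS ["OSGB 1936 / British National Grid + ODN", '
       'PROJCS ["OSGB 1936 / British National Grid", AUTHORITY ["EPSG", "27700"]], '
       'VERT_CS ["Newlyn", AUTHORITY ["EPSG", "5701"]]]')

print(authority_codes(wkt))
```

A client that does not recognise a code simply ignores the list and works from the explicit CRS properties instead.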

  6. Addition of many new attributes, incrementally.

Your alternative suggestion. To cover the CRS WKT syntax I reckon this would entail the addition of o(20-30) new CF attributes, including at least 7 just for horizontal datum transformations (for which there does appear to be a tangible requirement). At the risk of being presumptuous, I cannot see anyone having the energy and determination to push through such a collection of new CF attributes when a single attribute does the job to a reasonable degree of efficacy - not perfect, for sure, but pretty fit for a range of use-cases it would seem.

Also, what you are suggesting appears, to these eyes anyway, to involve importing the CRS WKT (meta)data model into the CF metadata scheme, something which you seem opposed to at point 1. The current proposal is merely using/referencing the existing CRS WKT scheme as specified by OGC.

  7. Table of correspondences (CF to CRS WKT mappings)

If it's useful, I have compiled a provisional list of mappings from existing CF grid mapping attributes to CRS WKT elements here, which can be expanded and refined as required. I don't think this table would need to be added to the CF conventions document - it should suffice to reference the wiki page. But I don't have a strong view on this.

Regards,

Phil

comment:6 Changed 10 years ago by caron

I support this ticket. In essence it says, "if you want to use WKT, here is where to put it", with clear instructions that current CF attributes take precedence if in conflict.

The basic argument is that CF doesn't have the time or expertise to reinvent all the CRS work that's been done. There's nothing that's being taken away from CF here, just a way to let the non-climate users get on with writing CF files.

I would suggest adding a (non-normative?) appendix on the mapping between WKT and CF params. This will help software developers do the right thing.

I'm also open to adding new CF parameters that capture the WKT semantics, as we understand them better.

comment:7 in reply to: ↑ 5 ; follow-up: Changed 10 years ago by jonathan

Dear Phil

> 1. Importing of CRS WKT convention.
>
> Nowhere in the proposal is it suggested that the CRS WKT convention be imported "wholesale" into CF. All that is being proposed is the inclusion of an optional attribute whose value is a text string which conforms to the CRS WKT syntax.

That attribute can contain any CRS WKT information. That's the idea. Hence, the whole of the CRS WKT convention is imported into CF. That's what I meant. On the other hand, if we incorporate it bit by bit, we will be able to check for problems that could arise. I am not suggesting that CRS WKT has not been thought-out properly, but because it is an independent convention, it overlaps with CF.

> 2. Potential inconsistency between CRS WKT and other grid mapping attributes
>
> The proposal is explicit on this issue: if there is inconsistency between the crs_wkt attribute and other attributes, then the latter take precedence. Let's not forget that there are other parts of CF-netCDF that are also susceptible to inconsistent usage (the use of valid_min, valid_max & valid_range springs to mind - the mailing list archive will throw up the relevant posts). I don't think it's reasonable to reject a particular technique on the basis that someone might misuse it.

I think it is important to minimise the possibility of things going wrong. It is not really sufficient, I feel, to say which takes precedence. In fact it might not be easy to apply this principle anyway, as the attributes may not correspond one-to-one. To some extent we can protect against inconsistency by enabling the CF checker or other tools to check for it. It does that for quite a few possible inconsistencies that may arise among CF attributes, but the introduction of CRS WKT would present a lot more potential problems.

> 6. Addition of many new attributes, incrementally.
>
> Your alternative suggestion. To cover the CRS WKT syntax I reckon this would entail the addition of o(20-30) new CF attributes, including at least 7 just for horizontal datum transformations (for which there does appear to be a tangible requirement). At the risk of being presumptuous, I cannot see anyone having the energy and determination to push through such a collection of new CF attributes when a single attribute does the job to a reasonable degree of efficacy - not perfect, for sure, but pretty fit for a range of use-cases it would seem.

If we need 7 new attributes to describe horizontal coordinate systems, I vote for doing that. I would usually vote for doing what we need now, to address use cases we have, and not doing what we don't need yet. If the description of CRS WKT provides clear definitions for these 7 attributes, then adding them to Appendix F should be easy. That's provided they don't conflict with Appendix F. If they do, then we need to know about it and we should resolve it. That means we have to understand what these attributes mean. I guess you and I would already have spent less time on this discussion if the proposal had been to add 7 specific attributes to grid_mapping. :-)

> Also, what you are suggesting appears, to these eyes anyway, to involve importing the CRS WKT (meta)data model into the CF metadata scheme, something which you seem opposed to at point 1.

Yes, it is closely related. You could say that I am agreeing with you more than you suspect. That is, I would really like to use the expertise that has been invested in the development of CRS WKT. I would just rather we passed it through a "filter", by deciding explicitly which aspects of it to incorporate in CF, than provided a mechanism which immediately allows all aspects of it to be used.

If it's useful, I have compiled a provisional list of mappings from existing CF grid mapping attributes to CRS WKT elements here, which can be expanded and refined as required. I don't think this table would need to be added to the CF conventions document - it should suffice to reference the wiki page. But I don't have a strong view on this.

This is really useful. Thanks! If crs_wkt is added to CF, I certainly think that this kind of information will be needed in the conformance document, as input to the CF checker.

On the other hand, if we extended CF to add more attributes, motivated by CRS WKT, and software existed which could translate CF attributes into CRS WKT strings, would that meet the need?

Best wishes

Jonathan

comment:8 in reply to: ↑ 7 ; follow-up: Changed 9 years ago by markh

if we incorporate it bit by bit, we will be able to check for problems that could arise.

This represents a significant amount of ongoing effort, putting the onus on a data creator to navigate the process of approval for a coordinate reference system unused in CF, but well known elsewhere.

I think this is an unreasonable expectation, which will lead to data creators taking their own approaches, encoding useful metadata in whatever way they see fit.

I think it is important to minimise the possibility of things going wrong. It is not really sufficient, I feel, to say which takes precedence.

There is a precedent for the precedence approach, with data variable attributes taking precedence over global attributes of the same name.

To minimise things going wrong we could consider advice which recommends only named attributes OR WKT are used, but not both. In the case of both, precedence applies.
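
A minimal Python sketch of that precedence rule, assuming the WKT string has already been parsed into a dictionary keyed by CF attribute names. The function names, the `datum_name` key and the sample values are hypothetical and only illustrate the idea.

```python
def resolve_crs_property(name, cf_attrs, wkt_props):
    """Return (value, source) for one CRS property; a CF attribute wins if present."""
    if name in cf_attrs:
        return cf_attrs[name], "cf"
    if name in wkt_props:
        return wkt_props[name], "wkt"
    return None, None


def conflicting_properties(cf_attrs, wkt_props):
    """List the properties that both sources define with different values."""
    return sorted(
        name for name in cf_attrs
        if name in wkt_props and cf_attrs[name] != wkt_props[name]
    )


# Hypothetical example values, already keyed by CF-style attribute names.
cf_attrs = {"semi_major_axis": 6377563.396, "false_easting": 400000.0}
wkt_props = {
    "semi_major_axis": 6378137.0,
    "false_easting": 400000.0,
    "datum_name": "OSGB 1936",  # hypothetical key with no CF counterpart
}

print(resolve_crs_property("semi_major_axis", cf_attrs, wkt_props))
print(resolve_crs_property("datum_name", cf_attrs, wkt_props))
print(conflicting_properties(cf_attrs, wkt_props))
```

Reporting the conflicting names as well as resolving them lets a tool warn the data creator rather than silently picking one value.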

I do not see why I would require the CF named attributes for grid_mapping variables (coordinate reference system definitions) if I could use the WKT attribute, which appears to meet my requirements by giving a necessarily rich vocabulary for this complex subject area.

Using both in a file is counter-intuitive to me. Either a data provider wants to use WKT, or they don't and use the grid_mapping attributes they have used in the past.

comment:9 follow-up: Changed 9 years ago by jonblower

Hi,

I still don't understand Phil's comment about axis mapping:

Personally, I hadn't envisaged any kind of mapping between the WKT string and CF coord variables. That's not to say it might not be feasible.

Without the mapping, how does a piece of software know how to interpret a coordinate pair [x,y] to find the correct data point in the variable? (Or vice-versa?) And if this isn't possible, what is the purpose of specifying the CRS in WKT?

Remember that the CF axis order and the WKT axis order are different things. The CF axis order records the internal storage order. The WKT axis order defines how an external application constructs coordinate tuples.

Jon

comment:10 in reply to: ↑ 9 Changed 9 years ago by markh

Replying to jonblower:

Hello Jon

Section 7.3 Description of WKT keywords describes the AXIS definition (7.3.2) within a coordinate reference system definition. There are defaults, or the creator can explicitly define axis names and order.

7.3.2 AXIS
The name of the axis is for human consumption. The enumerated value that follows is to allow 
software to correctly overlay different coordinate systems.
If the optional AXIS terms are not present, then the default values are assumed. They are
Geographic Coordinate Systems: AXIS[“Lon”,EAST],AXIS[“Lat”,NORTH]
Projected Coordinate System: AXIS[“X”,EAST],AXIS[“Y”,NORTH]
Geocentric Coordinate System: AXIS[“X”,OTHER],AXIS[“Y”,EAST],AXIS[“Z”,NORTH]
However, if these terms are present, and have non-default values, then implementations must be
prepared to swap and reverse the coordinates of geometry before attempting to overlay graphics.

I would suggest that it is the responsibility of the data creator using a WKT coordinate reference system to ensure that the AXIS definition is consistent with the axis naming and data dimension ordering in the NetCDF file.

The WKT specification highlights this for geographic coordinate systems:

7.3.8 GEOGCS
A coordinate system based on latitude and longitude. Some geographic coordinate systems are 
Lat/Lon, and some are Lon/Lat. You can find out which this is by examining the axes. You should 
also check the angular units, since not all geographic coordinate systems use degrees.
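
The axis check described in that passage can be sketched in Python. This is an illustration only: it assumes a standalone GEOGCS string in the style quoted above, uses a simple regular expression rather than a real WKT parser, and the function names are hypothetical.

```python
import re

# Matches AXIS["name",DIRECTION] entries in a WKT string.
AXIS_RE = re.compile(r'AXIS\s*\[\s*"([^"]*)"\s*,\s*([A-Z_]+)\s*\]')


def geogcs_axis_directions(geogcs_wkt):
    """Return the axis directions in WKT order, e.g. ['EAST', 'NORTH'].

    If no AXIS terms are present, fall back to the default quoted above
    for geographic coordinate systems (Lon/EAST, then Lat/NORTH).
    """
    directions = [direction for _, direction in AXIS_RE.findall(geogcs_wkt)]
    return directions or ["EAST", "NORTH"]


def is_lat_first(geogcs_wkt):
    """True if the first WKT axis runs north-south, i.e. the CRS is Lat/Lon."""
    return geogcs_axis_directions(geogcs_wkt)[0] in ("NORTH", "SOUTH")


lon_lat = 'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]]'
lat_lon = lon_lat[:-1] + ',AXIS["Lat",NORTH],AXIS["Lon",EAST]]'

print(is_lat_first(lon_lat))
print(is_lat_first(lat_lon))
```

A data creator could run a check of this kind against the dimension order of the data variable before writing the WKT string to the file.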

comment:11 Changed 9 years ago by jonblower

I would suggest that it is the responsibility of the data creator using a WKT coordinate reference system to ensure that the AXIS definition is consistent with the axis naming and data dimension ordering in the NetCDF file.

That seems sensible to me. Maybe some examples could be generated of real NetCDF files that use this approach, for lon-lat, lat-lon and projected reference systems?

comment:12 in reply to: ↑ 8 Changed 9 years ago by pbentley

Replying to markh:

Hi Mark,

To minimise things going wrong we could consider advice which recommends only named attributes OR WKT are used, but not both. In the case of both, precedence applies.

I do not see why I would require the CF named attributes for grid_mapping variables (coordinate reference system definitions) if I could use the WKT attribute, which appears to meet my requirements by giving a necessarily rich vocabulary for this complex subject area.

Using both in a file is counter-intuitive to me. Either a data provider wants to use WKT, or they don't and use the grid_mapping attributes they have used in the past.

To reiterate our offline conversation a couple of days ago, for now the proposed new crs_wkt attribute may be used to supplement the existing grid mapping attributes, not act as an alternative. As the proposal mentions in para 3 of section 4.1, this is to avoid the situation where existing CF-compliant software apps may be rendered unable to correctly interpret the coordinate system used in a netCDF file owing to the absence of expected grid mapping attributes.

At some point in the future software may have evolved to work with both CRS specification mechanisms -- or, who knows, possibly even better alternatives! Until then the crs_wkt attribute should remain an optional adjunct IMHO.

Cheers, Phil

comment:13 Changed 9 years ago by markh

  • Summary:
    • A large body of work has taken place to define CRSs in WKT, which CF could benefit from by adopting its use.
      • This would enable the use of existing software to provide CS transformations using WKT.
    • Axes and axis ordering are data-centric; it is the responsibility of the data creator to define the WKT axis order so that it matches the data structure.
    • WKT is proposed as an additive adjunct to the grid_mapping definition, not a replacement.
  • Supporters:
    • pbentley, caron, markh
  • Concerned parties:
    • jonathan
  • Contention centres on:
    • the benefit to CF of adopting the WKT approach to CRS definition, as an optional adjunct
      • thus to support the proposal
    • the price paid by CF in incorporating WKT definitions, compared to a step-wise incorporation of CF defined attributes to meet need
      • thus to resist the proposal
  • Status:
    • consensus not yet achieved

comment:14 Changed 9 years ago by pbentley

Hi Mark,

Thanks for summarising the situation to date as regards this proposal. To those observations I would offer the following comments.

  • It doesn't seem as though there is any fundamental dispute as to the fact that the WKT notation represents a sufficient, compact and standards-based mechanism for capturing desirable coordinate system properties.
  • The WKT approach isn't being offered up as a silver bullet solution to all the issues of CRS definition; rather, in the best tradition of the 80-20 rule, it's been put forward as a reasonably clean and simple way of optionally attaching useful CRS metadata to netCDF files.
  • As you allude to, the main concern here would appear to be less about the details of the WKT encoding mechanism itself, and more about the differing views as to how the CF conventions should evolve. Some commentators prefer an incremental 'just-in-time' approach, adding new attributes on an as-needs basis. Others are willing to accept a 'look-ahead' approach, anticipating the need for new CF capabilities which are likely to be required next week, next month, or which might be 'just over the horizon'.
  • Both approaches have their potential risks and rewards. And I'm certainly not presuming to say that one is better than the other. However, because they are necessarily subjective, it may be the case that we will always find it difficult to achieve consensus, let alone unanimity, in such matters, regardless as to the technical merit (or not) of any particular proposal.

Phil

comment:15 follow-up: Changed 9 years ago by jonathan

Dear Mark and Phil

Thanks for your good summary, Mark, and for your well-expressed response, Phil. I remain a concerned party, however, so I'll try to explain my motivations and alternative proposal in more detail.

Having read the WKT description that Phil pointed out, I agree that WKT is a well-thought-out standard and offers a compact notation for describing coordinate systems. I support the idea to make use of it in CF. My concern is how this should be done, as you correctly say. WKT does not neatly slot into a "hole" in CF, in my opinion, because (a) it overlaps with existing attributes and hence its use could lead to redundant, inconsistent or incorrect metadata, (b) it has a different kind of syntax, which could lead to an inconsistent approach for how to record aspects of coordinate reference systems and require data-readers to be able to process more than one convention for some aspects, (c) it has features for which needs have not been identified and whose place in CF has not been considered, such as "math transform WKT". None of these things is a problem with WKT in itself, but in my opinion they mean that including WKT through a single attribute is not the right way to do it. These kinds of concern are consistent with some general principles that have been followed in extending CF. However, I'm not insisting on them for dogmatic reasons, but because I think they are sensible.

My alternative proposal, as already stated, is that we should define some more grid_mapping attributes derived from WKT entities. To accommodate Phil's example of the British National Grid as a horizontal coordinate system, I think we need the following new attributes in Table F.1:

Attribute | Type (S = string, N = numeric) | Description
projected_coordinate_system_name | S | Name of the projected coordinate system (a combination of a geographic (longitude-latitude) coordinate system and a projection). The name is not standardised.
geographic_coordinate_system_name | S | Name of the geographic (longitude-latitude) coordinate system. The name is not standardised.
horizontal_datum_name | S | Name of the horizontal reference datum for the geographic (longitude-latitude) coordinate system. The name is not standardised.
reference_ellipsoid_name | S | Name of the reference ellipsoid (i.e. spheroid) used in the horizontal reference datum for the geographic (longitude-latitude) coordinate system. The name is not standardised.
prime_meridian_name | S | Name of the prime meridian of the geographic (longitude-latitude) coordinate system. The name is not standardised.
prime_meridian | N | Longitude of the prime meridian of the geographic (longitude-latitude) coordinate system, in units of degrees_east.

All except the last are names. Is it correct that they are not standardised or, if they are standardised, can you point me to a document which lists and defines the possible values? I used the phrase "reference ellipsoid" because we do so in some standard names, and the WKT description implies "ellipsoid" and "spheroid" are synonymous.

For the CF use, the prime meridian must be in degrees east, since this is the only unit which CF allows for longitude. Likewise the unit for the geographic coordinate system need not be specified, because it must be degrees (east or north). If there were a need to support geographic coordinate systems which have longitude and latitude in units other than degrees, as suggested by WKT, we would need new standard names for these coordinates. Similarly the unit for the projection coordinates does not have to be specified, because it is given by the projection coordinate variables. In fact those variables, identified by having standard names of projection_x_coordinate and projection_y_coordinate, do not need to have the same units, but WKT appears not to support that possibility.

The WKT AXIS keyword has been discussed previously on this ticket. Whether the coordinate system is lon-lat or lat-lon is dictated by the order of the coordinate variables, as Mark says. However, the axes of the CF data variable are actually the projection coordinates, not latitude and longitude, so I am unclear how the AXIS entities should be specified.

I assume that one reason why WKT is attractive is that other software can interpret it. This need could be met by generating it when required from CF attributes, including UNIT and AXIS parameters that correctly describe the CF data variable. Thus inconsistencies can be avoided. As an example, please find at the end of this posting the CDL which defines a test file, and an IDL program which will read this file to generate WKT. It produces the following:

PROJCS ["OSGB 1936 / British National Grid", GEOGCS ["OSGB 1936", DATUM ["OSGB
1936", SPHEROID ["Airy 1830", 6377563.4, 299.32496]], PRIMEM ["Greenwich",
0.0000000], UNIT ["degree",0.0174533]], PROJECTION ["Transverse Mercator"],
PARAMETER ["Scale factor at natural origin", 0.99960127], PARAMETER
["Longitude of natural origin", -2.0000000], PARAMETER ["Latitude of natural
origin", 49.000000], PARAMETER ["False easting", 400], PARAMETER ["False
northing", -100], UNIT ["metre",1000.00]]

I don't think it would surprise anyone that this can be done and I expect most people would prefer other languages. It's just a demonstration that the code is simple. It is also an example that we might make further progress in the usability of CF by providing more code intended to use CF. The Python code to read and carry out simple manipulations on CF objects under trac ticket 68 is another example.
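
For readers who prefer Python, the same translation can be sketched as below. It is only an illustration of the approach taken by the IDL program: the attribute values are passed in as a plain dictionary rather than read from a netCDF file, `cf_to_wkt` and the lookup tables are hypothetical names, and the spacing of the output imitates the string shown above rather than any particular WKT parser's requirements.

```python
import math

# WKT PARAMETER names as used in the IDL demonstration.
PARAMETER_NAMES = {
    "scale_factor_at_projection_origin": "Scale factor at natural origin",
    "longitude_of_projection_origin": "Longitude of natural origin",
    "latitude_of_projection_origin": "Latitude of natural origin",
    "false_easting": "False easting",
    "false_northing": "False northing",
}
PROJECTION_NAMES = {"transverse_mercator": "Transverse Mercator"}
UNIT_TO_METRES = {"m": 1.0, "km": 1000.0}  # projection coordinate units


def cf_to_wkt(attrs, xy_units):
    """Build a PROJCS string from grid_mapping attributes and projection units."""
    spheroid = 'SPHEROID ["{}", {}, {}]'.format(
        attrs.get("reference_ellipsoid_name", ""),
        attrs["semi_major_axis"],
        attrs["inverse_flattening"],
    )
    datum = 'DATUM ["{}", {}]'.format(
        attrs.get("horizontal_datum_name", ""), spheroid)
    primem = 'PRIMEM ["{}", {}]'.format(
        attrs.get("prime_meridian_name", ""), attrs["prime_meridian"])
    degree = 'UNIT ["degree", {}]'.format(math.pi / 180)  # lat and lon in degrees
    geogcs = 'GEOGCS ["{}", {}, {}, {}]'.format(
        attrs.get("geographic_coordinate_system_name", ""), datum, primem, degree)
    parts = [
        '"{}"'.format(attrs.get("projected_coordinate_system_name", "")),
        geogcs,
        'PROJECTION ["{}"]'.format(PROJECTION_NAMES[attrs["grid_mapping_name"]]),
    ]
    for cf_name, wkt_name in PARAMETER_NAMES.items():
        if cf_name in attrs:
            parts.append('PARAMETER ["{}", {}]'.format(wkt_name, attrs[cf_name]))
    parts.append('UNIT ["metre", {}]'.format(UNIT_TO_METRES[xy_units]))
    return "PROJCS [" + ", ".join(parts) + "]"


# Attribute values copied from the example file at the end of this posting.
bng = {
    "projected_coordinate_system_name": "OSGB 1936 / British National Grid",
    "geographic_coordinate_system_name": "OSGB 1936",
    "horizontal_datum_name": "OSGB 1936",
    "reference_ellipsoid_name": "Airy 1830",
    "inverse_flattening": 299.3249646,
    "prime_meridian_name": "Greenwich",
    "prime_meridian": 0.0,
    "grid_mapping_name": "transverse_mercator",
    "semi_major_axis": 6377563.396,
    "scale_factor_at_projection_origin": 0.9996012717,
    "longitude_of_projection_origin": -2.0,
    "latitude_of_projection_origin": 49.0,
    "false_easting": 400,
    "false_northing": -100,
}
wkt = cf_to_wkt(bng, "km")
print(wkt)
```

Unlike the IDL version, this sketch writes numbers at full precision and emits the PARAMETER entries in a fixed order rather than in attribute order.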

Regarding the WKT for the vertical datum, the Newlyn example, I have some questions. There is a vertical datum code: can you point me to a list which defines these codes? We need to be sure that the code is consistent with the CF vertical axis. What would it mean, for instance, if you said you were using the Newlyn vertical datum but your vertical coordinate was air pressure? What is "gravity-related height"? Is it altitude, geopotential height, or something else? Are these vertical coordinate names standardised somewhere?

It seems to me that questions such as the above need to be sorted out in order for it to be possible to generate WKT correctly and reliably from CF. If we did have a sufficient algorithm for generating the WKT of interest from the CF attributes, it would make it much safer to store that WKT in the file as well, as Phil proposes, because the CF_checker could itself generate the WKT, and give an error if it did not match the contents of the crs_wkt attribute. But if it is easy to generate WKT when required on the fly, that seems better to me, as it eliminates the possibility of inconsistency.
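
As a rough illustration of what such a check might look like, the Python sketch below compares the SPHEROID values in a stored WKT string with the corresponding CF attributes. It is a narrower variant of the idea: it compares individual numbers within a tolerance rather than regenerating and matching the whole string. The function name, the regular expression and the tolerance are assumptions, not part of any existing checker.

```python
import math
import re

# Captures the two numbers in SPHEROID["name", semi_major_axis, inverse_flattening].
SPHEROID_RE = re.compile(
    r'SPHEROID\s*\[\s*"[^"]*"\s*,\s*([^,\]\s]+)\s*,\s*([^,\]\s]+)')


def check_spheroid(crs_wkt, cf_attrs, rel_tol=1e-6):
    """Return a list of problem messages; an empty list means no conflict found."""
    match = SPHEROID_RE.search(crs_wkt)
    if match is None:
        return ["no SPHEROID element found in crs_wkt"]
    wkt_values = {
        "semi_major_axis": float(match.group(1)),
        "inverse_flattening": float(match.group(2)),
    }
    problems = []
    for name, wkt_value in wkt_values.items():
        if name in cf_attrs and not math.isclose(
                cf_attrs[name], wkt_value, rel_tol=rel_tol):
            problems.append(
                "{}: CF attribute {} differs from WKT value {}".format(
                    name, cf_attrs[name], wkt_value))
    return problems


sample_wkt = 'DATUM ["OSGB 1936", SPHEROID ["Airy 1830", 6377563.4, 299.32496]]'
consistent = {"semi_major_axis": 6377563.396, "inverse_flattening": 299.3249646}
inconsistent = {"semi_major_axis": 6378137.0, "inverse_flattening": 299.3249646}

print(check_spheroid(sample_wkt, consistent))
print(check_spheroid(sample_wkt, inconsistent))
```

A tolerance is needed because the WKT string may carry rounded values, as in the output shown above.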

My alternative proposal does imply the need to amend the CF convention each time a new WKT concept is required to be included. But if the concept is well-defined and does not conflict with other attributes, that should be unproblematic. If it does conflict, on the other hand, surely we need to discuss and resolve the problem.

Best wishes

Jonathan

Example file

netcdf crs {
dimensions:
  lon=36;
  lat=18;
  x=10;
  y=5;
variables:
  float quantity(y,x);
    quantity:grid_mapping="bng";
    quantity:coordinates="lon lat";
  float lat(y,x);
    lat:standard_name="latitude";
    lat:units="degrees_north";
  float lon(y,x);
    lon:standard_name="longitude";
    lon:units="degrees_east";
  float x(x);
    x:standard_name="projection_x_coordinate";
    x:units="km";
  float y(y);
    y:standard_name="projection_y_coordinate";
    y:units="km";
  int bng;
    bng:projected_coordinate_system_name="OSGB 1936 / British National Grid";
    bng:geographic_coordinate_system_name="OSGB 1936";
    bng:horizontal_datum_name="OSGB 1936";
    bng:reference_ellipsoid_name="Airy 1830";
    bng:inverse_flattening=299.3249646;
    bng:prime_meridian_name="Greenwich";
    bng:prime_meridian=0.0;
    bng:grid_mapping_name="transverse_mercator";
    bng:semi_major_axis=6377563.396;
    bng:scale_factor_at_projection_origin=0.9996012717;
    bng:longitude_of_projection_origin=-2.0;
    bng:latitude_of_projection_origin=49.0;
    bng:false_easting=400;
    bng:false_northing=-100;
  :Conventions = "CF-1.5";
data:
  x=1,2,3,4,5,6,7,8,9,10;
  y=1,2,3,4,5;
}

Example program:

function ncdf_strattget,cdfid,varid,attname
; Return an attribute as a string with no trailing or leading blanks
ncdf_attget,cdfid,varid,attname,attvalue
return,strtrim(string(attvalue),2)
end

function crs,file
; Translate CF attributes into a WKT string describing a projected coordinate
; system

cdfid=ncdf_open(file)
varid=ncdf_varid(cdfid,'quantity') ; the data variable

; Loop over the coordinate variables, looking for projection coordinates,
; and record their units
varinfo=ncdf_varinq(cdfid,varid)
for idim=0,varinfo.ndims-1 do begin
  ncdf_diminq,cdfid,varinfo.dim(idim),name,size
  coord_varid=ncdf_varid(cdfid,name)
  standard_name=ncdf_strattget(cdfid,coord_varid,'standard_name')
  units=ncdf_strattget(cdfid,coord_varid,'units')
  case standard_name of
    'projection_x_coordinate': xunits=units
    'projection_y_coordinate': yunits=units
    else:
  endcase
endfor
; Find the conversion factor from projection coordinate units to metres
if n_elements(xunits) eq 0 or n_elements(yunits) eq 0 $
  then message,'Data variable must have both x and y projection coordinates'
if xunits ne yunits $
  then message,'Projection coordinates must be in the same units'
case xunits of
  'm': factor=1.0
  'km': factor=1e3
  else: message,'Unrecognised unit'
endcase

; Find the grid_mapping variable
grid_mapping=ncdf_strattget(cdfid,varid,'grid_mapping')
grid_mapping_varid=ncdf_varid(cdfid,grid_mapping)

; Loop over and record the attributes of the grid_mapping variable
varinfo=ncdf_varinq(cdfid,grid_mapping_varid)
params='' ; used to accumulate keyword-value pairs for PARAMETER strings
for attnum=0,varinfo.natts-1 do begin
  attname=ncdf_attname(cdfid,grid_mapping_varid,attnum)
  value=ncdf_strattget(cdfid,grid_mapping_varid,attname)
  case attname of
    'projected_coordinate_system_name': projcs=value
    'geographic_coordinate_system_name': geogcs=value
    'horizontal_datum_name': datum=value
    'reference_ellipsoid_name': spheroid=value
    'inverse_flattening': inverse_flattening=value
    'prime_meridian_name': primem_name=value
    'prime_meridian': primem_value=value
    'semi_major_axis': semi_major_axis=value
    'scale_factor_at_projection_origin': $
      params=[params,"Scale factor at natural origin",value]
    'longitude_of_projection_origin':  $
      params=[params,"Longitude of natural origin",value]
    'latitude_of_projection_origin':  $
      params=[params,"Latitude of natural origin",value]
    'false_easting': params=[params,"False easting",value]
    'false_northing': params=[params,"False northing",value]
    'grid_mapping_name': case value of
      'transverse_mercator': projection="Transverse Mercator"
      else: message,'Unknown grid_mapping_name'
    endcase
    else: message,'Unknown grid_mapping attribute: '+attname
  endcase
endfor

ncdf_close,cdfid

; Build the WKT string from the recorded information. Unspecified names
; default to empty strings.
if not keyword_set(spheroid) then spheroid=''
spheroid='SPHEROID ['+$
  strjoin(['"'+spheroid+'"',semi_major_axis,inverse_flattening],', ')+']'
if not keyword_set(datum) then datum=''
datum='DATUM ['+strjoin(['"'+datum+'"',spheroid],', ')+']'
if not keyword_set(primem_name) then primem_name=''
primem='PRIMEM ['+strjoin(['"'+primem_name+'"',primem_value],', ')+']'
unit='UNIT ["degree",'+strtrim(string(!pi/180),2)+']' ; lat and lon are in deg
if not keyword_set(geogcs) then geogcs=''
geogcs='GEOGCS ['+strjoin(['"'+geogcs+'"',datum,primem,unit],', ')+']'
params=params(1:*) ; Discard the first entry
; Convert all the keyword-value pairs into PARAMETER strings
nparameters=n_elements(params)/2
parameters=strarr(nparameters)
for iparameter=0,nparameters-1 $
  do parameters(iparameter)='"'+params(iparameter*2)+'", '+$
  params(iparameter*2+1)
parameters=strjoin('PARAMETER ['+parameters+']',', ')
unit='UNIT ["metre",'+strtrim(string(factor),2)+']'
if not keyword_set(projcs) then projcs=''
projection='PROJECTION ["'+projection+'"]'
projcs='PROJCS ['+$
  strjoin(['"'+projcs+'"',geogcs,projection,parameters,unit],', ')+']'

return,projcs
end

comment:16 Changed 9 years ago by pbentley

Hi Jonathan,

I fear I'd risk sounding like a stuck record if I trotted out the same arguments as I've put forward earlier. So I'll avoid doing that.

The current proposal, as the title indicates, is concerned with specifying CRS properties using the WKT syntax. If the CF community votes not to accept this proposal I won't have a problem with that -- power to the people! (Though I believe it's 3-to-1 in favour at present.)

IMHO, your alternative proposal is sufficiently different, and of sufficient length and detail, as to warrant a separate ticket. But that's just my personal opinion. Others may well see it differently.

Regards

Phil

comment:17 Changed 9 years ago by jonathan

Dear Phil

So far, we have always decided on changes to the CF convention by consensus. Nothing has ever been voted on, as far as I recall. It's much better if we can reach an agreement somehow. You're right, of course, that my proposal is evidently different, but I think it has the same aim, doesn't it?

It might help to be more specific about the needs. Apart from the WKT entities in the OSGB example, which other ones, not covered by the CF convention as it stands, are required by existing use cases? Also, do you know the answers, or where to find them, to my questions about what the definition of the vertical datum means? What concerns me specifically is not knowing what the OGC convention might mean, and hence not being able to say whether it would be consistent with CF metadata if it were imported. Hence the relevance of the questions.

Best wishes

Jonathan

comment:18 in reply to: ↑ 15 Changed 9 years ago by caron

Replying to jonathan:

Having read the WKT description that Phil pointed out, I agree that WKT is a well-thought-out standard and offers a compact notation for describing coordinate systems. I support the idea to make use of it in CF. My concern is how this should be done, as you correctly say. WKT does not neatly slot into a "hole" in CF, in my opinion, because (a) it overlaps with existing attributes and hence its use could lead to redundant, inconsistent or incorrect metadata, (b) it has a different kind of syntax, which could lead to an inconsistent approach for how to record aspects of coordinate reference systems and require data-readers to be able to process more than one convention for some aspects, (c) it has features for which needs have not been identified and whose place in CF has not been considered, such as "math transform WKT". None of these things is a problem with WKT in itself, but in my opinion they mean that including WKT through a single attribute is not the right way to do it. These kinds of concern are consistent with some general principles that have been followed in extending CF. However, I'm not insisting on them for dogmatic reasons, but because I think they are sensible.

Hi Jonathan:

I appreciate your willingness to dive into these details and generate a detailed proposal on how to add the needed semantics into CF. My problem is that I don't have the time right now to dive with you. I also know how long it takes to do these things correctly, e.g. 3 years or more for the Discrete Sampling Proposal.

So, I'm willing to contribute on figuring out the semantics and the attributes needed. But in the meanwhile, let's give people a place to put the WKT string. The practical advantage is that these semantics can be added today, leveraging another group's work.

To answer your concerns from above:

  1. overlap: CF semantics shall prevail where in conflict.
  2. burden on data-readers: the ones that know how to process WKT will do so; the others will ignore it.
  3. use cases: here we have to trust the work of others. When we do the CF attribute version, we can ignore any obscure cases if we think they are not worth the complexity.

The main effect is that we will need to add CF attributes in a way that is compatible with WKT. That will probably make our job easier.

Philosophically, the question is whether to leverage others' work into CF. The consequence of not doing so is to have many pressing issues that we don't have time to resolve. There are many groups going forward without us due to lack of time and difficulty of getting proposals completed.

So I propose we accept this proposal and make a new discussion starting with Jonathan's ideas, and generate a new proposal on encoding WKT semantics into CF attributes.

(I think everyone who has an opinion needs to weigh in here. Don't leave it for Jonathan to do all the work!)

Regards, John

comment:19 Changed 9 years ago by bnl

I too propose we accept this proposal. Having tried to read the thread with an objective eye, I am left with a relatively simple conclusion:

  • Either we take on board the fact other folks have done good work, and find ways of using that, or,
  • We redo it all ... at a faster pace than is credible (given our history).

This in the context of admitting there is no such thing as a self-describing format; even CF is only self-describing if you have read the convention ...

WKT provides obvious immediate benefits for a significant class of file writers and users, at a small risk of ambiguity, which we can handle both via a "priority" statement, and potentially in the checker as well.

That said, the CF community would undoubtedly benefit from Jonathan's proposal as well. So I totally agree with John's final position!

comment:20 Changed 9 years ago by lowry

Just for the record I supported adopting WKT when it was discussed on the list before the Trac ticket was set up. My view hasn't changed.

comment:21 Changed 9 years ago by jonblower

I also support this proposal, agreeing with Bryan and John above and noting Jonathan's carefully thought-out concerns.

Is there a need to clarify the impact on the CF checker? The checker cannot possibly say whether most of the values of the WKT attributes are correct, but it could check that the AXIS names match the names of the coordinate variables. Or it could simply throw up an advisory saying that it has not checked the WKT.
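
To make the two options concrete, here is a minimal Python sketch of such a check. It assumes the checker already has the WKT string and the set of coordinate variable names; the function name and the message wording are hypothetical, and the regular expression is a stand-in for a real WKT parser.

```python
import re

# Captures the name in each AXIS["name",DIRECTION] entry.
AXIS_NAME_RE = re.compile(r'AXIS\s*\[\s*"([^"]*)"')


def check_axis_names(crs_wkt, coordinate_names):
    """Return advisory messages about the AXIS names in a crs_wkt string."""
    axis_names = AXIS_NAME_RE.findall(crs_wkt)
    if not axis_names:
        return ["INFO: crs_wkt has no AXIS elements; its content was not checked"]
    return [
        "WARN: AXIS name '{}' matches no coordinate variable".format(name)
        for name in axis_names
        if name not in coordinate_names
    ]


sample_wkt = 'GEOGCS["example",UNIT["degree",0.0174532925199433],AXIS["lat",NORTH],AXIS["lon",EAST]]'

print(check_axis_names(sample_wkt, {"lat", "lon"}))
print(check_axis_names(sample_wkt, {"x", "y"}))
print(check_axis_names('GEOGCS["example"]', {"lat", "lon"}))
```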

This proposal may set a precedent. For example, people are already inserting UncertML strings into NetCDF files and we could go through the same conversation again and argue whether UncertML should be CF-ized or whether it should just be insertable "as is" in a standard place. I think the approach of "create a standard place but also look at CF-izing it" (see John Caron's post) is a fair approach in general, although it would bear some debate on a case-by-case basis.

comment:22 Changed 9 years ago by jonathan

Dear all

I respect the opinions of all those who have expressed them, but I remain very concerned about the proposal. In fact, my concern is increased because so far no-one has answered the questions I raised about specific possible conflicts. I have also appealed on the email list for people to do that, especially those who understand the use of the OGC conventions better than I do. No-one has yet presented more use-cases that cannot be met by the small number of extra attributes I listed, almost all of which are names, not parameters. At the moment, it looks to me like this discussion amounts to: "We trust those who defined the WKT convention to have thought it out carefully, and apparently some software needs it. We don't actually know what it means, when it might be necessary or whether it conflicts with CF metadata, but we'll put it in the file anyway and hope for the best." Obviously this is a provocative way of stating the case, and my intention is not to be rude or to annoy people, but I am seriously asking, is there some truth in that? I would like to be persuaded to change my mind on the grounds of evidence.

I do not think it is sufficient to assert that conflicts can be resolved by giving CF metadata precedence. No-one could do that unless they are in a position to check whether the two kinds of data are inconsistent, which requires answering questions like those I posed; that is related to Jon's remark about what the CF checker could do. In fact, I expect that software which can interpret the WKT would look only at that, and software which can't will look only at other CF grid_mapping attributes, and thus they will reach different interpretations of the file if there is an inconsistency. Is that good enough?

CF so far is a metadata convention, rather than a container for other conventions. Perhaps the latter route is preferable in this area of the description of coordinate reference systems. If that is so, a better approach would be to permit a data variable to have either a grid_mapping variable or WKT, but not both. That would be an easy requirement to check, and it eliminates some kinds of possible inconsistency (though not all of those we have identified, such as about the AXIS, and I'm sure our analysis of this is not complete). If we decided to do that, we could still also proceed with adding further attributes to grid_mapping, and providing software to generate WKT from grid_mapping on the fly (instead of storing it in the file) to meet the needs of those applications which can use it.

Cheers

Jonathan

comment:23 Changed 9 years ago by bnl

OK, I think Jonathan has a very fair point which I didn't properly pick up. Unless one can algorithmically generate (some) WKT and parse WKT to (some) CF, one couldn't pick up potential conflicts, so my appeal to using the CF checker is in fact impossible until one can resolve that. Resolving it would effectively mean porting WKT codelists etc into the checker, which means the CF checker would have to become WKT "aware".

If that's the case, is this proposal simply saying: here is a "CF" community way to say "here be dragons" of the WKT sort ... they're off our map, but here is a community understood method of going off piste into "WKT-land"? If you find that "WKT-land" isn't the same as "CF-land", so be it ... similarly, if you come from "WKT-land", here is a signpost that might help you understand "CF-land". Oh, and by the way, we can't guarantee semantic exactitude between our two languages. So be it.

Now, I understand Jonathan's pure response to this ... but what's the practical response going to be? Of necessity folks will put WKT into CF files, and they'll do so in all sorts of different ways. Not accepting this proposal could cause more confusion and software angst than accepting it.

We can then work on the semantic exactitude, and addressing the incorporation of WKT concepts into CF in a step-wise method - if the community requires it!

comment:24 follow-ups: Changed 9 years ago by caron

I don't know the answers to Jonathan's questions yet, but with enough time we can answer them. Eventually we can write transformations between WKT and CF attributes. There will be some glitches, probably minor, but who knows? The nub of the issue is that it will take us a long time to do this. At least a year, I'm sure. And I'm sure there will only be a few who will spend much time on it.

There is a philosophical question about whether CF should incorporate such outside standards as WKT. I think Jonathan's preference is to not do so, but to learn from them and add the semantics ourselves. So I think it's not just a question of figuring out the questions that he poses, but it's a matter of examining CF principles.

My observation is that CF does not have enough active participants to be able to add big new semantics in a timely manner. The cost is a slow moving target that is hard for outsiders to start to use. The ability to use WKT would probably reduce barriers to the GIS community, but I'm just guessing. So I would propose adding the following principle to CF:

CF may incorporate an outside convention into it when the following conditions hold:

  1. The semantics of the convention are important to the CF community.
  2. The convention is already in wide use by other communities, and the adoption by CF significantly helps other communities adopt CF.
  3. The convention is not in conflict with existing CF standards.

So I would propose that we decide if we agree on this (or some better version of it), and if so, does it apply to this proposal?

comment:25 in reply to: ↑ 1 Changed 9 years ago by caron

Replying to jonblower:

Two questions about this:

  1. For horizontal CRSs, how can we define how the individual axes of the WKT definition map to the CF coordinate variables? It's not always obvious: polar projections don't have a clear mapping from x/y to easting/northing. CRS:84 and EPSG:4326 are the same CRS with the axes defined in a different order.

Hi Jon:

I may not understand this, but I think this may not be a problem. In short, the CRS defines a map f:(x,y) -> lat,lon. The x and y values are defined in the CF file by the standard names projection_x_coordinate and projection_y_coordinate attached to the coordinate axes. That's all that's needed. There is no ordering needed, AFAIU.

Also, just to add, there is no meaning to the order of coordinate axes in the "coordinates" attribute, e.g. :coordinates = "x y z time"; I'm not sure if you implied that there is, but I have seen some confusion on this.
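As a rough illustration of the point above, the x and y axes can be picked out purely by standard_name, whatever order the variables are listed in. This is only a sketch: the dictionary is a hypothetical in-memory stand-in for the coordinate variables of a file, and find_projection_axes is an illustrative name, not part of any netCDF library.

```python
# Hypothetical stand-in for the coordinate variables of a CF file:
# variable name -> attribute dictionary.
coord_vars = {
    "y": {"standard_name": "projection_y_coordinate", "units": "m"},
    "x": {"standard_name": "projection_x_coordinate", "units": "m"},
    "time": {"standard_name": "time"},
}


def find_projection_axes(variables):
    """Return (x_name, y_name), identified only by standard_name."""
    by_standard_name = {
        attrs.get("standard_name"): name for name, attrs in variables.items()
    }
    return (
        by_standard_name.get("projection_x_coordinate"),
        by_standard_name.get("projection_y_coordinate"),
    )


x_name, y_name = find_projection_axes(coord_vars)

# Listing the same variables in the opposite order changes nothing.
reordered = dict(reversed(list(coord_vars.items())))
```

The lookup never consults the position of a name in a "coordinates" string, which is the sense in which no ordering is needed.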

comment:26 in reply to: ↑ 24 Changed 9 years ago by pbentley

Replying to caron:

My observation is that CF does not have enough active participants to be able to add big new semantics in a timely manner. The cost is a slow moving target that is hard for outsiders to start to use.

Absolutely. This factor (inertia in the system) was clearly a key motivator for the proposal to exploit the existing CRS WKT standard. It's seen as a pragmatic solution to that impediment.

CF may incorporate an outside convention into it when the following conditions hold:

  1. The semantics of the convention are important to the CF community.
  2. The convention is already in wide use by other communities, and the adoption by CF significantly helps other communities adopt CF.
  3. The convention is not in conflict with existing CF standards.

So I would propose that we decide if we agree on this (or some better version of it), and if so, does it apply to this proposal?

Having proposed the use of WKT I'd certainly sign up to these principles. However, I wonder if it would be preferable to move the discussion of this wider issue to the mailing list where it will be visible to all. Otherwise we might end up with two separate (albeit related) discussion threads within this ticket.

Replying to jonblower:

Is there a need to clarify the impact on the CF checker? The checker cannot possibly say whether most of the values of the WKT attributes are correct, but it could check that the AXIS names match the names of the coordinate variables

I'd agree with this suggestion. As it stands, under the conformance section the proposal states that the crs_wkt text string must conform to the CRS WKT standard. I think we could extend that with a sentence along the lines of: "An advisory message should be emitted if the name assigned to a WKT AXIS element does not match the name of any coordinate variable or auxiliary coordinate variable stored in the netCDF file."
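A minimal sketch of how such an advisory check might look is given below. It assumes AXIS elements are written as AXIS["name", direction], and it uses a regular expression rather than a full WKT parser. The function name, the message wording and the sample WKT fragment are all illustrative, not taken from the CF checker.

```python
import re

# Matches the quoted name at the start of each AXIS element.
AXIS_NAME = re.compile(r'AXIS\s*\[\s*"([^"]*)"')


def axis_advisories(crs_wkt, coordinate_names):
    """List an advisory for each WKT AXIS name that matches no coordinate variable."""
    known = set(coordinate_names)
    return [
        f'advisory: WKT AXIS "{name}" matches no coordinate variable'
        for name in AXIS_NAME.findall(crs_wkt)
        if name not in known
    ]


# Hypothetical WKT fragment, for illustration only.
wkt = 'PROJCS["demo", AXIS["x", EAST], AXIS["northing", NORTH]]'
messages = axis_advisories(wkt, ["x", "y", "lat", "lon"])
```

A check of this kind says nothing about whether the WKT values are correct. It only flags names that cannot be linked to anything in the file.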

Regards

Phil

comment:27 in reply to: ↑ 24 ; follow-up: Changed 9 years ago by jonathan

Dear all

Regarding John's proposal:

CF may incorporate an outside convention into it when the following conditions hold:

  1. The semantics of the convention are important to the CF community.
  2. The convention is already in wide use by other communities, and the adoption by CF significantly helps other communities adopt CF.
  3. The convention is not in conflict with existing CF standards.

I agree with these principles too. For example, udunits is a convention which CF uses. CF could adopt conventions for discovery metadata which would extend the fairly small capability it currently has in that area. However, the third principle is not satisfied in the case of this WKT proposal, in my opinion.

CF and WKT obviously conflict in the case of grid_mapping attributes which are already defined e.g. semi_major_axis, inverse_flattening. Such conflicts can't be detected or resolved without software which can parse WKT and knows how to translate between WKT and grid_mapping attributes. It is very likely that they would lead to ambiguous i.e. internally inconsistent files. Allowing both grid_mapping and WKT in these cases is a recipe for inconsistency, but such conflicts could be avoided if we allowed either grid_mapping or CRS WKT for a given data variable, but not both at once.

However, this would not avoid other more subtle conflicts between CF and WKT that we aren't certain about. As I wrote before, in CF the prime meridian and the coordinates of the geographical coordinate system must be in degrees (if specified as latitude and longitude), but WKT allows for them to be in other units. The units of the projection coordinates in CF are attributes of those coordinate variables, and could conflict with what was encoded in WKT. I think there may be conflicts in the specification of vertical coordinate systems, but I don't understand enough of what the WKT specification means to be sure.

John writes

My observation is that CF does not have enough active participants to be able to add big new semantics in a timely manner. The cost is a slow moving target that is hard for outsiders to start to use.

I agree, that is certainly a problem for CF. However, it is not always so. We are having quite a lively discussion on this ticket! As I said above, it seems to me that the use-case identified on this ticket (the British National Grid) could be met by the definition of five new grid_mapping attributes, which have obvious definitions. We could agree to make such a small change straight away to the CF convention, and it would raise no conflicts. If there are other use-cases that we need to address, please could we know what they are? Use-cases are needed because of the possible inconsistencies.

Jonathan

comment:28 in reply to: ↑ 27 ; follow-up: Changed 9 years ago by pbentley

Replying to jonathan:

As I said above, it seems to me that the use-case identified on this ticket (the British National Grid) could be met by the definition of five new grid_mapping attributes, which have obvious definitions. We could agree to make such a small change straight away to the CF convention, and it would raise no conflicts. If there are other use-cases that we need to address, please could we know what they are? Use-cases are needed because of the possible inconsistencies.

For the record, the WKT definition for the British National Grid is included merely as an example of the WKT method and syntax. It is not intended to represent a specific use-case that is driving this proposal, though I expect there are UK-based data producers who may well create netCDF datasets that utilise this particular coordinate system. The WKT method clearly can be employed to satisfy a large number of CRS use-cases: that, after all, is one of its chief merits.

The paragraph that introduces example 5.11 is, I'd argue, unambiguous in this respect, i.e. "Example 5.11 illustrates how the coordinate system properties ... might be expressed using a crs_wkt attribute." It's an illustration. But if folks think the intent of the example is ambiguous then it would be trivial to clarify it with a sentence along the lines of "Note that this is included only as a partial example of the use of the CRS WKT method" (which sort of repeats the existing sentence, I reckon).

Regards,

Phil

comment:29 in reply to: ↑ 28 Changed 9 years ago by jonathan

Dear Phil

OK, sorry to misuse your use-case! You could say, "Anything which can be represented by WKT" is a use-case, but I think that's too broad. There isn't evidence that everything which could be represented by WKT is needed by users of CF-netCDF. Can anyone be more specific about what additions to CF are needed by users (or would-be users) of CF, beyond the various names and prime meridian needed in the example of the British national grid? I realise, and I'm sure you do too, that I've asked this question before, but it is normal for specific use-cases to be required in order to motivate a change to the convention.

Cheers

Jonathan

comment:30 Changed 9 years ago by bnl

Nearly all the interesting interdisciplinary work we do with hydrology community depends on a plethora of GIS coordinate systems. If we want to work with them, we simply have to support WKT - directly (hard), or indirectly as proposed here.

The use case is "we want to work with the GIS community"! It's not too broad, it's very very real. I personally wouldn't want to drip feed all of WKT into CF, and I think this is a very pragmatic way forward.

comment:31 follow-up: Changed 9 years ago by caron

On 11/21/2011 1:40 AM, Jonathan Gregory wrote:


Dear all

Regarding John's proposal:

CF may incorporate an outside convention into it when the following conditions hold:

  1. The semantics of the convention are important to the CF community.
  2. The convention is already in wide use by other communities, and the adoption by CF significantly helps other communities adopt CF.
  3. The convention is not in conflict with existing CF standards.

I agree with these principles too. For example, udunits is a convention which CF uses. CF could adopt conventions for discovery metadata which would extend the fairly small capability it currently has in that area.

OK, that's helpful that we agree on this. Does anyone else want to weigh in on this? I guess there was a suggestion to post to the general discussion list, which I will.

However, the third principle is not satisfied in the case of this WKT proposal, in my opinion.

CF and WKT obviously conflict in the case of grid_mapping attributes which are already defined e.g. semi_major_axis, inverse_flattening. Such conflicts can't be detected or resolved without software which can parse WKT and knows how to translate between WKT and grid_mapping attributes. It is very likely that they would lead to ambiguous i.e. internally inconsistent files. Allowing both grid_mapping and WKT in these cases is a recipe for inconsistency, but such conflicts could be avoided if we allowed either grid_mapping or CRS WKT for a given data variable, but not both at once.

However, this would not avoid other more subtle conflicts between CF and WKT that we aren't certain about. As I wrote before, in CF the prime meridian and the coordinates of the geographical coordinate system must be in degrees (if specified as latitude and longitude), but WKT allows for them to be in other units. The units of the projection coordinates in CF are attributes of those coordinate variables, and could conflict with what was encoded in WKT. I think there may be conflicts in the specification of vertical coordinate systems, but I don't understand enough of what the WKT specification means to be sure.

The crux of the matter is that the proposal allows an alternative encoding of CRS (Coordinate Reference Systems). An alternative encoding runs the risk of inconsistency. I would say that just because the conventions overlap doesn't mean they are in conflict, just that someone might get one or the other wrong. Incorrect encodings can (and do) already happen now, although this proposal would increase the chances of that.

It seems that if we had something that could parse WKT and detect inconsistencies, this problem would be mitigated. Would that satisfy your concerns, Jonathan?
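To make the idea concrete, here is a small sketch that checks just one overlap: the ellipsoid. It assumes the SPHEROID element has the form SPHEROID["name", semi-major axis, inverse flattening], as in the British National Grid example, and compares it with the existing semi_major_axis and inverse_flattening grid_mapping attributes. The function name and the tolerance are assumptions, and a real tool would need a proper WKT parser and agreed equivalence rules.

```python
import math
import re

# Captures the two numeric fields of SPHEROID["name", a, 1/f].
SPHEROID = re.compile(
    r'SPHEROID\s*\[\s*"[^"]*"\s*,\s*([-+0-9.eE]+)\s*,\s*([-+0-9.eE]+)'
)


def ellipsoid_conflicts(crs_wkt, grid_mapping_attrs, rel_tol=1e-9):
    """Return (attribute, cf_value, wkt_value) for each disagreement found."""
    match = SPHEROID.search(crs_wkt)
    if match is None:
        return []
    from_wkt = {
        "semi_major_axis": float(match.group(1)),
        "inverse_flattening": float(match.group(2)),
    }
    conflicts = []
    for name, wkt_value in from_wkt.items():
        cf_value = grid_mapping_attrs.get(name)
        # Only compare attributes that the grid mapping actually defines.
        if cf_value is not None and not math.isclose(
            cf_value, wkt_value, rel_tol=rel_tol
        ):
            conflicts.append((name, cf_value, wkt_value))
    return conflicts


wkt = ('GEOGCS ["OSGB 1936", DATUM ["OSGB 1936", '
       'SPHEROID ["Airy 1830", 6377563.396, 299.3249646]]]')
consistent = ellipsoid_conflicts(
    wkt, {"semi_major_axis": 6377563.396, "inverse_flattening": 299.3249646}
)
inconsistent = ellipsoid_conflicts(wkt, {"semi_major_axis": 6378137.0})
```

The same pattern would have to be repeated, with explicit rules, for the prime meridian, units and projection parameters before inconsistencies could be detected in general.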

comment:32 in reply to: ↑ 31 ; follow-up: Changed 9 years ago by jonathan

Dear John, Bryan et al.

Replying to caron:

It seems that if we had something that could parse WKT and detect inconsistencies, this problem would be mitigated. Would that satisfy your concerns, Jonathan?

Yes, it would. That's just what I think we need. In order to detect inconsistencies, we have to parse WKT, and we need rules to work out how it corresponds to other CF metadata. I am sure those rules can be written down. How should we go about doing it? As I've said before in this ticket, I'm not opposed to WKT, just concerned about inconsistencies, which is what I meant by "conflict".

Replying to Bryan:

Nearly all the interesting interdisciplinary work we do with hydrology community depends on a plethora of GIS coordinate systems.

OK. Please could you point me to the WKT for a few examples? It would be useful to see what aspects are commonly used.

I personally wouldn't want to drip feed all of WKT into CF

That gives the impression that there is a lot of it. There may be a lot of coordinate systems, but the syntax for WKT is really pretty small. That's why I think we actually can get to grips with the parts that are probably needed without too much difficulty. I still think a more robust solution would be to generate WKT rather than to parse it. I assume that the main need for this proposal is because people want to get WKT from CF-netCDF files, which could be satisfied just as well by generating it on the fly as by storing it in files. In fact, the ability to generate WKT would be more generally useful since it could be applied to any CF file, not just one whose author had helpfully stored the WKT in it. Is that correct?
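As a sketch of what generating WKT on the fly could look like, the function below builds a WKT string from the existing CF attributes of a transverse_mercator grid mapping. Several things are assumptions: the PARAMETER labels are copied from the British National Grid example in this ticket, the datum, ellipsoid and prime meridian names are filled with the placeholder "unnamed" because the grid mapping carries no such names, and the projected unit is taken to be metres.

```python
# WKT PARAMETER label -> CF grid_mapping attribute (labels follow the BNG example).
PARAMETERS = [
    ("False easting", "false_easting"),
    ("False northing", "false_northing"),
    ("Longitude of natural origin", "longitude_of_central_meridian"),
    ("Latitude of natural origin", "latitude_of_projection_origin"),
    ("Scale factor at natural origin", "scale_factor_at_central_meridian"),
]


def wkt_from_grid_mapping(gm, name="unnamed"):
    """Build a WKT string from transverse_mercator grid_mapping attributes."""
    if gm.get("grid_mapping_name") != "transverse_mercator":
        raise ValueError("this sketch only handles transverse_mercator")
    spheroid = 'SPHEROID["unnamed", {}, {}]'.format(
        gm["semi_major_axis"], gm["inverse_flattening"]
    )
    geogcs = (
        'GEOGCS["unnamed", DATUM["unnamed", {}], PRIMEM["unnamed", {}], '
        'UNIT["degree", 0.0174532925199433]]'
    ).format(spheroid, gm.get("longitude_of_prime_meridian", 0.0))
    params = ", ".join(
        'PARAMETER["{}", {}]'.format(label, gm[attr]) for label, attr in PARAMETERS
    )
    # Assumption: projection coordinates are in metres.
    return 'PROJCS["{}", {}, PROJECTION["Transverse Mercator"], {}, UNIT["metre", 1.0]]'.format(
        name, geogcs, params
    )


bng = {
    "grid_mapping_name": "transverse_mercator",
    "semi_major_axis": 6377563.396,
    "inverse_flattening": 299.3249646,
    "longitude_of_prime_meridian": 0.0,
    "latitude_of_projection_origin": 49.0,
    "longitude_of_central_meridian": -2.0,
    "scale_factor_at_central_meridian": 0.9996012717,
    "false_easting": 400000.0,
    "false_northing": -100000.0,
}
generated = wkt_from_grid_mapping(bng)
```

The "unnamed" placeholders show exactly where extra grid_mapping attributes, such as the names discussed earlier for the British National Grid, would be needed to reproduce the stored WKT fully.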

Alternatively, what about my alternative proposal, of allowing WKT or grid_mapping, but not both. Consider, if we had not already invented grid_mapping (I can't remember why we did), would we do it now, or would we just propose to use WKT for this purpose? What does grid_mapping offer which WKT does not? Actually, one answer to that is there are some systems we need, such as rotated pole, which WKT can't describe. Are there other reasons why grid_mapping is useful or preferable?

Jonathan

comment:33 in reply to: ↑ 32 ; follow-up: Changed 9 years ago by markh

Replying to jonathan:

Alternatively, what about my alternative proposal, of allowing WKT or grid_mapping, but not both. Consider, if we had not already invented grid_mapping (I can't remember why we did), would we do it now, or would we just propose to use WKT for this purpose? What does grid_mapping offer which WKT does not? Actually, one answer to that is there are some systems we need, such as rotated pole, which WKT can't describe. Are there other reasons why grid_mapping is useful or preferable?

I think this may offer a good solution, enabling a CF file to make use of WKT, which is what we are looking to enable.

I think it is important that a variable is used to contain the WKT string, such that it is implemented in the same way as a grid_mapping variable.

This would involve a new attribute being created for a data variable:

  • coord_ref_system

as well as the new variable type:

  • coordinate reference system variable

with its new attribute defined:

  • crs_wkt

Jonathan: is this consistent with your alternative proposal?

e.g.

dimensions:
  x = 800 ;
  y = 600 ;
  time = 30 ;

variables:
  double x(x) ;
    x:standard_name = "projection_x_coordinate" ;
    x:long_name = "British National Grid eastings" ;
    x:units = "m" ;
  double y(y) ;
    y:standard_name = "projection_y_coordinate" ;
    y:long_name = "British National Grid northings" ;
    y:units = "m" ;
  double time(time) ;
    ...
  double lat(y, x) ;
    ...
  double lon(y, x) ;
    ...

  // a data variable whose CRS definition is provided by the 'bng_crs' grid mapping variable
  float precip(time, y, x) ;
    precip:standard_name = "rainfall_amount" ;
    precip:coordinates = "lat lon" ;
    precip:coord_ref_system = "bng_crs" ;
    ...

  // grid mapping variable containing a WKT definition of the British National Grid
  int bng_crs ;
    bng_crs:crs_wkt = "PROJCS ["OSGB 1936 / British National Grid",
      GEOGCS ["OSGB 1936",
        DATUM ["OSGB 1936", SPHEROID ["Airy 1830", 6377563.396, 299.3249646]],
        PRIMEM ["Greenwich", 0],
        UNIT ["degree", 0.0174532925199433]],
      PROJECTION ["Transverse Mercator"],
      PARAMETER ["False easting", 400000],
      PARAMETER ["False northing", -100000],
      PARAMETER ["Longitude of natural origin", -2.0],
      PARAMETER ["Latitude of natural origin", 49.0],
      PARAMETER ["Scale factor at natural origin", 0.9996012717],
      UNIT ["metre", 1.0]]" ;
    ...

I think the 'either or' approach offers good protection against inconsistent data set definitions, and offers a plausible option for checking CF consistency.

Would people who support the proposal on this ticket also support this alternative, as set out by Jonathan and me?

comment:34 in reply to: ↑ 33 Changed 9 years ago by jonathan

Dear Mark

Thanks for your contribution.

I think it is important that a variable is used to contain the WKT string, such that it is implemented in the same way as a grid_mapping variable.

I agree the WKT should be contained in a separate variable, so it can be shared by several data variables. However I would do it slightly more simply than you. The proposal was just to provide a place for WKT. Hence I would propose an attribute crs_wkt of the data variable, which could have either this attribute or grid_mapping. The crs_wkt attribute would name a char variable containing the WKT:

dimensions:
  x = 800 ;
  y = 600 ;
  time = 30 ;
  maxwktlen = 600 ;

variables:
  double x(x) ;
    x:standard_name = "projection_x_coordinate" ;
    x:long_name = "British National Grid eastings" ;
    x:units = "m" ;
  double y(y) ;
    y:standard_name = "projection_y_coordinate" ;
    y:long_name = "British National Grid northings" ;
    y:units = "m" ;
  double time(time) ;
    ...
  double lat(y, x) ;
    ...
  double lon(y, x) ;
    ...

// a data variable whose CRS definition is provided by the 'bng_crs' CRS WKT variable
  float precip(time, y, x) ;
    precip:standard_name = "rainfall_amount" ;
    precip:coordinates = "lat lon" ;
    precip:crs_wkt = "bng_crs" ;
    ...

// variable containing a WKT definition of the British National Grid
  char bng_crs(maxwktlen) ;
data:
  bng_crs= "PROJCS ["OSGB 1936 / British National Grid",
      GEOGCS ["OSGB 1936",
      DATUM ["OSGB 1936", SPHEROID ["Airy 1830", 6377563.396, 299.3249646]],
      PRIMEM ["Greenwich", 0],
      UNIT ["degree", 0.0174532925199433]],
      PROJECTION ["Transverse Mercator"],
      PARAMETER ["False easting", 400000],
      PARAMETER ["False northing", -100000],
      PARAMETER ["Longitude of natural origin", -2.0],
      PARAMETER ["Latitude of natural origin", 49.0],
      PARAMETER ["Scale factor at natural origin", 0.9996012717],
      UNIT ["metre", 1.0]]" ;

Actually this doesn't work because " can't be embedded in a string constant without \ in front of it. I wonder if WKT permits ' to be used instead of ".
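For what it's worth, the backslash escaping mentioned above is easy to apply mechanically when a file is written from a program. The helper below is only a sketch: it assumes C-style escapes are what the CDL tools expect, handles only backslashes and double quotes, and its name is made up for illustration.

```python
def cdl_string_literal(text):
    """Wrap text in double quotes, escaping backslashes and embedded quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


literal = cdl_string_literal('PRIMEM ["Greenwich", 0]')
print(literal)
```

Backslashes are escaped first so that the backslashes introduced for the quotes are not themselves doubled.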

Jonathan: is this consistent with your alternative proposal?

Yes. It is also consistent with Bryan's earlier comment, which I had not interpreted quite right. He was saying, we need somewhere to store the WKT, without taking responsibility for its contents.

I think the 'either or' approach offers good protection against inconsistent data set definitions, and offers a plausible option for checking CF consistency.

Yes, it is easy to check that only one of crs_wkt or grid_mapping is specified. I like the WKT not to be in the grid mapping, because it seems to me that it is an alternative, rather than a supplement, owing to the overlaps. Permitting only one of them avoids some overlaps. There is still some potential for inconsistency.
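The "only one of the two" check could be as small as the sketch below. It works on a hypothetical dictionary of data variable attributes rather than on a real file, and the variable names other than precip and bng_crs are invented for the example.

```python
def either_or_violations(data_variables):
    """Return names of data variables carrying both crs_wkt and grid_mapping."""
    return sorted(
        name
        for name, attrs in data_variables.items()
        if "crs_wkt" in attrs and "grid_mapping" in attrs
    )


# Hypothetical attribute dictionaries for three data variables.
variables = {
    "precip": {"standard_name": "rainfall_amount", "crs_wkt": "bng_crs"},
    "temp": {"grid_mapping": "tm_mapping"},
    "wind": {"crs_wkt": "bng_crs", "grid_mapping": "tm_mapping"},
}
violations = either_or_violations(variables)
```

This checks the form of the file only. It cannot say whether a lone crs_wkt agrees with the units and names on the coordinate variables, which is the remaining potential for inconsistency.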

I would support this compromise, but I still think it's better to extend grid_mapping to contain extra information that can be used to generate WKT. Etienne Tourigny's posting to the email list looks encouraging to me.

Jonathan

comment:35 follow-up: Changed 9 years ago by markh

Summary of current position:

  • Current proposal:
    • Of ticket commenters: 6 in favour, 1 against
    • concern is centred on the difficulties which may arise from an inconsistent specification within a grid mapping attribute.
  • An amended proposal has been put forward, which offers much of the same functionality, but using a different mechanism which may mitigate some of the risk of inconsistency.

To clarify this issue, please could supporters of the current proposal indicate their view on the amended version of the proposal put forward by Jonathan, perhaps as:

  • 'equally content' to support this approach,
  • 'less content but still supportive' of this approach,
  • 'not content' with this alternative to the ticket proposal.

thank you

mark

comment:36 Changed 9 years ago by markh

I am 'equally content' with Jonathan's alternative proposal

comment:37 Changed 9 years ago by dominic.lowe

Apologies for joining this discussion so late...

However it seems to me that requiring the use of grid_mapping *OR* WKT, while it may remove the risk of inconsistencies, does not help much with interoperability.

It will introduce a new situation where files can only be read by WKT-aware clients *OR* grid_mapping-aware clients. Of course client software can be adapted to convert from WKT to grid_mapping (and vice versa), but if this is the expectation aren't we just placing the burden of effort (and risk of inconsistencies) onto the data consumer rather than the data provider?

If mapping/checking code has to be written anyway then why not use this code at the data-creation step and make life much easier for clients? I just don't see how absolving data-providers of this responsibility is the best all-round solution when everything is taken into account.

Furthermore the *OR* choice makes the data-provider choose upfront which client community is more important to support, which makes it hard to reach out to new communities while continuing to support existing users.

I preferred the original suggestion to allow both encodings with the "grid_mapping takes precedence" statement in the CF conventions. (Or you could even define an attribute that could be used to state/override which takes precedence.)

Regards,

Dominic

comment:38 Changed 9 years ago by jonathan

Dear Dominic

The problem as I see it with "grid_mapping takes precedence", as discussed a bit earlier in the ticket, is that it expects software to be able to interpret both WKT and grid_mapping attributes. In practice, what is almost certain to happen is that software will look at one or the other, and hence if there is an inconsistency the file will be processed in different ways by different applications. Even to check the correctness of a file would require the CF_checker to know the equivalence between WKT and grid_mapping, and this ticket was proposed in part because that's a complicated job which people didn't want to write into CF at this stage, although defining this equivalence is really the only way to achieve interoperability. My alternative proposal is to be safe rather than sorry.

Cheers

Jonathan

comment:39 in reply to: ↑ 35 Changed 9 years ago by pbentley

Replying to markh:

To clarify this issue, please could supporters of the current proposal indicate their view on the amended version of the proposal put forward by Jonathan, perhaps as:

  • 'equally content' to support this approach,
  • 'less content but still supportive' of this approach,
  • 'not content' with this alternative to the ticket proposal.

If I felt that the alternative was technically superior I'd be happy to tick the 'equally content' box. Personally - and being totally unbiased of course! - I don't believe that the alternative improves upon the original, which merely proposes the addition of a single, optional grid_mapping attribute. The alternative proposal would require a new attribute, a new dimension, and a new data variable - an excess of machinery IMHO.

Moreover, in the alternative proposal, the specification of a coordinate system could potentially end up being split across two separate constructs (grid_mapping plus the proposed new crs_wkt variable), the second of which would not even be visible in common metadata listings (e.g. using ncdump -h). That seems like a retrograde step to me, though others may see it as a bonus!

Sure, the alternative proposal says it should be an XOR choice. But I can imagine that data producers would simply end up using both constructs in order to ensure maximum file readability by existing netCDF software clients (cf. Dominic's recent comments in this vein).

[I'd also contend that earlier appeals (elsewhere in this ticket) to the CF checker as a potential discriminator between the technical alternatives are a red herring. Clearly, people don't use the CF checker to write netCDF files. Nor do they use the CF checker to read netCDF files. Faced with a netCDF file that may not be CF-compliant at version X, people don't ignore such files, they find ways of reading them. Many of the files in the CMIP5 archive, for instance, are not strictly CF-compliant. But that doesn't mean that i. they're any less useful; or ii. existing software tools can't read them - they can and do.]

So, if achieving some kind of resolution of this ticket means ticking the 'less content but still supportive' option, then I'm perfectly willing to do that. But with a clear majority in favour of the original proposal, that would seem to be a strange outcome indeed, one that would set an interesting precedent for subsequent CF proposals.

The alternative, I suppose, is to shelve this proposal and see if a dominant solution emerges within the netCDF software space (the likelihood, of course, is that multiple competing solutions will emerge).

Regards,

Phil

comment:40 Changed 9 years ago by jonathan

Even if data-writers did (incorrectly) put both grid_mapping and WKT into the same file, I think it is still a bit better to have them separated, because it emphasises that WKT is a "guest" rather than part of the CF standard, and that it may not be consistent with grid_mapping.

Errors in files don't necessarily mean they're less useful, but often they are less useful. There are many common errors that can be detected by the CF-checker, which if they are not corrected mean that horrible fudges have to be written into analysis programs to deal with them, and in some cases the files can't even be interpreted by human intelligence. This is why the CF standard tries to minimise the possibilities for errors and inconsistencies. In the case of WKT, we would not be able to detect them, and we should point that out as clearly as we can.

Jonathan

comment:41 Changed 9 years ago by P.Kennedy

While WKT is an option, I fear it brings more complexity than it needs to. I suspect that, because of the complexity of forming it, WKT will not be well adopted. This has been the case for WMS, GML and GeoJSON. They found a more pragmatic approach was to simply specify an EPSG code, as this would do the job just as well as WKT but with much less fuss.

Please take note of the PostGIS documentation http://www.postgis.org/documentation/ p. 148: "WKT format does not maintain precision so to prevent floating truncation, use ST_AsBinary or ST_AsEWKB format for transport."

Instead, PostGIS and OGC use the notion of a 'Spatial Referencing Identifier' or SRID: an integer value that uniquely identifies the Spatial Referencing System (SRS) within the database. For example:

-- Add some data into the test table
INSERT INTO global_points (name, location) VALUES ('Town',ST_GeographyFromText('SRID=4326;

The OGC GML specification uses a short form to reference the spatial reference, e.g. <gml:Point srsName="urn:x-ogc:def:crs:EPSG:6.6:4326"><gml:pos>71. -32.</gml:pos></gml:Point>

The Web Mapping Service specification (WMS) follows a similar pattern: http://www.opengeospatial.org/standards/wms EXAMPLE 2 A <BoundingBox> representing the entire Earth in the EPSG:4326 Layer CRS would be written as <BoundingBox CRS="EPSG:4326" minx="-90" miny="-180" maxx="90" maxy="180">.

Therefore, in addition to the proposed crs_wkt, may I suggest that the standard also document a simplified notation, such as those promoted and widely in use today.

something like this would be simple and efficient: spatial_reference_identifier EPSG:4326
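For illustration, reading such a value is trivial, as in the sketch below. The attribute name spatial_reference_identifier is only the suggestion made above, the function name is made up, and the sketch handles just the short AUTHORITY:CODE form, not the URN form used by GML.

```python
def parse_srs_identifier(identifier):
    """Split an 'AUTHORITY:CODE' string such as 'EPSG:4326' into its parts."""
    authority, sep, code = identifier.strip().partition(":")
    if not sep or not authority or not code.isdigit():
        raise ValueError(f"not an AUTHORITY:CODE identifier: {identifier!r}")
    return authority.upper(), int(code)


authority, code = parse_srs_identifier("EPSG:4326")
```

Note that the result is only a key. Everything about the CRS, including its axis order, still has to be looked up in an external registry.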

regards Paul Kennedy

comment:42 Changed 9 years ago by jonblower

@Paul:

A spatial_reference_identifier can be a useful and practical shorthand, but it doesn't address the problem of how to relate coordinate axes in the CRS definition to the CF coordinate axes. For example, in EPSG:4326, how does a computer know that the definition of the latitude axis should be applied to the coordinate axis that's called "y" in the NetCDF file? Should this be done through the standard_name? What about polar plots, where there is no obvious convention for x and y?

Axis order confusion is a common bugbear in GIS (I've seen different versions of the EPSG database apply different orderings) so we need to be very careful not to propagate or multiply the confusion in CF.

Best wishes, Jon

comment:43 Changed 9 years ago by P.Kennedy

Hi, Agreed. X<=>Y and Lat<=>Lon are easy to mix up in spatial data. However, neither EPSG nor WKT will resolve this. It is far more deep seated and needs attention from the scientist to make sure things are correct.

I still see a place for a simple EPSG code as an easy option for the many scientists who do not wish to (or should not) get bogged down with geodetic definitions in metadata and just need a simple method which makes data transfer as easy as possible.

KML, GML, WMS, TMS and WFS all provide easy mechanisms to indicate the spatial reference.

Having a simple EPSG code does not preclude a more detailed WKT specification where something unusual is required, but I believe the simple EPSG would cover 99% of cases.

My fear is that without a simple mechanism, the spatial reference metadata will simply not be specified, which is typically the case today.

comment:44 Changed 9 years ago by jonblower

Hi,

neither EPSG /WKT will resolve this.

I think WKT (or any mechanism that explicitly records the individual axes) can resolve this. We just need a convention for identifying the axes in both WKT and CF and linking the two. The problem with the EPSG code is that the axis order is opaque, requiring an external look-up, so there's no easy way to relate the CF axes to a geodetic system.

KML, GML, WMS, TMS and WFS all provide easy mechanisms to indicate the spatial reference.

I'm afraid this isn't a totally fair comparison. All those systems use coordinate tuples to identify points in space, and the CRS definition describes how to interpret the tuples. CF-NetCDF stores data as arrays. We need a way to describe the mapping between coordinate tuples and array indices, and this is what the current CF grid mapping systems give us. But this mapping can't be found in the general case with a simple EPSG code, unless there is some other out-of-band convention.
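As a rough sketch of the tuple-to-index mapping described above: the arrays, their values, and the `nearest_index` and `value_at` helpers are all hypothetical stand-ins for a CF-NetCDF variable `data(lat, lon)` with 1-D coordinate variables. The lookup works only because the code already knows which array dimension is latitude and which is longitude.

```python
from bisect import bisect_left


def nearest_index(axis_values, target):
    """Index of the entry in a sorted 1-D coordinate axis closest to target."""
    i = bisect_left(axis_values, target)
    if i == 0:
        return 0
    if i == len(axis_values):
        return len(axis_values) - 1
    before, after = axis_values[i - 1], axis_values[i]
    return i if (after - target) < (target - before) else i - 1


# Hypothetical file contents: data(lat, lon) plus its coordinate variables.
lat = [-60.0, -30.0, 0.0, 30.0, 60.0]
lon = [0.0, 90.0, 180.0, 270.0]
data = [[10 * j + i for i in range(len(lon))] for j in range(len(lat))]


def value_at(latitude, longitude):
    """Look up the grid value nearest to a (latitude, longitude) tuple."""
    j = nearest_index(lat, latitude)   # first dimension is latitude
    i = nearest_index(lon, longitude)  # second dimension is longitude
    return data[j][i]
```

If the only CRS metadata were an EPSG code, the two comments inside `value_at` would have to come from some out-of-band convention.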

I believe the simple EPSG would cover 99% of cases.

It would be instructive to work through some examples here and figure out exactly how an automated system would interpret a real CF-NetCDF data file that is constructed in the way you describe. My belief is that it's harder than you may think (try a polar projection). ;-)

Best wishes, Jon

comment:45 Changed 9 years ago by caron

Perhaps I'm missing something, but CF can unambiguously identify the lat and lon axis, or the x and y axis of a projected CRS.

I think that EPSG / WKT probably also has an unambiguous notation for lat/lon or x/y.

The problem that I have seen was that, when returning a coordinate pair, there was confusion about which order was needed. That confusion would be local to that particular request/response protocol.

What is difficult for the non-expert is to understand which is the x and which is the y coordinate, because of the crude analogy lat=y, lon=x, which breaks down for polar coordinates. The correct understanding is "x,y on the projection plane defined by the projection map: (x,y) <-> (lat,lon)". The ordering of (x,y) and (lat,lon) is arbitrary and depends on the implementation, i.e. the projection map described abstractly doesn't care. However, the projection has a clear definition of x and of y; these are not arbitrary.
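A small worked example may help show why lat=y, lon=x breaks down near the pole. This is a sketch of the spherical north polar stereographic projection with true scale at the pole; the Earth radius `R`, the central meridian default, and the function names are assumptions chosen for illustration.

```python
import math

R = 6371000.0  # assumed spherical Earth radius in metres (illustrative)


def forward(lat_deg, lon_deg, lon0_deg=0.0):
    """Map (lat, lon) in degrees to (x, y) in metres on the projection plane."""
    phi = math.radians(lat_deg)
    dlam = math.radians(lon_deg - lon0_deg)
    rho = 2.0 * R * math.tan(math.pi / 4.0 - phi / 2.0)  # distance from the pole
    return rho * math.sin(dlam), -rho * math.cos(dlam)


def inverse(x, y, lon0_deg=0.0):
    """Map (x, y) in metres back to (lat, lon) in degrees."""
    rho = math.hypot(x, y)
    lat = 90.0 - 2.0 * math.degrees(math.atan(rho / (2.0 * R)))
    lon = lon0_deg + math.degrees(math.atan2(x, -y))
    return lat, lon


# Two points on the same parallel (60N) at different longitudes.
a = forward(60.0, 0.0)
b = forward(60.0, 90.0)
```

Here `a` lies on the negative y axis and `b` on the positive x axis, although both have the same latitude. The x and y of the projection are well defined, but neither corresponds to latitude or longitude on its own.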

With a concrete projection implementation it's pretty easy to see when you have the x,y reversed, e.g. just look at a WMS image of anything that follows a coastline.

So IMO, with the exception of poorly specified protocols for the ordering of request/response coordinates, presumably now corrected, there's no deep problem here.

As I say, I may be missing something.

comment:46 follow-up: Changed 9 years ago by jonblower

CF can unambiguously identify the lat and lon axis, or the x and y axis of a projected CRS.

I think that EPSG / WKT probably also has an unambiguous notation for lat/lon or x/y.

That's right - there's no deep problem provided that there's a means or convention to map between the two, e.g. using the same identifier for axes in both CF and WKT. I haven't seen such a mapping proposed and tested, although I may have missed it.

There is a deeper problem when the axis definitions don't appear in the WKT, or if the EPSG code is used alone. In this case there's no way for a client to figure out the mapping unless they look up the full definition of the CRS in an external database.
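To illustrate the difference between the two cases, here is a minimal sketch that pulls AXIS entries out of a WKT string with a regular expression. The WKT strings are illustrative WKT 1 style examples written for this sketch, and `axes_in_wkt` is a hypothetical helper, not a full WKT parser.

```python
import re

# Matches entries such as AXIS["Lat",NORTH]; a deliberately simple pattern.
AXIS_RE = re.compile(r'AXIS\s*\[\s*"([^"]*)"\s*,\s*([A-Za-z_]+)\s*\]')


def axes_in_wkt(wkt):
    """Return (name, direction) pairs in the order they appear in the WKT."""
    return [(name, direction.upper()) for name, direction in AXIS_RE.findall(wkt)]


with_axes = (
    'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],'
    'PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433],'
    'AXIS["Lat",NORTH],AXIS["Lon",EAST]]'
)
without_axes = (
    'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],'
    'PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]]'
)

explicit = axes_in_wkt(with_axes)
implicit = axes_in_wkt(without_axes)
code_only = axes_in_wkt("EPSG:4326")
```

Only the first string yields axis information directly; for the other two a client gets an empty list and would need an external look-up to learn the axis order.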

comment:47 in reply to: ↑ 46 Changed 9 years ago by caron

Replying to jonblower:

That's right - there's no deep problem provided that there's a means or convention to map between the two, e.g. using the same identifier for axes in both CF and WKT. I haven't seen such a mapping proposed and tested, although I may have missed it.

Here's the part where we are still not yet on the same page:

CF knows the association of an axis (variable) to, eg "latitude coordinate". The WKT definition is in terms of, eg "latitude coordinate". So that's the mapping; I don't think anything else is needed, and I don't think you want to put the name of the axis variable in the WKT.

There is a deeper problem when the axis definitions don't appear in the WKT, or if the EPSG code is used alone. In this case there's no way for a client to figure out the mapping unless they look up the full definition of the CRS in an external database.

Here's the part where I'm probably missing something, but I don't see why you need the "axis definition" in the WKT. Perhaps an example would help?

It may be that I'm just thinking of the projection part of the CRS, where the actual grid is not described. When one wants to describe the actual grid, then you need the values of the axes. But those are exactly the values that are in the coordinate axes of the netCDF/CF file, so there's no problem generating those. Presumably there's some other problem I haven't run into?

comment:48 follow-up: Changed 9 years ago by jonblower

Hi John,

I wonder if we're talking at cross purposes, because I'm still not quite getting it... ;-)

CF knows the association of an axis (variable) to, eg "latitude coordinate"

What is the string "latitude coordinate" in this case? Is it an attribute of the CF coordinate variable? In your above sentence, what are you saying that the axis is associated to? Or do you mean the abstract concept of a latitude coordinate?

comment:49 in reply to: ↑ 48 Changed 9 years ago by caron

Replying to jonblower:

Hi John,

I wonder if we're talking at cross purposes, because I'm still not quite getting it... ;-)

CF knows the association of an axis (variable) to, eg "latitude coordinate"

What is the string "latitude coordinate" in this case? Is it an attribute of the CF coordinate variable? In your above sentence, what are you saying that the axis is associated to? Or do you mean the abstract concept of a latitude coordinate?

Yes the abstract concept, the same one as a WKT projection has.

comment:50 Changed 9 years ago by etourigny

Guys, just jumping into this ticket. I have been engaged in a discussion with Jonathan Gregory (mostly on the mailing list) about adding parameters to the CF spec to fully represent a WKT definition (without the WKT string).

Although I feel I don't understand all the details of this thread, I would like to know whether the current CF spec allows for full representation of WKT AXIS parameters, or whether we would need additional CF parameters.

Can we simply map WKT {AXIS["Lon",EAST]} and {AXIS["Lat",NORTH]} to CF {lon:units = "degrees_east" ;} and {lat:units = "degrees_north" ;}?

How about the case of projected crs with {AXIS["X",EAST]} and {AXIS["Y",NORTH]} ?
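One way to picture the mapping being asked about is the sketch below. The lookup tables and the `axis_direction` helper are hypothetical and are not part of the CF conventions. In particular, pairing `projection_x_coordinate` with EAST is an assumption that fits only the AXIS["X",EAST] case in the question; a polar projection would need different directions.

```python
# Hypothetical lookup tables; this pairing is an assumption, not a CF rule.
UNITS_TO_DIRECTION = {
    "degrees_north": "NORTH",
    "degrees_east": "EAST",
}
STANDARD_NAME_TO_DIRECTION = {
    "latitude": "NORTH",
    "longitude": "EAST",
    "projection_y_coordinate": "NORTH",  # assumes AXIS["Y",NORTH]
    "projection_x_coordinate": "EAST",   # assumes AXIS["X",EAST]
}


def axis_direction(attrs):
    """Guess a WKT axis direction from a CF coordinate variable's attributes."""
    name = attrs.get("standard_name")
    if name in STANDARD_NAME_TO_DIRECTION:
        return STANDARD_NAME_TO_DIRECTION[name]
    # Fall back to units; returns None when nothing identifies the axis.
    return UNITS_TO_DIRECTION.get(attrs.get("units"))


lon_dir = axis_direction({"units": "degrees_east"})
lat_dir = axis_direction({"units": "degrees_north"})
y_dir = axis_direction({"standard_name": "projection_y_coordinate", "units": "m"})
unknown = axis_direction({"units": "m"})
```

For geographic axes the units alone are enough; for projected axes in metres the sketch has to rely on `standard_name`, and it returns `None` when neither attribute identifies the axis.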

comment:51 Changed 9 years ago by jonathan

Dear Phil

As I remarked (out of place) on ticket 80, my main reservation about this proposal is the inconsistency that arises if both WKT and grid_mapping are included. This has been discussed before. Nothing prevents people from writing CF-compliant files which also contain non-CF metadata, but the design of CF metadata generally tries to avoid self-contradiction or to provide ways of detecting it. That is why I proposed that WKT and grid_mapping should not both be allowed in the same file. Maybe that wouldn't help, as has been suggested above.

Should we expect software which aims to implement the CF convention to have the facility to interpret WKT, or should it be regarded as a string which only some software would interpret? Of course, this question immediately raises (for me) my concern that software which looked at the grid_mapping and software which looked at the WKT might behave differently with the same file, and that doesn't seem like a good possibility for CF to support.

The debate on this ticket has not persuaded me that there isn't a problem, so I can't honestly withdraw my objection, since my concern remains. However, you are right that if I am the only person who is sufficiently concerned about this problem to argue that we should not accept this proposal, then I am probably unnecessarily concerned and the ticket should be accepted despite my reservations about it, provided any other issues raised have been resolved.

Cheers

Jonathan

comment:52 Changed 9 years ago by mcginnis

I think the issue of internal inconsistency in the CRS should not be regarded as a show-stopper, as it's nothing new; this is already a problem we have to worry about.

A file with a curvilinear coordinate system is supposed to have both 2-D lat/lon values for the gridcells and a grid_mapping variable describing the CRS. This is a redundant specification, and I can assure you that it's possible to have them be inconsistent in a file that passes the CF-checker. Allowing WKT and grid_mapping may enable a similar problem, but I would argue that it doesn't really make things any worse than they already are.

So in light of that, I think this ticket is acceptable with or without a requirement that WKT and grid_mapping be mutually exclusive -- though if they are allowed to co-exist, it would probably be smart to include a strongly-worded caution or recommendation about making sure they're consistent.
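As an illustration of what a consistency check might look like if the two are allowed to co-exist, here is a minimal sketch. It compares only the ellipsoid parameters, reads them from the WKT with a simple regular expression rather than a real parser, and uses hypothetical helper names and sample attribute values. `semi_major_axis` and `inverse_flattening` are existing CF grid mapping attributes; `crs_wkt` is the attribute proposed in this ticket.

```python
import math
import re

# Matches SPHEROID["name",semi_major_axis,inverse_flattening ...
SPHEROID_RE = re.compile(
    r'SPHEROID\s*\[\s*"[^"]*"\s*,\s*([-+0-9.eE]+)\s*,\s*([-+0-9.eE]+)'
)


def spheroid_from_wkt(wkt):
    """Extract ellipsoid parameters from a WKT string, or None if absent."""
    m = SPHEROID_RE.search(wkt)
    if m is None:
        return None
    return {
        "semi_major_axis": float(m.group(1)),
        "inverse_flattening": float(m.group(2)),
    }


def inconsistencies(grid_mapping_attrs, rel_tol=1e-9):
    """List (attribute, CF value, WKT value) triples that disagree."""
    wkt = grid_mapping_attrs.get("crs_wkt")
    if wkt is None:
        return []
    problems = []
    for name, wkt_value in (spheroid_from_wkt(wkt) or {}).items():
        cf_value = grid_mapping_attrs.get(name)
        if cf_value is not None and not math.isclose(cf_value, wkt_value, rel_tol=rel_tol):
            problems.append((name, cf_value, wkt_value))
    return problems


consistent = {
    "grid_mapping_name": "latitude_longitude",
    "semi_major_axis": 6378137.0,
    "inverse_flattening": 298.257223563,
    "crs_wkt": 'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],'
               'PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]]',
}
conflicting = dict(consistent, semi_major_axis=6371000.0)
```

A checker could report the triples returned for `conflicting` as warnings. Under requirement 3.5 of this ticket, the single-property CF attribute value would take precedence where the two differ.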

However, I really dislike mixing syntaxes. It makes it much, much harder both to parse metadata and to get the metadata right in the first place. (It was a lot of work for me to get correct grid-mapping values from the climate modelers I work with, and I think if I was asking them for WKT, I would still be working on it.)

So I'm fine with *allowing* WKT, but I would be pretty strongly opposed to a reading of this proposal that resulted in a *requirement* that WKT be included in netCDF files. I don't think anyone is pushing for that, but I wanted to make sure that the assumptions don't drift in that direction, and that we don't end up with such a requirement as a side-effect.

comment:53 Changed 9 years ago by jonathan

Dear all

Various people have said that it would be OK to have both a CF-style description and WKT in the grid_mapping, despite the possible inconsistency, and only I have argued that they should be mutually exclusive. Therefore I'll drop that argument and accept the majority view, and agree that both this ticket and ticket 80 should be accepted.

Phil already has some fairly strongly worded cautions in the text of the new section 5.6.1 about avoiding inconsistency. Nonetheless, I'd like to suggest a few modifications to the text of 5.6.1 as proposed at the start of this ticket. In the third paragraph, I think we should avoid implying that client software will evolve to support both methods - it might do so, but we can't expect it - and we should require the single-property attributes to be used for a thorough description, since ticket 80 proposes some more attributes, which will enable a complete description in the examples considered. For the third paragraph, I would propose:

The crs_wkt attribute is intended to act as a supplement to other single-property CF grid mapping attributes (as described in Appendix F); it is not intended to replace those attributes. If data producers omit the single-property grid mapping attributes in favour of the compound crs_wkt attribute, software which cannot interpret crs_wkt will be unable to use the grid_mapping information. Therefore the CRS should be described as thoroughly as possible with the single-property attributes as well as by crs_wkt.

In the fifth paragraph, I suggest that the text

in the absence of an overriding CF-maintained list, the OGP/EPSG registry of geodetic parameters [OGP/EPSG] is considered to represent the definitive authority as regards CRS property names and values (it is noted that some examples in the published literature do not always adhere to the OGP/EPSG values).

should be replaced with

the values shown in <link1> should be preferred; these are derived from the OGP/EPSG registry of geodetic parameters, which is considered to represent the definitive authority as regards CRS property names and values.

where <link1> is https://cf-pcmdi.llnl.gov/trac/wiki/Cf2CrsWkt, especially its attachments, compiled by Etienne and Phil to support this ticket and ticket 80.

Best wishes and thanks for your patience with this long argument

Jonathan

comment:54 Changed 9 years ago by pbentley

Jonathan: Thanks for lifting your objections and for suggesting the above changes. Your proposed amendments are fine with me.

Mark: As moderator for this ticket do you want to close it as accepted? I don't believe there have been any further objections in the past 3 weeks.

Phil

comment:55 Changed 8 years ago by jonathan

Enough support has been given to this ticket for it to be accepted according to the rules, and there are no outstanding objections or comments, so it is therefore concluded and should be included in the next version of the CF standard. The changes to be made are as given in section 4.2 of Phil's text at the start of this ticket, as modified by comment 53.

Phil is already named as an additional author of the CF standard.

Both this ticket and ticket 80 modify section 5.6 of the CF standard document. In addition, both tickets refer to documents on the trac wiki. When the new CF version is compiled, I suggest that we should label these documents clearly to indicate they are "live" and should be treated carefully, though I don't think they need to be maintained as part of the CF standard document, because the information they provide is ancillary to the CF standard and derived from standards maintained by other authorities.

Thanks again for working on this, Phil.

Jonathan

comment:56 Changed 5 years ago by painter1

The wiki links should use http, not https. Thus, http://cf-pcmdi.llnl.gov/trac/wiki/Cf2CrsWkt

comment:57 Changed 5 years ago by jonathan

Thanks for clarifying that. Still, I feel it would be good if we could copy that document to somewhere in the cfconventions repository and website, for safekeeping and maintenance. Jonathan

comment:58 Changed 5 years ago by markh

  • Resolution set to fixed
  • Status changed from new to closed