Opened 4 years ago

Last modified 4 years ago

#154 new task

Machine Readable Standardised Region List

Reported by: martin.juckes
Owned by: cf-conventions@…
Priority: medium
Milestone:
Component: cf-conventions
Version:
Keywords:
Cc:

Description (last modified by martin.juckes)

The CF Standardised Region List http://cfconventions.org/Data/cf-standard-names/docs/standardized-region-names.html is currently only available as an HTML page. Ideally, it should also be available as a machine-readable document, so that the CF checker and other software can access the names, as discussed, for instance, in #153.

This, with #153, would bring the number of CF vocabulary documents to 4:

  • CF Standard Names
  • CF Area Types
  • CF Standardised Regions
  • CF Rules Related to Standard Names (under discussion: #153)

There should be some common approach to structuring these documents, based on the established practice with the CF Standard Names (an XML reference document, exported to SKOS). There is also an interest in having JSON versions, which are easier to import into Python.

The general structure used for the standard names and area types is clear: some header information followed by a sequence of "entry" elements, each with an "id" attribute and one or more simple child elements. The schema used for area types could be used directly for standard regions. The CF standard name schema could be adapted for the rules with some minor change of terminology.
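As a rough illustration of the structure described above (header information followed by "entry" elements, each with an "id" attribute and simple child elements), the following sketch reads such a document and dumps it as JSON. Only the "entry" element and the "id" attribute come from this ticket; the root element name, the header field and the child element names in the sample are assumptions made for the example, not the actual CF schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample: only "entry" and "id" come from the ticket;
# the root, header and child element names are illustrative.
SAMPLE_XML = """\
<vocabulary>
  <version_number>1</version_number>
  <entry id="example_term_a">
    <description>First illustrative term.</description>
  </entry>
  <entry id="example_term_b">
    <description>Second illustrative term.</description>
  </entry>
</vocabulary>
"""


def vocabulary_to_dict(xml_text):
    """Split a vocabulary document into header fields and entries keyed by id."""
    root = ET.fromstring(xml_text)
    header = {}
    entries = {}
    for child in root:
        if child.tag == "entry":
            # Each entry carries an "id" attribute and simple child elements.
            entries[child.get("id")] = {
                sub.tag: (sub.text or "").strip() for sub in child
            }
        else:
            # Anything that is not an entry is treated as header information.
            header[child.tag] = (child.text or "").strip()
    return {"header": header, "entries": entries}


vocab = vocabulary_to_dict(SAMPLE_XML)
print(json.dumps(vocab, indent=2))
```

Because the function does not depend on particular header or child element names, the same code could in principle be pointed at any of the vocabulary documents that follow this common pattern.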

Change History (6)

comment:1 Changed 4 years ago by martin.juckes

  • Description modified (diff)
  • Type changed from defect to task

comment:2 Changed 4 years ago by martin.juckes

  • Description modified (diff)

comment:3 Changed 4 years ago by jonathan

Dear Martin

I agree with this proposal in principle. Thanks for the initiative. Perhaps Alison will have some comments to make about the format.

Best wishes

Jonathan

comment:4 Changed 4 years ago by apamment

Dear Martin and Jonathan,

I have created a draft xml version of the standardized_region_names document at https://github.com/cf-convention/cf-convention.github.io/blob/master/Data/cf-standard-names/docs/standardized_region_names.xml .

The schema is even simpler than for the area_types table - essentially it's just a list of entries as in the html document. There is no definition text or other attributes to add to the regions. At present the list of region names doesn't have a version or date stamp, unlike the standard names and area types tables. It would be easy to add such information if it would be useful, although the list changes very infrequently so I don't know if there's really a need.

If Martin and others are happy with the format of the document I will add links to it from the CF web pages.

Best wishes, Alison

comment:5 Changed 4 years ago by martin.juckes

Dear Alison,

That looks fine, but I think it would be useful to have a date stamp: it often helps to have some record of when a file was created.

And I've just noticed that it is missing a closing standardized_region_names tag.

regards, Martin
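A missing closing tag of the kind noted in comment:5 can be caught automatically before a file is published. The sketch below is one possible well-formedness check using Python's standard library; the two fragments are hypothetical and only the root element name standardized_region_names is taken from this ticket.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragments; the entry content is illustrative.
complete = '<standardized_region_names><entry id="example_region"/></standardized_region_names>'
truncated = '<standardized_region_names><entry id="example_region"/>'

print(is_well_formed(complete))
print(is_well_formed(truncated))
```

This only checks that the XML is well formed; it does not validate the file against any schema.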

comment:6 Changed 4 years ago by apamment

Dear Martin,

I've added the missing tag at the end of the file. You said it would be useful to have a date stamp and Roy Lowry (via email) has reminded me that the CF region names are published as vocabulary list P30 in the NERC Vocabulary Server (NVS). He suggests, and I agree, that we make the dates consistent between the files on the CF web site and the NVS lists as is already done with the standard names and area types tables. I have implemented this suggestion and also added a version number (currently 2). I've also added links to the XML file from the main Standard Names and Documents pages on the CF web site. The location of the file is now http://cfconventions.org/Data/cf-standard-names/docs/standardized-region-names.xml .

You mention in the proposal description that you would also like a JSON version of the region names. Rob Thomas (formerly at the British Oceanographic Data Centre and now at the Irish Marine Institute) has kindly shown me that the list can be obtained in JSON format direct from the NVS by querying the SPARQL endpoint. The same method can also be used to obtain JSON listings of the standard names and area types tables. The queries are rather complex to reproduce in trac, so I have made a small html file on the CF website, http://cfconventions.org/Data/cf-standard-names/docs/CF_vocabs_JSON_links.html, which contains the links to extract the JSON lists. I have not had much time to experiment with SPARQL and perhaps the queries can be customised further to adjust the output, but please have a look and let me know whether you think these links are useful. If so, I will add these to the main CF web pages too.

Best wishes,

Alison
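The SPARQL route described in comment:6 can be sketched as follows, assuming the endpoint returns results in the standard W3C SPARQL JSON results layout. The endpoint address, the query and the sample response below are placeholders invented for the example; the real NVS queries are the ones linked from the CF_vocabs_JSON_links.html page.

```python
import json
from urllib.parse import urlencode

# Hypothetical response in the W3C SPARQL 1.1 Query Results JSON layout;
# the variable name "label" and the two values are illustrative.
SAMPLE_RESPONSE = """\
{
  "head": {"vars": ["label"]},
  "results": {"bindings": [
    {"label": {"type": "literal", "value": "example_region_a"}},
    {"label": {"type": "literal", "value": "example_region_b"}}
  ]}
}
"""


def build_query_url(endpoint, query):
    """Append a URL-encoded SPARQL query to an endpoint address."""
    return endpoint + "?" + urlencode({"query": query})


def extract_values(response_text, variable):
    """Collect the values bound to one variable in a SPARQL JSON result."""
    doc = json.loads(response_text)
    return [
        binding[variable]["value"]
        for binding in doc["results"]["bindings"]
        if variable in binding
    ]


# Placeholder endpoint and query, not the real NVS address or query.
url = build_query_url(
    "https://example.org/sparql", "SELECT ?label WHERE { ?s ?p ?label }"
)
names = extract_values(SAMPLE_RESPONSE, "label")
print(url)
print(names)
```

No network request is made here; in practice the text passed to extract_values would be the body returned by the endpoint for the constructed URL.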
