Opened 14 years ago

Closed 12 years ago

#14 closed enhancement (wontfix)

Allow time coordinates to be stored as a W3C / ISO 8601 formatted String

Reported by: caron Owned by: cf-conventions@…
Priority: medium Milestone:
Component: cf-conventions Version:
Keywords: Cc:

Description

Currently time coordinates must be stored as a number with units that are parsable by the udunits library (cf section 4.4).

I propose that we also allow time coordinates to be expressed as W3C/ISO 8601 formatted Strings. (Example: 2007-03-29T12:00:00Z) This format is widely used internationally; it is a W3C profile of the ISO standard. This page has a summary of the format and more links:

http://www.unidata.ucar.edu/projects/THREDDS/tech/interfaceSpec/NetcdfSubsetService.html#Reference

Reasons:

  • A string representation is human readable (compare with "348472938 secs since 2007-03-29 12:00:00Z")
  • Independence from the udunits library when needed.
  • May be easier to use for some.

Proposed change:

1) Change section 4.4 to start with:

"Date coordinates must either be of type char, formatted according to the W3C/ISO 8601 Date format, or of numeric type with a mandatory units attribute containing a udunit compatible date unit string."

2) Add an appendix that describes the W3C/ISO 8601 format, or simply point to an external link.

Note: I find it clearer to use "date" to refer to "calendar date", and "time" to mean a "unit of time duration" (eg 1500 secs). It might be worth looking through our usage to see where this meaning may be unclear.

Change History (12)

comment:1 Changed 14 years ago by stevehankin

There was a very detailed and lengthy discussion of this general topic (human-readable dates in CF time axes) in March-May of 2000 on the CF email list. (This is a great illustration of the improved institutional memory that CF should achieve through this trac site.) Reiterating two of the key points from that discussion:

  1. regarding human readability:

NetCDF files are binary files and are therefore optimized for machine access rather than human readability. Utilities like ncdump and ncgen provide human readability and text-editor writability respectively.

Here's a variant on the question that has been posed: Should utilities such as ncdump and ncgen support viewing of time coordinate variables as human-readable (ISO) date strings?

  2. regarding "easier to use"

When we think of CF/netCDF files as a vehicle to achieve interoperability between data writers and data readers, then supporting multiple encodings for precisely the same information content nearly always makes the problem harder, rather than easier, since clients must support multiple sets of logic to do the same thing.

Again a variant on the question: When considering geo-aware libraries for reading and writing netCDF data, would there be significant value in offering methods that take ISO date strings as arguments? This approach would provide the ease-of-use advantages proposed above, without adding any new complexity either to the CF standard or to clients that read CF data. (It's a matter of who has to do the work and how many distinct clients have to implement the same new logic.)

comment:2 follow-up: Changed 14 years ago by jonathan

Dear John

In the previous discussions we had (probably the ones Steve is recalling too) I argued for more human-readable times. However I was convinced by Steve's arguments for simplicity then, and I agree with him now. At that time, Russ agreed that ncgen and ncdump could be extended such that CDL could show times in a human-readable form, with this being translated to and from numeric values with udunits; for CF purposes, the translation would have to support all the calendars, not just the real-world calendar (which is the only one that udunits can handle at present). Maybe Russ could comment on whether this extension to the netCDF utilities could be made? Likewise, as Steve says, other APIs which read and write netCDF files could do the same translation.

If we adopted your proposal, on the other hand, all applications which process CF-netCDF files would have to be able to handle the string as well as the numeric representation of time. That would not be backward-compatible in the sense that files containing times as strings would "break" all the existing software. Although strings are more human-readable, they have significant disadvantages for processing. For example,

  • You frequently need to extract "components" of the time e.g. the year, in order to print the time in a particular form. Parsing complicated strings to extract bits of them is awkward.
  • You often need to do calculations on time-coordinates e.g. for working out the length of time-intervals. You can't do that with strings.
  • You have to do different kinds of comparison with strings to test whether they are in order.
  • They take up more space in the file even than a double-precision number. This is not normally an issue, but it could be for timeseries of station data, where the coordinates can be as large as the data.
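A minimal Python sketch of the second and fourth bullets, assuming Gregorian dates in UTC and the fixed-width form YYYY-MM-DDThh:mm:ssZ; the variable names and sample values are illustrative only. With numeric coordinates an interval is a subtraction, while string coordinates must be parsed first, and a 20-character string occupies more bytes than an 8-byte double.

```python
import struct
from datetime import datetime

# Numeric encoding: seconds since a reference date; an interval is a subtraction.
t0_num, t1_num = 0.0, 5400.0
interval_from_numbers = t1_num - t0_num

# String encoding: each value must be parsed before any arithmetic is possible.
ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
t0_str, t1_str = "2007-03-29T12:00:00Z", "2007-03-29T13:30:00Z"
parsed0 = datetime.strptime(t0_str, ISO_FORMAT)
parsed1 = datetime.strptime(t1_str, ISO_FORMAT)
interval_from_strings = (parsed1 - parsed0).total_seconds()

# Storage per value: characters in the string versus bytes in a double.
string_bytes = len(t0_str.encode("ascii"))
double_bytes = struct.calcsize("d")

print(interval_from_numbers, interval_from_strings, string_bytes, double_bytes)
```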

I guess the ISO standard also does not support non-real-world calendars.

Cheers

Jonathan

comment:3 Changed 14 years ago by caron

Here's a variant on the question that has been posed: Should utilities such as ncdump and ncgen support viewing of time coordinate variables as human-readable (ISO) date strings?

The problem that I see is that ncdump and ncgen don't know what a time coordinate is, so they probably can't automatically add this conversion, and you might not want them to. One could add a flag to the command line that could do it, though.

comment:4 Changed 14 years ago by caron

As I read Steve's and Jonathan's arguments, I think the gist of the problem for me is "dependence on the udunits library". So let me elaborate on that:

IMO, putting calendar date parsing into udunits was a mistake. udunits is about dimensional units. There should be a different library for date parsing that can handle calendars etc. I'm guessing Steve Emmerson would agree with me on this. (Steve, es verdad?) I understand why it was done: for the common case of realtime data in the near future and past, it was an easy extension to add to udunits.

Anyway, there's my prejudice: I think we should move towards a new, more robust time standard format and supporting libraries. The first step would be to allow ISO String representations. Next would be to find (or develop) standards for Calendars. Then develop a library that would handle both ISO and udunit formats.

We already can't use udunits for dates in non-standard calendars, so this is just naming the problem and setting up a framework for an eventual solution.

comment:5 Changed 14 years ago by caron

Hi Jonathan, snips and comments below:

That would not be backward-compatible in the sense that files containing times as strings would "break" all the existing software.

Yes, I guess it's the price of progress. Hopefully they will break gracefully and be upgraded.

Although strings are more human-readable, they have significant disadvantages for processing.

  • You frequently need to extract "components" of the time e.g. the year, in order to print the time in a particular form. Parsing complicated strings to extract bits of them is awkward.
  • You often need to do calculations on time-coordinates e.g. for working out the length of time-intervals. You can't do that with strings.

You would parse the String into another object (e.g. a Date in Java). It's true that a udunit coordinate allows you to easily manipulate the interval, but udunits doesn't understand the value, e.g. you can't say "what year is this?". You have to convert it to a "date object" first. So in terms of manipulating a date coordinate in a general way, udunits and ISO Strings are handled in the same way: convert to a general "date object".
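The point can be illustrated with a short Python sketch (Python's datetime standing in for the Java Date mentioned above). Both encodings are first converted to the same kind of date object, and only then can the year be queried. The helper names from_numeric and from_iso are hypothetical, and only Gregorian UTC dates are assumed.

```python
from datetime import datetime, timedelta, timezone

def from_numeric(seconds, reference):
    """Decode a 'seconds since <reference>' value into a date object."""
    return reference + timedelta(seconds=seconds)

def from_iso(text):
    """Decode a 'YYYY-MM-DDThh:mm:ssZ' string into a date object (UTC)."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)

reference = datetime(2007, 3, 29, 12, 0, 0, tzinfo=timezone.utc)
a = from_numeric(86400, reference)       # one day after the reference
b = from_iso("2007-03-30T12:00:00Z")     # the same instant as a string

# Both encodings end up as the same date object, which can report its year.
print(a == b, a.year)
```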

  • You have to do different kinds of comparison with strings to test whether they are in order.

A lexicographic sort on ISO strings gives the correct ordering.
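A quick Python check of this claim, under the assumption that every string uses the same fixed-width form, the same time zone designator, and a four-digit year; the sample values are made up for illustration. If those assumptions do not hold (mixed offsets, reduced precision), plain string ordering is no longer guaranteed to be chronological.

```python
from datetime import datetime

stamps = [
    "2007-03-29T12:00:00Z",
    "1999-12-31T23:59:59Z",
    "2007-03-29T09:30:00Z",
    "2000-01-01T00:00:00Z",
]

# Plain string (lexicographic) sort.
by_string = sorted(stamps)

# Chronological sort after parsing each string into a date object.
by_date = sorted(stamps, key=lambda s: datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ"))

print(by_string == by_date)
```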

  • They take up more space in the file even than a double-precision number. This is not normally an issue, but it could be for timeseries of station data, where the coordinates can be as large as the data.

Yes, this is a disadvantage, and a good reason to retain a "udunits encoding".

As for the burden of adding another representation, parsing ISO dates is trivial in Java. Perhaps someone with C experience could comment on how difficult it is there? In both cases, I would imagine developing a standard library to do the work.

comment:6 follow-up: Changed 14 years ago by stevehankin

Hi John,

Maybe a trac ticket suggesting the development of a "new, more robust time standard format and supporting libraries" is in order. (Would you want to see that on the CF site or on an internal Unidata task list?)

Attempting to move towards summary -- here are counter-arguments that have been raised:

  1. Human readability: The limited merits of human readability for data stored in a binary file were already discussed. Extending the human readability argument to an absurd extreme, we might analogously suggest that human readability would be improved if floating point numbers were stored as formatted strings, too. The difference between the two is that ncdump knows how to format floating point numbers, but does not currently know how to format dates.
  2. The limits of ISO date strings in scientific applications: ISO date strings were not developed to support scientific data encodings. (Granted that in the current turn of the historical crank they are being imported into the scientific realm and bringing notable benefits.) They are great for communicating a few Gregorian calendar dates. But as noted they break down with the funky calendars that climate modelers use. And they also are not well suited to the multi-dimensional analyses that scientists need to do. Time is fundamentally a linear progression just as distance is, and netCDF applications often achieve significant benefits from having time encoded in the same manner as distance. For example taking a time derivative is exactly the same numerical operation as a distance derivative. (We're seeing some of the same divergence of viewpoints here that play out in the use of Java for modeling applications. Good questions to ask. But bigger and deeper questions than this discussion of ISO string usage.)
  3. The importance of preserving stability and interoperability (a personal favorite). When the merits of a new, proposed feature are ambiguous, "no" is usually the preferred answer. (modified version of a quote that Russ Rew introduced into the discussions a couple of years ago.)
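The observation that a time derivative is the same numerical operation as a distance derivative, provided time is stored numerically, can be sketched in Python as follows. The function finite_difference and the sample values are illustrative assumptions, not part of any netCDF library.

```python
def finite_difference(values, coords):
    """Forward differences d(values)/d(coords) between neighbouring points."""
    return [
        (values[i + 1] - values[i]) / (coords[i + 1] - coords[i])
        for i in range(len(coords) - 1)
    ]

temperature = [10.0, 12.0, 15.0]

# Time coordinate as "hours since <reference date>" and a distance coordinate in km.
time_hours = [0.0, 6.0, 12.0]
distance_km = [0.0, 50.0, 100.0]

# The same function serves both axes because both are plain numbers.
print(finite_difference(temperature, time_hours))
print(finite_difference(temperature, distance_km))
```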

Stepping back from the particulars of this discussion to the maintenance of trac: issues that are opened need eventually to be closed. We haven't yet identified a moderator. (Or have we and I missed it?) So we ought to self-police. Is this one ready to close?

comment:7 Changed 14 years ago by jonathan

Dear John

ncdump and ncgen don't know what a time coordinate is, so they probably can't automatically add this conversion

If ncdump and ncgen were CF-enabled (or COARDS-enabled), they could identify a time coordinate, because it has units of time since reftime. CF also provides a calendar attribute that ncdump and ncgen could use.
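As a rough illustration of how a CF-aware tool might recognise a time coordinate, the Python sketch below checks whether a units string has the form "unit since reference". The function name, the regular expression, and the list of accepted unit names are simplifying assumptions; udunits accepts more spellings than are listed here.

```python
import re

# Hypothetical, deliberately small set of time unit names.
TIME_UNITS = r"(seconds?|secs?|minutes?|mins?|hours?|hrs?|days?)"
SINCE_PATTERN = re.compile(rf"^\s*{TIME_UNITS}\s+since\s+\S.*$", re.IGNORECASE)

def looks_like_time_units(units):
    """Return True if `units` looks like '<time unit> since <reference>'."""
    return bool(SINCE_PATTERN.match(units))

print(looks_like_time_units("secs since 2007-03-29 12:00:00Z"))
print(looks_like_time_units("days since 1970-01-01"))
print(looks_like_time_units("m s-1"))
```

The first two calls report a time coordinate and the third does not. A real tool would also consult the calendar attribute mentioned above before formatting any dates.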

A lexicographic sort on ISO strings gives the correct ordering.

That's true. What I meant was, if you support both strings and numeric values, this is one aspect for which you have to write alternative code for them that works in different ways.

Hopefully they will break gracefully and be upgraded.

I really don't think it's a good idea to introduce an alternative convention which doesn't seem to offer any new functionality but which can't be handled by existing software. The software concerned is not just proper packages supported by Unidata etc., but ad-hoc programs in all the languages of the netCDF interface written by ordinary analysts of data.

You may well be right that date-time support should have been a separate package from udunits. We have been using the udunits format for non-real-world calendars even though udunits doesn't support them. Luckily most of the non-real-world calendars have fixed-length years, in which it is very easy to convert between numeric time and components of time. But I agree that software with proper calendar support and interfaces in many languages would be valuable.
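For a fixed-length calendar such as CF's 360-day one (12 months of 30 days), the conversion between a numeric time and its components reduces to integer division. A minimal Python sketch, with hypothetical function names and a reference date of year 0, month 1, day 1 assumed for simplicity:

```python
DAYS_PER_MONTH = 30
MONTHS_PER_YEAR = 12
DAYS_PER_YEAR = DAYS_PER_MONTH * MONTHS_PER_YEAR  # 360

def days_to_components(days):
    """Split whole days since 0000-01-01 (360-day calendar) into (year, month, day)."""
    year, day_of_year = divmod(days, DAYS_PER_YEAR)
    month, day = divmod(day_of_year, DAYS_PER_MONTH)
    return year, month + 1, day + 1

def components_to_days(year, month, day):
    """Inverse of days_to_components."""
    return year * DAYS_PER_YEAR + (month - 1) * DAYS_PER_MONTH + (day - 1)

print(days_to_components(725))
print(components_to_days(2, 1, 6))
```

Calendars with leap years need a lookup of month and year lengths instead of two divisions, which is where dedicated calendar software becomes valuable.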

However I'm not convinced of the advantage of using strings. If you're always going to use subroutines to encode and decode dates, it doesn't matter what the encoding looks like anyway. If ncgen and ncdump could handle the encoding (whatever it is) then there would be no inconvenience for human-readability. You don't argue that integers and floating-point numbers should be replaced by strings as well, because ncgen and ncdump are able to encode and decode those!

Cheers

Jonathan

comment:8 in reply to: ↑ 2 Changed 14 years ago by russ

Replying to jonathan:

... Russ agreed that ncgen and ncdump could be extended such that CDL could show times in a human-readable form, with this being translated to and from numeric values with udunits; for CF purposes, the translation would have to support all the calendars, not just the real-world calendar (which is the only one that udunits can handle at present). Maybe Russ could comment on whether this extension to the netCDF utilities could be made? Likewise, as Steve says, other APIs which read and write netCDF files could do the same translation.

Yes, this enhancement is actually on the list of tasks to implement in the ncdump/ncgen utilities. Don Hooper (NOAA ESRL/PSD) at one time made a modified ncdump available that displayed times in a human-readable form when an option was specified on the command line. I don't think that extension handled CF climate calendars, however.

We have plans to incorporate climate calendars into libcf, incorporating something similar to what Bob Drach (LLNL) and colleagues provide in their CDAT software, specifically functionality of libcdunif and cdtime.

The ncdump time representation work is currently on the back burner until we've finished a netCDF-4 version of ncdump and ncgen utilities. I'll take all the interest in this topic as an indication that having a way to display human-readable time with ncdump is a fairly high priority enhancement.

comment:9 Changed 14 years ago by bnl

Date handling using ISO strings is only going to get easier in high level languages, but ISO19108 allows a richer set of descriptions than ISO8601 (which is limited as noted above). I'd rather we broke the udunits link, and invested in an ISO19108 compatible strategy. I know Andrew's got more concrete thoughts on this ... so hopefully he'll weigh in.

comment:10 in reply to: ↑ 6 ; follow-up: Changed 14 years ago by caron

Replying to stevehankin:

Hi John,

Maybe a trac ticket suggesting the development of a "new, more robust time standard format and supporting libraries" is in order. (Would you want to see that on the CF site or on an internal Unidata task list?)

Perhaps that's the way to go about this. To repeat, the main problem is the dependence on the udunits library, which is limited in data handling, and is not the right place to "do it right".

Attempting to move towards summary -- here are counter-arguments that have been raised:

  1. Human readability: The limited merits of human readability for data stored in a binary file were already discussed. Extending the human readability argument to an absurd extreme, we might analogously suggest that human readability would be improved if floating point numbers were stored as formatted strings, too. The difference between the two is that ncdump knows how to format floating point numbers, but does not currently know how to format dates.

I can't say I find "reductio ad absurdum" arguments helpful. The problem is that you need the udunits library to make time coordinates intelligible. Floating point display is built into every language that exists.

  2. The limits of ISO date strings in scientific applications: ISO date strings were not developed to support scientific data encodings. (Granted that in the current turn of the historical crank they are being imported into the scientific realm and bringing notable benefits.) They are great for communicating a few Gregorian calendar dates. But as noted they break down with the funky calendars that climate modelers use. And they also are not well suited to the multi-dimensional analyses that scientists need to do. Time is fundamentally a linear progression just as distance is, and netCDF applications often achieve significant benefits from having time encoded in the same manner as distance. For example taking a time derivative is exactly the same numerical operation as a distance derivative. (We're seeing some of the same divergence of viewpoints here that play out in the use of Java for modeling applications. Good questions to ask. But bigger and deeper questions than this discussion of ISO string usage.)

Good point. I don't know if Brian's suggestion of ISO 19108 helps with non-Gregorian calendars. They want CHF 146,00 for me to read it. How many CHF in a hectare anyway? ;)

  3. The importance of preserving stability and interoperability (a personal favorite). When the merits of a new, proposed feature are ambiguous, "no" is usually the preferred answer. (modified version of a quote that Russ Rew introduced into the discussions a couple of years ago.)

Unless of course the feature is one's own "personal favorite" ;)


Stepping back from the particulars of this discussion to the maintenance of trac: issues that are opened need eventually to be closed. We haven't yet identified a moderator. (Or have we and I missed it?) So we ought to self-police. Is this one ready to close?

Sorry, I'm unclear about how we identify a moderator. I guess we are supposed to wait 2 weeks (?) for any other comments to come in.

I'm OK with closing this issue, as proposed, but it will come up again when we get around to "doing it right" with a new library.

comment:11 in reply to: ↑ 10 Changed 14 years ago by jonathan

Replying to caron:

Maybe a trac ticket suggesting the development of a "new, more robust time standard format and supporting libraries" is in order. (Would you want to see that on the CF site or on an internal Unidata task list?)

Perhaps that's the way to go about this. To repeat, the main problem is the dependence on the udunits library, which is limited in data handling, and is not the right place to "do it right".

I agree that the best way to proceed is to improve ncgen/ncdump and other software to encode and decode dates conveniently.

Sorry, I'm unclear about how we identify a moderator. I guess we are supposed to wait 2 weeks (?) for any other comments to come in. I'm OK with closing this issue, as proposed, but it will come up again when we get around to "doing it right" with a new library.

A moderator is supposed to volunteer (you could try to find one before opening the ticket) and if no-one volunteers then the committee chairman (Karl) should find a moderator. The moderator would close the ticket after having summarised it, but I'm sure it's fine for you to close it yourself if you are happy to pursue the matter in another way.

Best wishes

Jonathan

comment:12 Changed 12 years ago by taylor13

  • Resolution set to wontfix
  • Status changed from new to closed

My understanding is that the difficulty in interpreting time coordinates in units expressed with the "since" and base-date construction has been at least partially addressed by an ncdump capability that permits the user to request that the time be translated into a normal date, conforming to common conventions. Thus, there is no longer a strong need to do this within CF.

I am therefore closing this ticket. Thanks to everyone who contributed to the discussion and thanks to Unidata for adding the new capability to ncdump.

Karl Taylor
