Jonathan,
I think we are getting closer to each other, bit by bit (yay!).
My responses are interspersed below.
On 4/29/15 9:39 AM, Jonathan Gregory wrote:
> Dear Jim, Chris et al.
>
> I'm using the word "calendar" in a CF-consistent way, I believe. Maybe it's not
> the best word for the concept, but nonetheless we have an attribute called
> "calendar", and its sole function is to indicate which algorithm is used to
> translate between components of time (YMDhms) and elapsed time (in units of
> time since a reference time). So perhaps that is consistent with "calendar"
> being a collection of algorithms, in Chris's text, but it's more specific
> than that. It has a particular function in the interpretation of CF time
> values (usually coordinates). CF sect 4.4.1 says
> "In order to calculate a new date and time given a base date, base time and a
> time increment one must know what calendar to use."
> and I think that is the sense in which I am using "calendar".
I agree that this is a CF-consistent usage of the word calendar, but it
runs against natural usage, and I think it's worth keeping that in mind.
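Here is a minimal Python sketch of the CF sense of "calendar", meaning the algorithm that turns YMD components into elapsed time. The function names are hypothetical and only two calendars are modeled. The same pair of dates maps to different elapsed times depending on which calendar is selected:

```python
from datetime import date

def days_since_proleptic_gregorian(ref: date, d: date) -> int:
    """Elapsed days from ref to d. Python's date type implements the
    proleptic Gregorian calendar (and knows nothing of leap seconds)."""
    return (d - ref).days

def days_since_noleap(ref: date, d: date) -> int:
    """Elapsed days from ref to d in a 365-day ("noleap") calendar,
    where February always has 28 days. (Feb 29 has no meaning here.)"""
    month_lengths = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

    def day_number(x: date) -> int:
        # Days counted from a fixed origin (Jan 1 of year 0) in noleap.
        return x.year * 365 + sum(month_lengths[: x.month - 1]) + (x.day - 1)

    return day_number(d) - day_number(ref)

ref, target = date(2000, 1, 1), date(2001, 1, 1)
print(days_since_proleptic_gregorian(ref, target))  # 366 (2000 is a leap year)
print(days_since_noleap(ref, target))               # 365
```

Python's date type happens to be a ready-made calculator for the proleptic Gregorian calendar. The noleap arithmetic has to be written by hand.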
>> In the "real" world, we often start with UTC timestamps that have
>> leap seconds accounted for, yet convert them to elapsed times using
>> calculators that don't account for leap seconds. This can actually
>> lead to elapsed time values that encode a time discontinuity and
>> cannot be counted on to produce accurate differences between every
>> pair of values.
> This is a problem, I agree. We should avoid that problem for future data by
> making the conventions more precise about which calculator should be used
> (which calendar, in the CF sense). We can't decide for sure what was done when
> encoding past data, but the conventions string records the version of CF used.
>
Calendars and calculators are different things. A calendar or time
system is (per Chris) a collection of algorithms. I started using
the term calculator because it was shorter and more generic than
"time handling module or library". A calculator is a particular
implementation of a set of algorithms. This is why I ended up
avoiding the use of the word clock as a name for the group of
algorithms that make up a time system. A clock is a device that is
an implementation of a set of algorithms.
I agree that we should make the documentation more precise, and warn
people of the potential pitfalls of using a calculator that doesn't
recognize leap seconds to create time variables from timestamps that
include leap seconds (like UTC-based timestamps).
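A small Python sketch makes the pitfall concrete. It assumes a hand-maintained table holding only three leap seconds, which is an illustrative subset and not the official IERS list. The helper names are hypothetical. A leap-unaware calculator maps the leap second 2012-06-30T23:59:60 and the following midnight to the same elapsed value:

```python
from datetime import date

# Illustrative subset of UTC leap seconds: each listed day ended with an
# inserted 23:59:60. A real calculator should load the full IERS table.
LEAP_SECOND_DAYS = [date(2008, 12, 31), date(2012, 6, 30), date(2015, 6, 30)]

def parse(ts):
    """Split 'YYYY-MM-DDThh:mm:ss' into (date, seconds of day).
    Seconds may be 60 during a leap second."""
    day_part, time_part = ts.split("T")
    y, mo, d = (int(v) for v in day_part.split("-"))
    h, mi, s = (int(v) for v in time_part.split(":"))
    return date(y, mo, d), h * 3600 + mi * 60 + s

def naive_elapsed(ref, ts):
    """Leap-unaware calculator: assumes every day is exactly 86400 s."""
    ref_day, ref_sod = parse(ref)
    day, sod = parse(ts)
    return (day - ref_day).days * 86400 + sod - ref_sod

def true_elapsed(ref, ts):
    """Leap-aware calculator: adds one second for each leap second that
    ended a day on or after ref's day and before ts's day.
    Assumes ref is not itself a leap second."""
    ref_day, _ = parse(ref)
    day, _ = parse(ts)
    leaps = sum(1 for d in LEAP_SECOND_DAYS if ref_day <= d < day)
    return naive_elapsed(ref, ts) + leaps

ref = "2012-06-30T00:00:00"
stamps = ["2012-06-30T23:59:59", "2012-06-30T23:59:60", "2012-07-01T00:00:00"]
print([naive_elapsed(ref, t) for t in stamps])  # [86399, 86400, 86400]
print([true_elapsed(ref, t) for t in stamps])   # [86399, 86400, 86401]
```

The naive values contain a duplicate, and every later value is one second short of the true elapsed time.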
>> I'm suggesting that we need to do two things. One is to more precisely
>> define what sorts of times can be used in the time reference part of
>> the units attribute. I just reread section 4.4, and it actually says
>> that the time is UTC or a time zone offset from it. I think it
>> should stay that way, and the wording should be strengthened to make
>> it clearer.
> Yes, it does say that. It's a quote from the udunits man page. However, I don't
> think the issue of leap seconds has been carefully considered before, so we
> don't have to assume that's what it meant exactly, especially as udunits does
> not support leap seconds. As previously said, and I think you may agree, it is
> likely that nearly all existing time values have been encoded *without* leap
> seconds, and therefore *not* UTC strictly. Therefore my alternative suggestion
> is that we should add some text here saying that mentioning UTC does not
> necessarily imply that leap seconds are included. This must be the case, because
> the same format of time unit is used for calendars that definitely do not ever
> include leap seconds i.e. all the non-real-world ones. UTC is mentioned simply
> as a way to refer to the time-zone which contains the Greenwich meridian,
> without summer time.
>
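The reference-time form under discussion is a time at UTC, or at a fixed offset from it. Below is a minimal decoding sketch. The `parse_units` function is hypothetical and handles only one simple layout, not the full UDUNITS grammar. Like any datetime-based code, it cannot represent a reference time of 23:59:60:

```python
import re
from datetime import datetime, timedelta, timezone

# Only "<unit> since YYYY-MM-DD[ hh:mm[:ss[.fff]]][ +hh[:mm]|-hh[:mm]]"
UNITS_RE = re.compile(
    r"^\s*(?P<unit>\w+)\s+since\s+"
    r"(?P<date>\d{1,4}-\d{1,2}-\d{1,2})"
    r"(?:[ T](?P<time>\d{1,2}:\d{2}(?::\d{2}(?:\.\d+)?)?))?"
    r"(?:\s*(?P<tz>[+-]\d{1,2}(?::\d{2})?))?\s*$"
)

def parse_units(units):
    """Return (unit, reference time expressed at UTC offset zero)."""
    m = UNITS_RE.match(units)
    if m is None:
        raise ValueError(f"unsupported units string: {units!r}")
    y, mo, d = (int(v) for v in m["date"].split("-"))
    h = mi = 0
    s = 0.0
    if m["time"]:
        parts = m["time"].split(":")
        h, mi = int(parts[0]), int(parts[1])
        if len(parts) == 3:
            s = float(parts[2])
    offset = timedelta(0)  # no offset given: the reference is at UTC
    if m["tz"]:
        sign = -1 if m["tz"][0] == "-" else 1
        tz_h, _, tz_m = m["tz"][1:].partition(":")
        offset = sign * timedelta(hours=int(tz_h), minutes=int(tz_m or 0))
    ref = datetime(y, mo, d, h, mi, tzinfo=timezone(offset)) + timedelta(seconds=s)
    return m["unit"], ref.astimezone(timezone.utc)

unit, ref = parse_units("seconds since 1992-10-8 15:15:42.5 -6:00")
print(unit, ref.isoformat())  # seconds 1992-10-08T21:15:42.500000+00:00
```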
I agree that leap seconds haven't been carefully considered before. I
disagree that nearly all existing time values have been encoded without
leap seconds. I'd say that nearly all existing time values that were
derived from true UTC timestamps are at risk of having leap second
discontinuities encoded into the set of values. (See my previous
response below.)
There are three issues here, so let's not conflate them. They are:
1. What to call the time system that is like UTC in overall form
(Greenwich meridian, etc) but doesn't include leap seconds.
2. How to indicate which actual time system is being used for the time
part of the reference time in the units attribute.
3. How to indicate whether or not the elapsed times in the time
variable are certain to be free of leap second induced discontinuities.
>> The other thing I think we need to do is provide a way to indicate
>> that the elapsed times in a time variable are true elapsed times
>> that are certain to be free of leap second discontinuities, or are
>> possibly contaminated with leap second discontinuities. In
>> connection with this we would need to add clarifying language in the
>> CF conventions to educate people on the importance of using time
>> calculators that are aware of leap seconds when moving between UTC
>> timestamps and elapsed time values. This could be handled by adding
>> a modifier to a calendar name in the calendar attribute, or it could
>> be handled by adding a new attribute to hold this information. I
>> think that coming up with one or more new calendar names is a more
>> confusing and less useful way to accomplish this.
> I don't think we should define a new attribute, because this distinction is
> one which applies only to the real-world calendar. It's therefore more robust
> and simpler to indicate it as a modifier to the name, when applicable, in
> the calendar att, so making a new calendar name, in effect. But given this
> discussion I agree that calling it just "UTC" may not be clear enough. The
> real-world calendar is called "standard" or "gregorian". I would propose a new
> possibility "proleptic_gregorian_utc", meaning the proleptic Gregorian
> calendar with leap seconds inserted as applicable since 1958.
Adding a calendar such as "gregorian_utc" and redefining "gregorian" is
insufficient to address all of the issues. As far as
calendars and time systems go, we have two options:
* <whatever calendar> without leap seconds
* <whatever calendar> with UTC leap seconds
By the way, here's a pretty exhaustive discussion of all the different
time systems.
http://www.ucolick.org/~sla/leapsecs/timescales.html
The best option I have come up with for naming the time system without
leap seconds is "traditional". There doesn't seem to be a name for the
"60 seconds in a minute, 60 minutes in an hour, 24 hours in a day" time
system. The word traditional is used by the Bureau International des
Poids et Mesures in their description of the non-SI units of minutes,
hours, and days that are accepted for use with SI units.
So we could answer issue 1. by calling the time system that's like UTC
but without leap seconds the "traditional" time system. We could also
use the name "POSIX", and that is kind of evocative since it reflects
the potential for glitches, but POSIX also defines an epoch date & time
which this time system lacks. I'd be good either way though.
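Python's standard library is one example of a calculator that implements this traditional/POSIX system. `calendar.timegm` treats every day as exactly 86400 seconds. As a result, the leap second at the end of 2012-06-30 collapses onto the following midnight, and `datetime` refuses to represent second 60 at all:

```python
import calendar
from datetime import datetime, timezone

# timegm() uses only the first six fields and counts 86400 s per day.
before = calendar.timegm((2012, 6, 30, 23, 59, 59, 0, 0, 0))
leap = calendar.timegm((2012, 6, 30, 23, 59, 60, 0, 0, 0))
after = calendar.timegm((2012, 7, 1, 0, 0, 0, 0, 0, 0))
print(before, leap, after)  # 1341100799 1341100800 1341100800

# datetime cannot hold the leap second itself.
try:
    datetime(2012, 6, 30, 23, 59, 60, tzinfo=timezone.utc)
except ValueError as err:
    print("datetime rejects 23:59:60:", err)
```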
We could answer issue 2. by defining the existing calendars as including
the "traditional" time system and adding more calendars with "_utc"
tacked on to the original names, but this glosses over the fact that we
actually can't be sure what we have in existing datasets. We could
instead revise the definition of the "calendar" attribute to include an
optional space-separated modifier that would name the time system used
for the time part of the reference date & time. So instead of adding new
calendars, we would allow values of the calendar attribute to be
calendar = "<calendar> [<time system>]"
where the '<>' indicate placeholders for the things named within them,
and the '[]' indicate optional inclusion. The values for the <time
system> modifier would be
* "unknown" - This is the default if no time system modifier is
specified. Users should be aware that the reference time in the
units attribute may or may not be based on a system (such as UTC)
that includes leap seconds.
* "traditional" - This indicates that the reference time in the units
attribute is based on a traditional "60 seconds in a minute, 60
minutes in an hour, 24 hours in a day, base time zone at the
Greenwich meridian" time system. No leap seconds are included in
this time system. (Like I said, I'd be OK with calling this "posix"
instead.)
* "utc" - This indicates that the reference time in the units
attribute is based on Coordinated Universal Time (UTC), which
includes adjustments and leap seconds after 1958.
So, you could have calendar attribute combinations such as "gregorian
utc" or "gregorian traditional", while a bare "proleptic_gregorian" would
indicate that there is uncertainty (which there is for all existing data
sets) as to which time system was used. You could define the modifier as
only applying to the "real world" calendars to prevent weird
combinations such as "noleap utc". I think that this is a better way to
handle this within the calendar attribute.
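The proposed `"<calendar> [<time system>]"` form is simple to parse. In this sketch, the function name and the set of "real world" calendars are assumptions; the set used here is "standard", "gregorian" and "proleptic_gregorian". A missing modifier yields "unknown". A modifier on any other calendar, such as "noleap utc", is rejected:

```python
# Assumed set of "real world" calendars that may carry a modifier.
REAL_WORLD_CALENDARS = {"standard", "gregorian", "proleptic_gregorian"}
TIME_SYSTEMS = {"unknown", "traditional", "utc"}

def parse_calendar(value):
    """Split "<calendar> [<time system>]" into (calendar, time_system).
    The time system defaults to "unknown" when it is omitted."""
    tokens = value.split()
    if len(tokens) not in (1, 2):
        raise ValueError(f"expected '<calendar> [<time system>]', got {value!r}")
    if len(tokens) == 1:
        return tokens[0], "unknown"
    calendar, time_system = tokens
    if time_system not in TIME_SYSTEMS:
        raise ValueError(f"unrecognized time system {time_system!r}")
    if calendar not in REAL_WORLD_CALENDARS:
        raise ValueError(f"no time system modifier allowed for {calendar!r}")
    return calendar, time_system

print(parse_calendar("gregorian utc"))        # ('gregorian', 'utc')
print(parse_calendar("proleptic_gregorian"))  # ('proleptic_gregorian', 'unknown')
```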
That leaves issue 3. The solutions to the first two issues don't address
this issue. This is where the difference between a time system or
calendar and a calculator comes up.
We could add a further modifier in the calendar attribute with three
possible values:
* "true_elapsed" - the elapsed time values are certain to be free of
leap second discontinuities
* "false_elapsed" - the elapsed time values will include one or more
leap second discontinuities if any leap second application dates
were included in the time span from the reference time in the units
attribute to any of the time values.
* "unknown" - the elapsed time values might include leap second
discontinuities, but it is not known if they do or not. (the default)
This extends the calendar attribute definition to be
calendar = "<calendar> [<time system>] [<elapsed time type>]"
We could do it this way. I tend to think that the last one might be
better handled as a separate attribute, but I'd be OK with this approach.
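Here is a sketch of a parser for the extended form. The function name is hypothetical, and it does not check that modifiers appear only on real-world calendars, though that check could be added. One detail the grammar leaves open is that "unknown" belongs to both modifier vocabularies, so a single trailing "unknown" cannot be assigned to one slot unambiguously. Both modifiers default to "unknown", so the result is the same either way. The sketch simply tries the time-system slot first:

```python
TIME_SYSTEMS = {"unknown", "traditional", "utc"}
ELAPSED_TYPES = {"unknown", "true_elapsed", "false_elapsed"}

def parse_calendar_attribute(value):
    """Parse "<calendar> [<time system>] [<elapsed time type>]".
    Both modifiers are optional and default to "unknown"."""
    calendar, *mods = value.split()  # raises ValueError if value is blank
    time_system = elapsed = "unknown"
    if len(mods) == 2:
        time_system, elapsed = mods
        if time_system not in TIME_SYSTEMS or elapsed not in ELAPSED_TYPES:
            raise ValueError(f"bad modifiers in {value!r}")
    elif len(mods) == 1:
        (token,) = mods
        # "unknown" is in both vocabularies; try the time-system slot
        # first. Since both slots default to "unknown", this is harmless.
        if token in TIME_SYSTEMS:
            time_system = token
        elif token in ELAPSED_TYPES:
            elapsed = token
        else:
            raise ValueError(f"unrecognized modifier {token!r}")
    elif len(mods) > 2:
        raise ValueError(f"too many modifiers in {value!r}")
    return calendar, time_system, elapsed

print(parse_calendar_attribute("gregorian utc true_elapsed"))
print(parse_calendar_attribute("gregorian false_elapsed"))
```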
Grace and peace,
Jim
> Best wishes
>
> Jonathan
> _______________________________________________
> CF-metadata mailing list
> CF-metadata at cgd.ucar.edu
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
--
*Jim Biard*
*Research Scholar*
Cooperative Institute for Climate and Satellites NC <http://cicsnc.org/>
North Carolina State University <http://ncsu.edu/>
NOAA National Centers for Environmental Information <http://ncdc.noaa.gov/>
/formerly NOAA's National Climatic Data Center/
151 Patton Ave, Asheville, NC 28801
e: jbiard at cicsnc.org
o: +1 828 271 4900
Received on Wed Apr 29 2015 - 11:12:04 BST