
[CF-metadata] New standard name: datetime_iso8601

From: Decker, Michael <m.decker>
Date: Wed, 20 Mar 2013 09:53:22 +0000

Dear all,

after reading through the unusually vivid discussion on this issue, I
feel I have to make a statement as well. My background is the
experience of receiving not-so-CF-compliant "CF-netCDF" datasets from
several different groups, and the resulting development of a CFchecker
to improve this situation in general.

I think Chris' response gives me a good starting point I can agree with.

1. I also doubt that anyone actually reads the binary netCDF files (or
hexdumps of those files) directly. Tools like ncdump are probably what
people are most often thinking of when talking about "looking at the raw
file/data".
ncdump _is already_ a "client library"! So we _already_ rely on a
computer program (ncdump) decoding the binary data properly for us.
It has also already been pointed out that ncdump can actually convert
(proper) netCDF time encoding to ISO format (-t parameter). So I really
don't see what the argument should be here.
I absolutely disagree with the position against fixing this in
libraries. This is exactly what libraries are for. Maybe this kind of
time en-/decoding has to be implemented in a lower-level library (like
the netCDF reader/writer itself) - but in a library nonetheless.
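
To make that concrete, here is a minimal sketch of the decoding step,
assuming the Python netCDF4 package (the raw values and units string
are invented for the example). It is essentially the same conversion
ncdump performs with its -t parameter:

    # Decode CF-style numeric time values into datetime objects.
    # Values and units are illustrative, not from any particular file.
    from netCDF4 import num2date

    raw_values = [0.0, 6.0, 12.0, 18.0]        # what is actually stored
    units = "hours since 2013-03-20 00:00:00"  # CF units attribute
    calendar = "standard"                      # CF calendar attribute

    # num2date returns datetime-like objects; printing their ISO form
    # gives the human-readable representation people are asking for.
    for t in num2date(raw_values, units=units, calendar=calendar):
        print(t.isoformat())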

2. Talking about those imaginary data archaeologists: the source code
to read and write netCDF data files is open source as far as I know
(and so are some compilers that can compile it).
Would those archaeologists not look at the source code first to figure
out the encoding people used at the time, instead of trying to
reverse-engineer ISO timestamps from hex dumps or binary files all by
themselves?

3. What exactly would the use case of the additional encoding be?
Sure, you can try to ncdump a file to get an idea of its contents. But
why do you do that in the first place? As Chris already pointed out,
this is hardly the "real world" use case, because in the end you
_will_ have to process your data with some sort of program that can
read (CF-)netCDF.
The real reason I see why you regularly have to look at ncdump output
of files is that the (CF-)metadata in the file is so bad that you need
to figure out manually what it contains at all. I do not want to put
all the blame on the file creator - CF is unfortunately still pretty
complicated to understand - but incomplete or wrong metadata is a
problem in which the file's creator has a fair share.
This in itself is a problem that won't be solved by adding yet more
options for the user to get something wrong - which leads to the next
problem: consistency.

4. Jonathan already pointed out the consistency problem.
Adding redundant information to a dataset is something I consider a
really bad idea. It _will_ lead to inconsistencies in metadata/files.
And what do you do then? Which data do you trust? The netCDF-encoded
version in the time variable or the extra ISO time? In my opinion this
just creates a new problem instead of solving one. The entire goal of
CF (in my opinion) is to make data machine-readable (otherwise you
could just stick with ASCII if you want to do everything by hand
anyway).
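
To illustrate the dilemma, here is a hypothetical example (all names
and values are invented; again assuming the Python netCDF4 package) of
a file that stores the same instant twice - once as a CF-encoded
number and once as an extra ISO string. Nothing forces the two to
agree:

    # Hypothetical redundant encodings of one time value.
    from datetime import datetime
    from netCDF4 import num2date

    stored_value = 12.0                          # numeric time variable
    units = "hours since 2013-03-20 00:00:00"
    iso_label = "2013-03-20T18:00:00Z"           # redundant ISO copy

    decoded = num2date(stored_value, units=units, calendar="standard")
    labelled = datetime.strptime(iso_label, "%Y-%m-%dT%H:%M:%SZ")

    # Compare component by component (decoded may be a cftime object).
    if (decoded.year, decoded.month, decoded.day, decoded.hour) != \
       (labelled.year, labelled.month, labelled.day, labelled.hour):
        # 12:00 vs. 18:00 - which one is the measurement time?
        print("Inconsistent metadata:", decoded, "vs", labelled)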

5. Then there are still the encoding problems pointed out by several
people. The big ones I see here are calendars and time zones. Again,
there is a lot a user can (and will) do wrong. In the end there is not
much use in ISO time labels if you can't trust that they are properly
defined anyway.
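
Here is a short demonstration (again assuming the Python netCDF4
package; the values are made up) of why an ISO-looking label is
meaningless without the calendar it belongs to - the identical stored
number and units string decode to three different dates:

    # One stored value, three CF calendars, three different dates.
    from netCDF4 import num2date

    value = 59.0
    units = "days since 2012-01-01"

    for cal in ("standard", "noleap", "360_day"):
        # standard -> 2012-02-29, noleap -> 2012-03-01,
        # 360_day  -> 2012-02-30 (every month has 30 days)
        print(cal, "->", num2date(value, units=units, calendar=cal))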


While I have been writing up this lengthy statement (sorry!), Martin
has also expressed his views on this issue, so I want to say that I
agree with the second point he makes.
It might be useful to preserve some original labels from the data
source. Those could also include ISO timestamps that the instruments
themselves generated. I have absolutely no problem with adding any
such data within the scope of what CF allows (or can be modified to
allow). However, such information should probably not be used or
trusted for automated computations in interoperable systems. What you
do with your own data is of course your own choice - but you don't
need CF if you don't want to share data anyway.
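
To be clear about what I mean, here is a rough sketch, again assuming
the Python netCDF4 package. The variable name original_time_label and
all values are my own invention, not an existing CF construct: the
numeric time coordinate stays authoritative, and the instrument's own
strings ride along for provenance only:

    # Keep the CF numeric time coordinate authoritative and store the
    # instrument's original timestamp strings as auxiliary data.
    from netCDF4 import Dataset

    with Dataset("example.nc", "w") as nc:
        nc.createDimension("time", 3)
        t = nc.createVariable("time", "f8", ("time",))
        t.units = "hours since 2013-03-20 00:00:00"
        t.calendar = "standard"
        t[:] = [0.0, 6.0, 12.0]

        # Original labels as reported by the instrument, kept verbatim
        # for provenance only - not for automated computation.
        labels = nc.createVariable("original_time_label", str, ("time",))
        labels.long_name = "timestamp string as recorded by instrument"
        for i, s in enumerate(["2013-03-20T00:00Z",
                               "2013-03-20T06:00Z",
                               "2013-03-20T12:00Z"]):
            labels[i] = s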

Cheers,
Michael



On 20.03.2013 00:26, Chris Barker - NOAA Federal wrote:
> Richard,
>
> Very well put!
>
>
>> However you choose to peer into your netCDF files you are seeing them
>> through the lens of a "client library".
>
> This is a very good point -- indeed, even if we use a text
> representation of dates, that's really still binary on disk, though
> with a well-known encoding (ASCII/Unicode, or whatever).
>
> So there is, by definition, no human-readable encoding available!
>
>> But if the library can't do machine-to-human then it probably
>> can't do human-to-machine. In which case there's very little you can
>> actually _do_ with the date/time values (e.g. determine ordering or compute
>> intervals).
>
> Bingo!
>
> Indeed, for "real" use cases, human readability really is worthless --
> most data sets are far too big to do anything useful with the data by
> hand anyway.
>
> John Caron wrote:
>
>> An insidious mistake is to think that problems should be fixed in software libraries.
>
> Fair enough, but I'm not sure we have a "problem" here at all. And
> indeed, it may be just as insidious to think that problems can be
> solved with some nifty new addition to an existing data standard.
>
> 1) I agree with Steve that we aren't in a position at this point to
> decide what the best encoding of datetime is -- rather, we are
> deciding if adding another encoding is a good idea. Unless someone is
> suggesting deprecating the old one.
>
> 2) I'm also not at all sure that string representations are a better
> way to go -- netcdf is primarily for consumption by computer programs
> -- and the existing encoding is a pretty natural fit for that.
>
> 3) I don't see how datetime strings "solve" the calendar problem --
> sure, it's clear what calendar the data provider intended, but if
> you want to know how much time passed between two dates, you're back
> to the same problem (I just noticed that you pointed that out) -- I
> actually think time deltas are often more important than absolute
> times anyway.
>
>> Finish your beer and I'll order another round.
>
> I'll get the round after that -- it'll take a few!
>
> John Graybeal wrote:
>
>> (Note that among those users are people who look at binary dumps of files, of which I am one but I'm sure there are many others.)
>
> binary dumps? in hex? and ISO strings are somehow readable there? huh?
>
> (ncdump, which I use often, is not a binary dump; it's an ASCII dump
> (does it support any other text encoding?), and it already handles
> the conversion to ISO strings - in recent versions, anyway.)
>
> By the way, no objection to the standard name -- none at all.
>
> -Chris
>
>


--
Michael Decker
Forschungszentrum Jülich
Institut für Energie- und Klimaforschung - Troposphäre (IEK-8)
E-Mail: m.decker at fz-juelich.de