Dear Jonathan, all,
I think that a small extension to allow for vague notions of
"representativeness" could be valuable, for cases in which "mean",
"median" etc. would imply spurious precision. Perhaps there could be a
standard way to point to a human-readable document describing the notion
in more detail for that particular dataset. The alternative to this
vague "representativeness" would probably be a full machine-readable
description of how the value was arrived at, which would be very complex
in many cases (although desirable if it could be achieved!).
The specific software problem is that CF represents temporal data using
a syntax like "seconds since 1970". Time values are therefore
double-precision numbers, with unknown real-world precision. Certainly
it is an improvement to be able to say that the value in question is
"representative" of the time range "t1 to t2 seconds since whenever".
However, even this can imply spurious precision as Ken Casey has
explained on this thread.
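To make the encoding concrete, here is a minimal Python sketch of how a daily field might be stored as a midday time value plus day-long bounds. It assumes a reference time of 1970-01-01 00:00:00 UTC and a standard Gregorian calendar; the helper name `daily_time_and_bounds` is hypothetical and not part of CF or any library.

```python
from datetime import datetime, timedelta, timezone

# Assumed reference time for "seconds since 1970-01-01 00:00:00" (UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def daily_time_and_bounds(year, month, day):
    """Return (time, (t1, t2)) in seconds since EPOCH for one UTC day.

    Hypothetical helper: the time value is registered to 12:00Z and the
    bounds run from the start of the day to the start of the next day.
    """
    start = datetime(year, month, day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    midday = start + timedelta(hours=12)

    def to_seconds(moment):
        return (moment - EPOCH).total_seconds()

    return to_seconds(midday), (to_seconds(start), to_seconds(end))


time_value, (t1, t2) = daily_time_and_bounds(2010, 2, 1)
print(time_value, t1, t2)
```

Note that the bounds come out as floating-point numbers that look exact to the second, which is the spurious precision described above.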
An alternative might be to specify partial dates and times as ISO 8601
strings, e.g. a time axis representing data that are representative of
particular days could read "2010-02-01, 2010-02-02" etc. (This is
somewhat related to ticket 14, which we decided against implementing:
https://cf-pcmdi.llnl.gov/trac/ticket/14.)
By the way, we also have a use case to represent some palaeoclimate
data, which contains timeseries of data that are representative of
months in the past: any more precision than this would be misleading.
Such a time axis could be represented as "1200-01, 1200-02" etc.
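The partial-date idea in the two paragraphs above can be sketched in Python as follows: the precision of the string itself tells the reader which interval the value is representative of. This is only an illustration under stated assumptions: the function name `partial_date_interval` is hypothetical, only year-month and year-month-day strings are handled, and dates are interpreted in the proleptic Gregorian calendar used by Python's `datetime.date`, which may not suit every palaeoclimate dataset.

```python
from datetime import date


def partial_date_interval(text):
    """Return (start, end) dates implied by 'YYYY-MM' or 'YYYY-MM-DD'.

    The interval is half-open: start is included, end is excluded.
    Assumes the proleptic Gregorian calendar of datetime.date.
    """
    parts = [int(p) for p in text.split("-")]
    if len(parts) == 2:
        year, month = parts
        start = date(year, month, 1)
        # Roll over to January of the next year after December.
        if month == 12:
            end = date(year + 1, 1, 1)
        else:
            end = date(year, month + 1, 1)
        return start, end
    if len(parts) == 3:
        year, month, day = parts
        start = date(year, month, day)
        # The next calendar day closes the one-day interval.
        return start, date.fromordinal(start.toordinal() + 1)
    raise ValueError(f"unsupported partial date: {text!r}")


for label in ("2010-02-01", "1200-01"):
    print(label, partial_date_interval(label))
```

A client reading such an axis would never see a time of day for a month-precision value, so it cannot mistake the value for an instantaneous snapshot.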
Cheers, Jon
-----Original Message-----
From: Jonathan Gregory [mailto:jonathan at met.reading.ac.uk] On Behalf Of
Jonathan Gregory
Sent: 02 June 2010 16:10
To: Jon Blower
Cc: cf-metadata at cgd.ucar.edu
Subject: Re: [CF-metadata] bounds/precision for time axis
Dear Jon
CF doesn't provide a way to do this except by giving bounds. I think
that's the right thing to do, because the length of the interval alone
doesn't say when it starts and stops, which applications may need to
know.
The cell_methods indicates how the value represents the variation
within the interval. For an intensive quantity, "point" is the default
i.e. instantaneous in time. To indicate a mean, cell_methods of "mean"
should be specified. You are saying it is "representative" in some
vaguer way than a mean, and it is not instantaneous. That sounds like a
different cell_methods. Perhaps it would be a good idea to allow "cell"
to be specified in cell_methods for intensive quantities, to indicate a
"representative" value in this vague sense. ("cell" is the default
cell_methods for an extensive quantity, which relates to the entire
cell and depends on its size.) I think this vagueness should in general
be discouraged; it would be better to be more precise and specify
"mean", "median" etc., but if you can't be precise it'd be nice to be
able to say so.
What do you think? That would require a small change to the convention.
Cheers
Jonathan
> We have many datasets for which we need to express the precision of
> the time axis. For example, the OSTIA sea surface temperature dataset
> contains daily fields. The data are considered "representative" of a
> particular day, without necessarily being a simple average over the
> day. At the moment the data are registered to 12:00Z on each day, but
> this is indistinguishable from an instantaneous snapshot at this time.
>
> I guess it would be possible to express the temporal precision using
> the "bounds" attribute for the variable in question
> (http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.4/cf-conventions.html#cell-boundaries),
> by specifying the start and end of each day as the bounds. Is there a
> less verbose way of providing this information, perhaps by stating the
> precision as "1 day/24 hours/whatever" as a single attribute?
>
> Jon
>
> --
> Dr Jon Blower
Received on Wed Jun 02 2010 - 11:35:18 BST