
[CF-metadata] FW: daily maximum of running 8-hour means

From: Hedley, Mark <mark.hedley>
Date: Fri, 28 Oct 2011 15:10:21 +0100

Hello

I think the approach:

> to extend CF to allow you to preserve the original time axis to which the first operation applied, even though it no longer provides the time coordinates once the second operation has been carried out.

is of particular interest. I do not agree that I 'prefer the second alternative'.

This construct, optionally storing the original coordinate values after an operation, is easily machine-interpretable and carries all the reference information required. It offers a consistent way of defining a process, and one that is widely applicable.

It offers options to the data creator, who may very well decide that such metadata is surplus to requirements and is easily summarised using the current cell_method syntax. However, where this syntax does not capture the essence of the process to the creator's requirements, a trade-off of a larger metadata payload for enhanced clarity of phenomenon definition may well be deemed worthwhile.

By containing the referencing within the cell_method there is one point of contact, so the data consumer can be confident they have interpreted the essence of the data creator's definition.

I would like to develop some use cases which would benefit from this approach and put them forward for discussion before the extended syntax is proposed as a trac ticket.

As soon as I have these I will post them to the mailing list.

all the best
mark


-----Original Message-----
From: cf-metadata-bounces at cgd.ucar.edu on behalf of Jonathan Gregory
Sent: Fri 28/10/2011 03:06
To: Schultz, Martin
Cc: cf-metadata at cgd.ucar.edu
Subject: Re: [CF-metadata] daily maximum of running 8-hour means
 
Dear Martin

I think we all prefer the second alternative then i.e. the summary of what
was done, rather than explicit coordinates.

> Upon calculating this "index", one will generally assign the daily value to one single time "coordinate" value representing the respective day. Now, this would mean that the "offset" value will likely be undefined if I understood this correctly.

I suppose so. I think the "offset" and "period" would both be optional anyway.
It sounds like you really only need to record the "period" i.e. 8 hours, and
I suggested the "offset" too for completeness, following Philip's email. It
would be sufficient to put
"time: mean (period: 8 hours) time: maximum (interval: 1 hour)"
to indicate a (daily) maximum made from 8-hour running means if you aren't
concerned exactly how the 8-hourly periods correspond to the days. In that
case we can simplify the proposal by omitting "offset" until someone needs it.

If no-one has a better idea, shall we propose this extended syntax in a trac
ticket?

Cheers

Jonathan

_______________________________________________
-----Original Message-----
From: cf-metadata-bounces at cgd.ucar.edu on behalf of Schultz, Martin
Sent: Thu 27/10/2011 17:47
To: cf-metadata at cgd.ucar.edu
Subject: Re: [CF-metadata] daily maximum of running 8-hour means
 
Dear Jonathan, Philip,

I lean towards the second alternative, partly because I believe it is easier to understand, and partly because I think that in most cases the extra information about which cell bounds entered the "daily max. 8-hour running mean" is not preserved (at least I have never seen it in any data set). Upon calculating this "index", one will generally assign the daily value to one single time "coordinate" value representing the respective day. Now, this would mean that the "offset" value will likely be undefined if I understood this correctly.

Best regards,

Martin

-----Original Message-----
From: cf-metadata-bounces at cgd.ucar.edu on behalf of Jonathan Gregory
Sent: Mon 24/10/2011 01:29
To: cf-metadata at cgd.ucar.edu
Subject: Re: [CF-metadata] daily maximum of running 8-hour means
 
Dear Martin and Philip

Thank you for your use-cases showing the need for standardised metadata for
daily maxima of 8-hour running means. I expect that similar cases will arise
in other areas and actually it is quite surprising they haven't already, I
think.

What you want to describe involves two cell_methods processing operations: (1)
Calculate an 8-hour running mean for consecutive hours, (2) Calculate a
maximum of these within days. We could record this explicitly and completely
if we had different time coordinates for the two operations. For the first
operation, the cell_method is a mean, and we have time coordinates at hourly
intervals, each with bounds specifying an 8-hour period. The cell bounds
overlap, but that's no problem. E.g. they could be [0:00, 8:00], [1:00, 9:00],
[2:00, 10:00], ... (just the time, omitting the date). For the second
operation, the cell_method is a maximum, the bounds indicate periods of days,
spaced at daily intervals i.e. not overlapping.
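
The two operations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the input is a plain list of hourly values, and each 8-hour window is assigned to the day that contains its centre time, which is one possible convention rather than something fixed by this discussion.

```python
from math import floor

def running_means(hourly, window=8):
    """Operation (1): mean of each run of `window` consecutive hourly values."""
    return [sum(hourly[i:i + window]) / window
            for i in range(len(hourly) - window + 1)]

def daily_maxima(means, first_centre_hour, hours_per_day=24):
    """Operation (2): maximum of the running means within each day.

    Assumption: a window belongs to the day containing its centre time,
    and consecutive windows are spaced one hour apart.
    """
    by_day = {}
    for i, m in enumerate(means):
        day = floor((first_centre_hour + i) / hours_per_day)
        by_day[day] = max(by_day.get(day, m), m)
    return [by_day[d] for d in sorted(by_day)]

# Two days of made-up hourly values; the first window is centred at 0.5 hours.
hourly = [float(h) for h in range(48)]
means = running_means(hourly)
maxima = daily_maxima(means, first_centre_hour=0.5)
```

Whether windows that straddle midnight should count towards the earlier or the later day is exactly the kind of detail that the metadata would need to record.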

One possibility would be to extend CF to allow you to preserve the original
time axis to which the first operation applied, even though it no longer
provides the time coordinates once the second operation has been carried
out. For instance, we could allow:

  float air_quality(time);
    air_quality:cell_methods="old_time: mean time: maximum";
  double old_time(old_time);
    old_time:units="hours since 2011-10-23 0:00";
    old_time:bounds="old_time_bounds";
  double old_time_bounds(old_time,two);
  double time(time);
    time:units="hours since 2011-10-23 0:00";
    time:bounds="time_bounds";
  double time_bounds(time,two);

  old_time=0.5, 1.5, 2.5, ...;
  old_time_bounds=-3.5,4.5, -2.5,5.5, -1.5,6.5, ...;
  time=12, 36, 60, ...;
  time_bounds=0,24, 24,48, 48,72, ...;

This is not currently legal because old_time is not a dimension of air_quality,
but it seems a reasonable extension to me.
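
As a rough check on the numbers in this example, the coordinate and bounds values can be generated from the period and spacing of each operation. The helper names below are hypothetical, and the sketch assumes hourly spacing, 8-hour cells centred on old_time, and whole-day cells centred on time.

```python
def running_mean_cells(n, first_centre=0.5, period=8.0, interval=1.0):
    """Centres and (overlapping) bounds of the first n running-mean cells."""
    centres = [first_centre + i * interval for i in range(n)]
    bounds = [(c - period / 2, c + period / 2) for c in centres]
    return centres, bounds

def daily_cells(n, day=24.0):
    """Centres and (non-overlapping) bounds of the first n daily cells."""
    bounds = [(i * day, (i + 1) * day) for i in range(n)]
    centres = [(lo + hi) / 2 for lo, hi in bounds]
    return centres, bounds

old_time, old_time_bounds = running_mean_cells(3)
time, time_bounds = daily_cells(3)
```

With these assumptions, old_time comes out as 0.5, 1.5, 2.5 with bounds starting at -3.5,4.5, and time as 12, 36, 60 with bounds starting at 0,24.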

For climatological time, we have a similar issue of multiple time processing,
and in that case we do not find it necessary to keep old time coordinates.
Instead, we use a special interpretation for the time bounds (CF sect 7.4).
However, this doesn't seem to be easy to adapt for the general case, because
of the overlapping periods. Climatological time doesn't allow that.

Another possibility would be to define new kinds of standardised metadata,
following Philip's categories, to describe the superseded time coordinate e.g.
cell_methods="time: mean (period: 8 hours offset: -15.5 hours) time: maximum
(interval: 1 hour)". The second entry is standard, and records that hourly
values were input to the calculation of the daily maximum. The first entry
records that those values were themselves means calculated over periods of 8
hours, and the first such period began 15.5 hours before the first time
coordinate (12 - 15.5 = -3.5, the lower bound of the first old_time cell).
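
To make the offset arithmetic concrete, here is a small Python sketch that derives the offset and assembles the string. The function name is hypothetical, and the "period" and "offset" keywords are only the syntax suggested in this message, not part of the current CF conventions.

```python
def proposed_cell_methods(first_old_lower, first_time, period_hours, interval_hours):
    """Build the suggested cell_methods string (proposal only, not current CF).

    The offset is the lower bound of the first superseded (old_time) cell
    minus the first remaining time coordinate, in hours.
    """
    offset = first_old_lower - first_time
    unit = "hour" if interval_hours == 1 else "hours"
    return (f"time: mean (period: {period_hours:g} hours offset: {offset:g} hours) "
            f"time: maximum (interval: {interval_hours:g} {unit})")

# Values from the example: first old_time cell starts at -3.5, first time is 12.
text = proposed_cell_methods(-3.5, 12, 8, 1)
```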

I am sure there are plenty of other possibilities, but I wonder what you think
of either of these as a start?

Best wishes

Jonathan
_______________________________________________
CF-metadata mailing list
CF-metadata at cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
Received on Fri Oct 28 2011 - 08:10:21 BST
