
[CF-metadata] Storing multiple NWP model runs in a NetCDF-CF file [SEC=UNCLASSIFIED]

From: Timothy Hume <T.Hume>
Date: Mon, 12 Jan 2009 09:39:00 +1100

Hi,

It strikes me that the issue of more than one time coordinate is just a special case of a more general issue.

There is a spatial analogy. Normally it makes sense to store NWP or climate model data on two spatial dimensions (e.g. latitude and longitude). This is because most model output has all the combinations of latitudes and longitudes. However, when it comes to storing data from a network of weather stations (for example), it makes sense to collapse the two spatial dimensions into a single "station" dimension, and have latitude and longitude as auxiliary coordinate variables. If the two spatial dimensions were not collapsed, one would end up with very large files mostly full of missing values.

Therefore, maybe the solution to the two time coordinate problem is to have a standard method for collapsing two or more dimensions into a single dimension. Then users could apply the collapsing method to any combination of dimensions they choose. Normally it would only be desirable to collapse multiple time dimensions to a single dimension, or latitude and longitude to a single "station" dimension. However, someone might come up with a data set where it made sense to collapse latitude and time into a single dimension (I can't even imagine why one would want to do this, but you never know ...)

Maybe the collapsing method could work like this:

1) Start with two (or more) dimensions and coordinate variables which need to be collapsed:

dimensions:
        dim1 = n1;
        dim2 = n2;
variables:
        double dim1(dim1);
        double dim2(dim2);
        double data(dim1, dim2);

2) Create a new dimension dim3 which has auxiliary coordinate variables called dim1 and dim2:

dimensions:
        dim3 = n1*n2;
variables:
        double dim1(dim3);
        double dim2(dim3);
        double data(dim3);

3) There could be an optional recommendation concerning the order in which the dimensions are collapsed. For example, if dim1(dim1) had values of 1, 2 and 3, and dim2(dim2) had values of 4, 5 and 6, then once collapsed:
        dim1(dim3) = 1, 1, 1, 2, 2, 2, 3, 3, 3
        dim2(dim3) = 4, 5, 6, 4, 5, 6, 4, 5, 6
A recommendation such as this would potentially make mapping of data to arrays in programming languages easier.
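
The three steps above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name collapse and the sample data values are hypothetical, and the ordering (dim2 varying fastest) is simply the one suggested in step 3, which happens to match the row-major layout of a dim1 x dim2 array.

```python
from itertools import product


def collapse(dim1, dim2, data):
    """Collapse data[i][j] (n1 x n2) onto one dimension of length n1*n2.

    Returns the two auxiliary coordinate lists and the flattened data,
    with dim2 varying fastest (row-major order).
    """
    pairs = list(product(dim1, dim2))
    aux1 = [a for a, _ in pairs]   # plays the role of dim1(dim3)
    aux2 = [b for _, b in pairs]   # plays the role of dim2(dim3)
    flat = [value for row in data for value in row]  # data(dim3)
    return aux1, aux2, flat


dim1 = [1, 2, 3]
dim2 = [4, 5, 6]
# Hypothetical data: each value encodes its own coordinates (10*dim1 + dim2).
data = [[10 * a + b for b in dim2] for a in dim1]

aux1, aux2, flat = collapse(dim1, dim2, data)
print(aux1)  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
print(aux2)  # [4, 5, 6, 4, 5, 6, 4, 5, 6]
```

With this ordering, element k of the collapsed data corresponds to data[k // n2][k % n2] in the original array, which is the kind of easy mapping the recommendation is meant to give.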

Returning to the original issue of multiple time coordinates, there are some good practical reasons for my application to use two separate dimensions (forecast_reference_time and forecast_period) rather than collapsing them into a single dimension:

1) For NWP applications, it is quite common to have all the combinations of forecast_reference_time and forecast_period. Therefore using two dimensions won't waste space with missing values. For some model data I store (e.g. some JMA data), the model runs at 12Z have more forecast_periods available than the model runs at 00Z. In these cases, collapsing the time coordinates into a single dimension would save disc space at the expense of making the files more difficult to use. I prefer to take the disc space penalty and keep the two dimensions (and fill all the long range forecasts from the 00Z run with missing values).

2) Keeping two separate dimensions makes it trivial to perform some common operations on subsets of the data. For example, I can use the NetCDF operators to easily compute the average bias (with respect to analyses) of 48 hour forecasts during the last thirty days.

3) Keeping forecast_period as a separate dimension, and utilising the CF cell methods (on the forecast_period dimension), provides a nice way to store accumulations (e.g. accumulated precipitation). Most models include accumulations since the analysis or reference time, but some reset accumulation totals every 24 hours. Either way, the use of a separate forecast_period and the CF cell methods can deal with these data. All that is necessary is to set the forecast_period_bounds variable to the start and end times of the accumulations. I have a feeling that storing accumulations would be more difficult if I collapsed the time coordinates onto a single dimension (but I may be wrong).
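
Points 2 and 3 can be illustrated with a small Python sketch. Everything here is hypothetical: the data values, the names mean_bias and accumulation_bounds, and the assumption that a model which resets its totals does so at whole multiples of 24 hours after the reference time. It is not how the NetCDF operators work internally, just the same idea on plain lists.

```python
# Hypothetical values throughout; only the layout matters.
forecast_period = [24, 48, 72]  # hours after forecast_reference_time

# forecast[run][k]: value from model run `run` at forecast_period[k].
# None stands in for a missing value (e.g. a shorter 00Z run).
forecast = [
    [10.0, 12.0, 11.0],
    [11.0, 13.0, None],
    [9.0, 10.0, 12.0],
]
# analysis_48h[run]: analysis valid 48 hours after each run started.
analysis_48h = [11.0, 12.0, 10.5]


def mean_bias(forecast, analysis, k):
    """Mean of (forecast - analysis) over runs, skipping missing values."""
    errors = [
        row[k] - obs
        for row, obs in zip(forecast, analysis)
        if row[k] is not None and obs is not None
    ]
    return sum(errors) / len(errors) if errors else None


def accumulation_bounds(periods, reset_hours=None):
    """(start, end) of each accumulation, in hours since the reference time.

    reset_hours=None: totals accumulate from the reference time.
    reset_hours=24:   totals are assumed to reset every 24 hours.
    """
    bounds = []
    for end in periods:
        if reset_hours is None or end <= 0:
            start = 0
        else:
            start = ((end - 1) // reset_hours) * reset_hours
        bounds.append((start, end))
    return bounds


bias_48h = mean_bias(forecast, analysis_48h, forecast_period.index(48))
since_start = accumulation_bounds(forecast_period)
daily_reset = accumulation_bounds(forecast_period, reset_hours=24)
```

Because forecast_period is its own dimension, picking out the 48 hour forecasts is a single index along that axis, and the two accumulation conventions differ only in the start value written to forecast_period_bounds.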

Cheers,

Tim Hume
Centre for Australian Weather and Climate Research
Australian Bureau of Meteorology
Melbourne
Australia

> -----------------------------
>
> the main elements of any solution are the 3 types of time
> coordinates, with standard names:
>
> forecast_reference_time
> forecast_period
> time
>
> such that forecast_reference_time + forecast_period = time.
>
> there are 3 main difficulties to answer:
>
> 1. in the general case, forecast_period may be 2D (not
> common though), and time is often 2D. is this allowable as
> auxiliary coordinates?
>
> 2. there may be missing cases, akin to ragged arrays.
> possible solutions are an "index dimension" (section 8.2), or
> to use missing values, or ?. (I notice that existing examples
> I have seen find a way to avoid this)
>
> 3. solutions that try to stick with only 1D variables will,
> in general, have repeated coordinate values, making them
> technically not coordinate variables. will we allow this?
Received on Sun Jan 11 2009 - 15:39:00 GMT

