Dear fellow CF'ers,
Concerning Jonathan's final issue ("abstract variable"): I would
like the capability to store grid information (in particular the coordinates and
cell bounds) in a file separate from the data itself. Although
this breaks the self-describing nature of individual files, there has been
strong encouragement from the OCMIP people (Ocean Carbon Model Intercomparison
Project) to allow this because it saves them an awful lot of storage. If
for a single 2-d field you have a time-series with each time-slice
stored in a separate file, then for irregular grids, you can reduce
the size of each file by a factor of about 7 if you "factor out" the coordinate
information (because for each grid cell you must store 2 coordinate values and
4 bounds values, along with the variable itself).
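
The factor of about 7 comes from counting the values stored per grid cell. A rough sketch of that arithmetic in Python, using the counts given above (the function name and its defaults are illustrative only, and header and attribute overhead are ignored):

```python
def size_reduction_factor(n_coord_values=2, n_bounds_values=4, n_data_values=1):
    """Ratio of per-cell values stored with and without the grid information."""
    with_grid = n_coord_values + n_bounds_values + n_data_values
    without_grid = n_data_values
    return with_grid / without_grid


print(size_reduction_factor())  # 7.0
```

The ratio only applies when each file holds a single time-slice of a single field; with more fields or time-slices per file the saving is smaller.
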
I know that OCMIP has already separated data from coordinate information, and
I suspect others will want to do this (for practical reasons). Why not give
them some guidance on how it should be done?
I think Bob Drach has some additional views on how this would also make it easier
to exchange data through the ESG.
If this is relevant to Jonathan's "abstract variable" concept, maybe
we should consider it at this time.
Karl
Jonathan Gregory wrote:
>
> Dear All
>
> Three schemes are being discussed:
>
> * CF-beta. A per-data-variable attribute which contains the names of the
> mapping, the parameters and their values.
>
> * Caron-Rew. A string variable naming the mapping, with special attributes to
> define its parameters, referred to by name from a per-data-variable attribute.
>
> * Gregory. A per-data-variable attribute which contains the names of the
> mapping, the parameters and the variables containing the parameter values. This
> avoids the awkwardness that Bob raised of having to get numerical values from
> strings.
>
> Readability and telling differences by inspection: It is clear from the
> discussion that these issues are matters of personal preference, since we
> disagree on what's convenient for a human (I believe we are all human). Hence
> we can't use these factors as a basis for decision.
>
> Simplicity: My concern about lots of new attributes is the same as Karl's -
> it makes the standard more complicated. For the same reason I do not like
> introducing a new "kind" of variable for the sake of something to attach these
> attributes to.
>
> The main area of disagreement is the desirability of factoring out duplicated
> information. There are several aspects to this.
>
> (a) Testing whether variables have the same grid mapping. In the Caron-Rew
> scheme you can conclude they do have the same mapping if they have the same
> mapping name. But there might be two mappings with identical definitions. This
> could happen if you assemble a file from several files. It could happen for
> example if a model unconditionally creates separate mapping variables for the
> velocity and mass grids. To prevent this happening you would have to place a
> requirement on the data-writer always to check whether mappings are the same
> and eliminate duplicates whenever a file is created. If you cannot guarantee
> that mappings are unique, you have to test whether they are the same by
> comparing parameters. If you *ever* have to be able to do this, you need
> software to do it, and I argue that this advantage of factoring out the mapping
> has then essentially been lost. You may as well always compare the mappings
> parameter by parameter. It's more reliable that way.
>
> (b) Order of parameters in a single attribute. We disagree about how easy this
> is for a human to deal with, but for a program it is surely not an issue. A
> program can scan the attribute repeatedly to find the parameters, regardless of
> their order. The parser need not be bothered by the introduction of new
> parameters it doesn't recognise - it can just ignore them by skipping forward
> to the next keyword.
>
> (c) Potential inconsistency if each variable has its own mapping definition.
> In my scheme, this is partly avoided, because the different data variables can
> share the parameter variables. However, I really would appreciate an example
> which illustrates the concern with potential inconsistency. I don't think that
> a per-data-variable attribute *is* truly duplicating information. Each variable
> has its *own* grid mapping, though they may all happen to be the same. In the
> Caron-Rew scheme and in my scheme you can change the value of a mapping
> parameter for all data variables which share a mapping by altering a single
> number. In the Caron-Rew scheme you can change the mapping itself (e.g. from
> rotated pole to polar stereographic). But in what situation would you actually
> want to do this by a global operation (except to correct a mistake)? Changing
> the mapping means entirely recomputing the variable; it must have a new grid
> and new data values. You have to do this on each variable separately. It is not
> a global operation.
>
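
Points (a) and (b) can be sketched together in Python: scan a single attribute for "keyword: variable" pairs regardless of their order, then compare two mappings parameter by parameter. This is only an illustration, assuming an attribute of the form "mapping_name keyword: variable ..."; the function names and the `values` dictionary (a stand-in for reading the parameter variables from a file) are hypothetical.

```python
def parse_grid_mapping(attr):
    """Split 'name key1: var1 key2: var2' into (name, {key: var})."""
    tokens = attr.split()
    name, params = tokens[0], {}
    key = None
    for tok in tokens[1:]:
        if tok.endswith(":"):
            key = tok[:-1]       # start of a new keyword
        elif key is not None:
            params[key] = tok    # variable holding this parameter's value
            key = None
    return name, params


def same_mapping(attr_a, attr_b, values):
    """Compare two mappings parameter by parameter.

    `values` maps parameter-variable names to their numerical values.
    """
    name_a, params_a = parse_grid_mapping(attr_a)
    name_b, params_b = parse_grid_mapping(attr_b)
    if name_a != name_b or params_a.keys() != params_b.keys():
        return False
    return all(values[params_a[k]] == values[params_b[k]] for k in params_a)


values = {"nplat": 32.5, "nplon": 170.0, "nplat2": 32.5, "nplon2": 170.0}
a = ("rotated_latitude_longitude "
     "grid_north_pole_latitude: nplat grid_north_pole_longitude: nplon")
b = ("rotated_latitude_longitude "
     "grid_north_pole_longitude: nplon2 grid_north_pole_latitude: nplat2")
print(same_mapping(a, b, values))  # True
```

The comparison uses exact equality; a real implementation might prefer a tolerance for floating-point parameters.
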
> Because of (a-c), I don't think that factoring out the information gives any
> great advantage, but I do think that to do it this way involves significantly
> more work for the data-writer and is more complicated for the user of the
> standard. So I still prefer CF-beta or my modified form of it.
>
> Apart from these, there is a final, completely different issue which Russ
> raises under "deletion anomalies", that the mapping can't exist if no variables
> use it, in the case of a per-data-variable approach. Brian has raised a similar
> issue in the past regarding auxiliary coordinate variables, which look just
> like data variables if they are by themselves in a file. I feel this is a
> different thing altogether. Do we want to be able to have "grids" (whatever we
> might mean by that :-)) which exist independently of data variables? This is a
> philosophical design point. CF has not tried to accommodate that idea up to
> now. The aim of the standard is to provide metadata for data which exists, not
> to provide metadata which could describe some data not yet provided.
>
> I would say that this is a more general issue than grid mappings. If we want to do
> this, I would propose we introduce something like the Caron-Rew idea, but for
> all grid information. I'd call this an "abstract variable".
>
> dimensions:
>   x=73;
>   y=96;
>   pressure=10;
>   p_len=50;
>   nb=2;
>   nv=4;
> variables:
>   char abstractvariable(p_len);
>     abstractvariable:grid_mapping="rotated_latitude_longitude ",
>       "grid_north_pole_latitude: nplat grid_north_pole_longitude: nplon";
>     abstractvariable:coordinates="lat lon";
>   // scalar parameter variables named in grid_mapping (declarations added so
>   // that the data section below is valid; type float is assumed)
>   float nplat;
>   float nplon;
>   float x(x);
>     x:bounds="x_bounds";
>   float x_bounds(x,nb);
>   float y(y);
>   float lat(y,x);
>     lat:bounds="lat_bounds";
>   float lat_bounds(y,x,nv);
>   float lon(y,x);
> data:
>   abstractvariable="x y pressure";
>   nplon=170.0;
>   nplat=32.5;
>   lat=...;
>   lon=...;
>   x=...;
>   y=...;
>
> This is an entirely data-free variable. Its string value tells us its
> dimensions, and it provides a home for the mapping, coordinates, auxiliary
> coordinates and their boundaries. Its name could be taken as a name for this
> grid as a whole. However, I'm not advocating this as a solution to the
> practical problem of how to define the grid_mapping, but as a proposal to
> consider if we want to introduce this more abstract kind of object as an
> additional feature.
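
As an illustration of how a reader might interpret such a data-free variable, here is a minimal Python sketch that resolves its string value into dimension names and a shape. The function name is hypothetical, and the `dims` dictionary simply repeats the dimension sizes from the example above.

```python
def abstract_variable_shape(value, dimensions):
    """Return the dimension names and shape named by the string value."""
    names = value.split()
    return names, tuple(dimensions[n] for n in names)


dims = {"x": 73, "y": 96, "pressure": 10, "p_len": 50, "nb": 2, "nv": 4}
names, shape = abstract_variable_shape("x y pressure", dims)
print(names, shape)  # ['x', 'y', 'pressure'] (73, 96, 10)
```
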
>
> Have a good weekend. Cheers
>
> Jonathan
Received on Mon Feb 24 2003 - 13:07:19 GMT