
[CF-metadata] Proposal for better handling vector quantities in CF

From: Steve Hankin <Steven.C.Hankin>
Date: Tue, 13 Dec 2011 10:34:24 -0800

John, Jon, Thomas, et al.,

I will weigh in here with a vote _*against*_ creating another dimension
(a new axis type) to hold vector components. In higher-level code,
creating a multi-dimensional vector object may well be an elegant
approach, but I will argue in the points below that at the file
definition level it can add complexity and create a number of
significant inconsistencies in the code pipelines, as well as backwards
compatibility problems.

 1. There will always be two classes of data access needs for vectors:
    1) looking at the individual components; 2) looking at the
    multi-component vector quantity. Accessing individual components is
    *very common* (I'd speculate that it may be the more common of the
    two modes). If we group the components into a single variable
    using an additional dimension it means that the code to treat the
    individual vector components will become different from the scalar
    variable code, despite the fact that there is a nearly identical
    list of use cases for vector components and scalar variables. (This
    would be a step away from elegance.)
 2. We would almost certainly find that staggered grids become a
    slippery slope of complexity. The specific index ranges needed for
    the individual staggered components depend on the operation that is
    being performed: vector plots, curl, divergence, volume integrals,
    etc. These needs are not consistent with a single index range
    applying to all components.
 3. There are many use cases in which the analysis pipeline is different
    for different components of a vector. Some examples: the vector
    components may be stored in separate files (e.g. the entire CMIP5
    archive ... and we know what a challenge it is to get data providers
    to utilize the aggregation tools); the Z vector component of ocean
    data is often generated through an on-the-fly conservation-of-mass
    analysis step, rather than stored in the file; the Z component often
    requires special scaling, e.g. when making vector plots. Such
    cases illustrate why it is more elegant to make the vector
    associations in higher-level code, rather than at the file level.
 4. 3-vector components are often plotted and analyzed in 2-dimensional
    views. With a vector dimension of length 3, we cannot do a
    multi-dimensional access in the XZ plane without reading the Y
    component, too -- illustrating where the vector dimension at the
    file level can add complexity.
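
A minimal sketch of points 1, 2 and 4, using NumPy arrays as stand-ins for
file variables. The grid sizes, the helper names (make_field, zonal_mean),
and the (component, z, y, x) layout of the grouped variable are assumptions
made purely for illustration; they are not taken from any CF proposal or
from a real file.

```python
import numpy as np

nz, ny, nx = 4, 5, 6  # hypothetical grid sizes


def make_field(shape, offset):
    """Build a deterministic dummy field (illustrative helper)."""
    return np.arange(np.prod(shape), dtype=float).reshape(shape) + offset


def zonal_mean(field):
    """Ordinary scalar-field code: average over the last (x) axis."""
    return field.mean(axis=-1)


# Layout A: one variable per component; each is an ordinary scalar field.
u = make_field((nz, ny, nx), 0.0)
v = make_field((nz, ny, nx), 1000.0)
w = make_field((nz, ny, nx), 2000.0)

# Layout B (hypothetical): components grouped along an extra leading dimension.
vec = np.stack([u, v, w])  # shape (3, nz, ny, nx)

# Point 1: scalar code applies directly to u in layout A, but in layout B
# the caller must first know about, and index, the component dimension.
mean_separate = zonal_mean(u)
mean_grouped = zonal_mean(vec[0])

# Point 4: an XZ-plane view at fixed y needs only the x and z components.
j = 2
xz_separate = (u[:, j, :], w[:, j, :])  # two independent reads, v untouched
slab = vec[0:3, :, j, :]                # one contiguous range also pulls in v
xz_grouped = slab[[0, 2]]               # v is discarded after being read

# Point 2: staggered components have different shapes, so they cannot
# share a single index range along a common set of dimensions.
u_stag = np.zeros((nz, ny, nx + 1))
v_stag = np.zeros((nz, ny + 1, nx))
try:
    np.stack([u_stag, v_stag])
    stacked_ok = True
except ValueError:
    stacked_ok = False
```

In this sketch the separate-variable layout lets zonal_mean be reused
unchanged, while the grouped layout needs an extra indexing step and, for
the XZ view, reads a component that is then thrown away.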

     - Steve

=======================

On 12/9/2011 11:43 AM, John Caron wrote:
> On 12/9/2011 11:37 AM, Jonathan Gregory wrote:
>> Dear John
>>
>> I prefer the idea that Thomas has put forward of an umbrella, rather than
>> containing the vector/tensor components in one data variable, because
>>
>> * I really don't like the idea of units being mixed within a data variable.
>> I think things with different units must certainly be different quantities
>> and could not be regarded as the same field. You can get away with it if they
>> are all m s-1, for instance, but not so easily if the vertical one is orders
>> of magnitude different from the horizontal, and not at all if the vector is
>> expressed in polar coordinates.
>
> I think the common case is that the vector components have the same
> unit. One could restrict to that case.
>
>>
>> * I think it would be very inconvenient, and would break a lot of existing
>> software, if the coordinates were not what they appeared to be, because an
>> offset had to be added. Also, in general, the component fields of a staggered
>> grid do not have the same dimensions, as well as differing in the coordinates.
> I'm not sure what "an offset had to be added" means.
>
> I think the common case of staggered grids could be handled with a
> convention defining the staggering, rather than separate dimensions. I
> pull out the one Rich Signell and I came up with a long time ago, for
> its own sake.
>
>>
>> * It avoids the need to define a convention for labelling vector/tensor
>> components.
> I think this convention would be about as complex as the one you will
> need for Thomas' proposal.
>
>>
>> * It is completely backwards-compatible as regards the component fields, which
>> are exactly as before; we're just adding some separate information linking
>> them. This seems neat to me.
>
> I agree that's a strong reason for Thomas' method.
>
> OTOH, if we start thinking in terms of the extended model, a Structure
> ("compound type" in HDF5 parlance) might be useful. What do you think
> about starting to consider possible uses of the extended data model?
>
> Thanks for your thoughts; as always, interesting.
>
> John
>
> _______________________________________________
> CF-metadata mailing list
> CF-metadata at cgd.ucar.edu
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
Received on Tue Dec 13 2011 - 11:34:24 GMT
