[CF-metadata] 2. Re: [cf-satellite] Sharing quality flags among multiple variables (Jonathan Gregory)

From: Schultz, Martin <m.schultz>
Date: Mon, 21 Nov 2011 11:48:34 +0100

Dear Jonathan,

      what you say makes sense, but the lines are somewhat blurred, and this philosophical fabric sometimes makes it hard to communicate the usefulness of CF to others. It may be about time to begin thinking about CF-2.0 and to initiate a discussion which should have simplicity as one major goal. To my knowledge there is no rush in this (and there should not be), but it may be worthwhile to begin to think about the future. Well, I am sure you have done this already! But what is the user involvement in this process? Should we think about a "CF conference", or maybe a "metadata conference" with a somewhat larger scope?

    But to get specific again: to me there is not much difference between counting (valid) observations over a given time interval and calculating a percentile or mean value. All of these operations aggregate information from the variable over time. This also means you always lose some detail/information. But where do you begin and end with this? A typical air quality measurement may be made every minute. What is archived is often the hourly data, which have already been processed and averaged, and you will (hopefully) find at least some information about this in the metadata, at least a simple data quality flag that says whether a given hourly value is ok or not. Then you can compute, for example, monthly mean values for one given year or monthly mean values over a "climatology" period (i.e. all "January" values from 1980 to present). Parallel to these mean values you may want to know the number of obs entering each mean value, and the percentiles or the standard deviation. Then you create a regional mean value where you combine data from different stations in a certain geographical domain. Again, all of these operations aggregate data and eliminate or reduce at least one dimension. In my view the "modifier" case where an observation count applies to several variables is just a special case, where you actually have various obs coming from the same instrument. In general, even if instruments are operated in parallel, one measurement will fail at different times than another. So, the general case is that obs are independent of each other. Therefore, I would argue that the "synchronizing" of obs is a special thing and should be treated separately from the statistical treatment of variables. Hence, the default in the sea water temperature and salinity case would be that each variable has its own "count" (via cell_methods?). One could then define a way to create this link via some sort of cross-referencing.
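
    To illustrate the point about independent counts, here is a minimal sketch in plain Python. The function name `aggregate`, the variable names and all data values are hypothetical and invented for illustration only; None stands for a failed measurement. It collapses the time dimension of two variables measured in parallel and shows that the mean, the standard deviation and the observation count all come out of the same reduction, while the counts differ between the variables.

```python
from statistics import mean, pstdev


def aggregate(values):
    """Collapse a time series into mean, standard deviation and valid count.

    Missing observations are represented by None and are skipped.
    """
    valid = [v for v in values if v is not None]
    if not valid:
        return {"mean": None, "stdev": None, "count": 0}
    return {"mean": mean(valid), "stdev": pstdev(valid), "count": len(valid)}


# Hypothetical hourly records from two instruments operated in parallel;
# the instruments fail at different times, so the valid counts differ.
temperature = [10.0, 11.0, None, 12.0, 11.0]
salinity = [35.0, None, None, 35.2, 35.1]

temp_stats = aggregate(temperature)
sal_stats = aggregate(salinity)

print("temperature:", temp_stats)
print("salinity:   ", sal_stats)
```

    Because each variable carries its own count here, a count shared between variables would need an explicit extra link, which is the special case argued for above.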

Cheers,

Martin


> -----Original Message-----
> From: Jonathan Gregory [mailto:j.m.gregory at reading.ac.uk]
> Sent: Friday, November 18, 2011 4:43 PM
> To: Schultz, Martin
> Cc: cf-metadata at cgd.ucar.edu
> Subject: Re: [CF-metadata] 2. Re: [cf-satellite] Sharing quality flags among
> multiple variables (Jonathan Gregory)
>
> Dear Martin
>
> > what is the difference between a mean value and an observation
> > count? You may add the 25th percentile to this list as well. As far as
> > I can tell, the cell_methods attribute should be best suited for all
> > of these and I don't see a need to work with standard_name modifiers
>
> Though this has not been thoroughly debated, I think the reasons why there
> are these two different mechanisms are that the two functions are
> distinguished like this:
>
> * cell_methods represents subgrid variation. They always imply that the data
> variable formerly had a higher dimensionality or a higher resolution, and they
> refer to one or more dimensions of the data on which the reduction or
> collapse was done. The relationships indicated by standard_name modifiers
> do not refer to particular dimensions of the data.
>
> * The operations cell_methods records are done on the data in the variable
> itself. Ancillary variables, described by standard_name modifiers, are extra
> information about the data in the variable. This cannot be inferred from the
> data; they are metadata, really, not a statistical reduction of data.
>
> However, I agree there's a similarity. In particular, both of them were
> motivated by a desire to avoid proliferation of standard_names because of
> the need to describe very common operations that could be applied to
> anything, and both of them could modify the units.
>
> Best wishes
>
> Jonathan

Received on Mon Nov 21 2011 - 03:48:34 GMT
