Hi Stephen et al.,
I have rediscovered this conversation in the depths of my inbox and
thought it might be worth resurrecting. To summarize the discussion
so far:
1) I proposed two new standard optional CF attributes (called
something like minimum_data_value and maximum_data_value) that would
be attached to a variable in a NetCDF file, redundantly holding its
actual min and max values in that file. As well as helping with data
mining applications, this would act as a hint to visualization
packages to provide a first attempt at automatically defining a colour
scale range for sensible portrayal. Note that this attribute pair is
distinct in purpose from valid_min and valid_max (which contain
theoretical extrema, beyond which data is considered invalid).
2) Steve Hankin proposed a more sophisticated approach whereby there
would be a min/max pair for each vertical level in the data.
3) Stephen Pascoe pointed out that spatial subsets will have different
min and max values so the value of the simplistic approach is limited.
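As a rough illustration of the three positions above, here is a minimal
Python sketch. It is not part of any proposal: the data values, the
two-level layout and the function names are all made up for the example,
and a real implementation would read the variable from the NetCDF file.

```python
# Hypothetical field with two vertical levels: data[level][point].
# All values are invented purely for illustration.
data = [
    [1.0, 5.0, 40.0, 12.0],  # level 0
    [2.0, 3.0, 18.0, 9.0],   # level 1
]

def global_range(field):
    """Approach (1): a single min/max pair for the whole variable."""
    values = [v for level in field for v in level]
    return min(values), max(values)

def per_level_range(field):
    """Approach (2): one min/max pair per vertical level."""
    return [(min(level), max(level)) for level in field]

def subset_range(field, start, stop):
    """Point (3): min/max over a subset of points, across all levels."""
    values = [v for level in field for v in level[start:stop]]
    return min(values), max(values)

actual_range = global_range(data)     # what the attribute pair would hold
level_ranges = per_level_range(data)  # what per-level variables would hold
subset = subset_range(data, 0, 2)     # range seen after subsetting

print(actual_range, level_ranges, subset)
```

In this toy case the single pair covers 1 to 40, while the chosen subset
only spans 1 to 5, which is exactly the limitation raised in (3).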
Clearly both (2) and (3) are valid points. But I still feel that, in
"bang for buck" terms, the simplest approach (1) has benefits. It
gives a sensible (if imperfect) range of values; without it, a
visualization application has no way to generate a colour scale
short of extracting potentially large quantities of data.
Furthermore, it is very easy for an NcML aggregation to look up these
min/max attributes in each file and generate the min/max for the
aggregated dataset. I mentioned visualization as the primary use case
for this proposal, but data mining apps can benefit too: it would be
much quicker and easier to answer questions like, "which files
contain temperatures above 30 degC?"
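A minimal Python sketch of the aggregation and data mining cases,
assuming the per-file min/max attributes have already been read from
each file's header. The file names, the values and both function names
are hypothetical, and no NcML or NetCDF library is involved here.

```python
# Hypothetical (min, max) attribute values in degC, one pair per file,
# as they might be read from the file headers without touching the data.
file_ranges = {
    "a.nc": (-5.0, 28.5),
    "b.nc": (2.0, 33.1),
    "c.nc": (-12.3, 19.9),
}

def aggregate_range(ranges):
    """Min/max for the aggregated dataset, derived from per-file pairs."""
    mins, maxs = zip(*ranges.values())
    return min(mins), max(maxs)

def files_exceeding(ranges, threshold):
    """Names of files whose stored maximum lies above the threshold."""
    return sorted(name for name, (_, hi) in ranges.items() if hi > threshold)

aggregated = aggregate_range(file_ranges)
hot_files = files_exceeding(file_ranges, 30.0)

print(aggregated, hot_files)
```

Both answers come from a handful of header values, so no data arrays
need to be extracted.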
Stephen mentioned cell_methods, which I'm not an expert in. If this
approach is to be adopted, do people think it would be best to express
the min/max as two attributes or in cell_methods?
Should this discussion be moved to the trac site?
Jon
On Sat, Mar 31, 2007 at 7:10 PM, Stephen Pascoe <S.Pascoe at rl.ac.uk> wrote:
> I would also like to express some reservations about the usefulness of
> simple min/max attributes for the purpose Jon suggests (calculating
> appropriate colourbar ranges in visualisations). My experience is that a
> single pair of values is only relevant at a particular scale. Once you
> start subsetting a domain there's a good chance the actual min/max will be
> substantially different.
>
> For instance, taking an example from the IPCC data distribution centre, we
> have a diurnal temperature range field with a min--max of ca. 1--40 deg_c.
> However, half of this range is due to the variation over Greenland during
> the winter. Subset anywhere else and the max is more like 20 deg_c.
> Similarly, the maximum temperature field varies between ca. -50 and +45
> deg_c but most subselections in time or space only cover a fraction of this
> range.
>
> There is no harm in having optional CF attributes for min and max but I'm
> not convinced it will solve the problem. I like Steve's approach of
> providing the extrema in auxiliary variables. In CF min/max can be
> specified using the cell_methods attribute. What would be needed is
> something like Steve's "parent" attribute to specify that two variables
> represent the same field (with different cell_methods).
>
> Cheers,
> Stephen.
>
> ---
> Stephen Pascoe 01235 445980
> British Atmospheric Data Centre
> Rutherford Appleton Laboratory, CCLRC
>
>
> Steve Hankin wrote:
>>
>> Hi Jon,
>>
>> Can you really get away with simple attributes to contain the guidance on
>> extrema? For example, if this is 3D data (has a Z axis) and you are
>> interested in visualizations at different depths (heights), then the
>> "recommended" contour ranges might well need to be different for each depth
>> (illustrating why we have tended to back away from this problem for such a
>> long time).
>>
>> Might it make sense to think more in terms of min/max values stored in new
>> variables and identified by standard names? Here is a conceptual example
>> for discussion (not a formal proposal, so please cut me slack):
>>
>> variables:
>>     float temperature(time,pres,lat,lon) ;
>>     float temp_min(pres) ;
>>         temp_min:parent = "temperature" ;
>>         temp_min:standard_name = "minimum_over_domain" ;
>>     float temp_max(pres) ;
>>         temp_max:parent = "temperature" ;
>>         temp_max:standard_name = "maximum_over_domain" ;
>>
>> This approach offers a lot more flexibility. Does the scope of the
>> problem that needs to be solved require this flexibility?
>>
>> - Steve
>>
>> ==================================================
>>
>> Jon Blower wrote:
>>>
>>> Dear Jonathan,
>>>
>>> OK, that sounds fine too. How do we move forward to incorporate this
>>> into the CF standard?
>>>
>>> Thanks, Jon
>>>
>>> On 3/28/07, Jonathan Gregory <j.m.gregory at reading.ac.uk> wrote:
>>>
>>>>
>>>> Dear Jon and Phil
>>>>
>>>> I'd suggest actual_min and actual_max, because they would complement the
>>>> already defined (Unidata standard) valid_min and valid_max.
>>>>
>>>> Cheers
>>>>
>>>> Jonathan
>>>>
>>>>
>>>
>>>
>>>
>>
>> --
>> Steve Hankin, NOAA/PMEL -- Steven.C.Hankin at noaa.gov
>> 7600 Sand Point Way NE, Seattle, WA 98115-0070
>> ph. (206) 526-6080, FAX (206) 526-6744
>
>
--
--------------------------------------------------------------
Dr Jon Blower Tel: +44 118 378 5213 (direct line)
Technical Director Tel: +44 118 378 8741 (ESSC)
Reading e-Science Centre Fax: +44 118 378 6413
ESSC Email: jdb at mail.nerc-essc.ac.uk
University of Reading
3 Earley Gate
Reading RG6 6AL, UK
--------------------------------------------------------------
Received on Mon Jun 09 2008 - 09:04:45 BST