Seth,
On Thu, May 23, 2013 at 9:51 AM, Seth McGinnis <mcginnis at ucar.edu> wrote:
>>> Computing the min & max on the fly is cheap, and approximating it is even
>>> cheaper, so why introduce the uncertainty?
>>
>>... but computing min & max on the fly can also be very expensive.
>>We have aggregated model output datasets where each variable is more
>>than 1TB!
>
> Sure, I can see that that's useful metadata about the dataset, and that
> there's value in caching it somewhere. I just don't think it belongs with
> the metadata inside the netcdf file. What's the use case for storing it
> there?
>
> Because the problem remains that, unless you're storing and serving
> that dataset as a single 1 TB file that never gets modified or subset,
> as soon as anything at all happens to the file, those min and max
> values become tainted and unreliable, and ought to be recomputed.
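[To make the staleness point in the quoted text concrete: a minimal sketch, assuming a Python/netCDF4 workflow (the file name, variable name, and "actual_range" attribute here are only illustrative), of recomputing a variable's range in chunks rather than trusting a cached attribute once the file has been modified or subset.]

import numpy as np
from netCDF4 import Dataset

def recompute_range(path, varname, chunk=365):
    # Stream over the variable's first dimension so the whole array
    # never has to fit in memory; returns (min, max).
    with Dataset(path) as nc:
        var = nc.variables[varname]
        vmin, vmax = np.inf, -np.inf
        for start in range(0, var.shape[0], chunk):
            block = var[start:start + chunk]  # masked array; fill values excluded
            vmin = min(vmin, float(block.min()))
            vmax = max(vmax, float(block.max()))
        return vmin, vmax

# After any subset or edit, a stored range should be treated as suspect:
# cached = getattr(nc.variables["tasmax"], "actual_range", None)
# fresh  = recompute_range("tasmax_subset.nc", "tasmax")
# if cached is not None and tuple(cached) != fresh:
#     ...  # the cached values are stale and need to be rewritten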
That's a great point. Funny, because I've been making the same
arguments against storing time and space extents in the netCDF file,
which was first suggested here:
http://www.unidata.ucar.edu/software/netcdf-java/formats/DataDiscoveryAttConvention.html
and is now being revisited here:
http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery_%28ACDD%29_Working
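[The same staleness concern applies to the ACDD extent attributes. A quick consistency check is sketched below; it assumes coordinate variables named "lat" and "lon", and simply compares the advertised geospatial_* attributes against the coordinates actually present in the file.]

from netCDF4 import Dataset

def check_acdd_extents(path, lat_name="lat", lon_name="lon"):
    # Compare the file's advertised ACDD geospatial attributes against
    # the coordinate values it actually contains.
    with Dataset(path) as nc:
        lat = nc.variables[lat_name][:]
        lon = nc.variables[lon_name][:]
        actual = {
            "geospatial_lat_min": float(lat.min()),
            "geospatial_lat_max": float(lat.max()),
            "geospatial_lon_min": float(lon.min()),
            "geospatial_lon_max": float(lon.max()),
        }
        for att, value in actual.items():
            advertised = getattr(nc, att, None)
            if advertised is not None and abs(float(advertised) - value) > 1e-6:
                print(f"{att} is stale: attribute says {advertised}, data says {value}")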
Thanks for snapping me back to reality!
-Rich
--
Dr. Richard P. Signell (508) 457-2229
USGS, 384 Woods Hole Rd.
Woods Hole, MA 02543-1598