On 11/29/11 4:15 AM, Jonathan Gregory wrote:
> That could be done if you can represent the data using a new kind of
> featureType to be added to the CF chapter on discrete sampling geometries,
> which will be included in CF 1.6 (coming soon). The text for the discrete
> sampling geometry chapter is at
> http://www.unidata.ucar.edu/staff/caron/public/CFch9-feb25_jg.pdf
Sorry that the discussion about this has been so disjointed, but I think
our needs cannot be met with something as simple as a new feature type.
We've had a bit of discussion about this, both on and off this list, but
I don't think anyone has kept good notes of the main points raised. I'll
try to write up a proposal soon, but briefly:
The goal is to store the output of "particle tracking" models. These are
used to model the advection and dispersion of various substances in a
flow field: oil spills, larval transport, pollutants in the atmosphere, etc.
Some key features:
* In general, what is of interest is a collection (100s to 10,000s, or
more...) of particles, rather than any one individual particle. Thus, it
is more likely that the user might ask:
"Where are all the particles at time T?"
than:
"How did particle X travel over time?"
This has consequences for how one stores the data, so that either
question can be asked but the first is the more efficient one.
* particles can have many associated attributes (properties, etc) that
change over time.
* Some models create a set of particles at one time, then track them
for the duration of the run -- that is the easy case. But many models
create and destroy particles as the model runs -- adding particles when
increased resolution is desired, removing them as they move out of the
domain or are destroyed by physical processes.
This is a key issue -- it is not so straightforward how to store particles
when their numbers change, and when you don't know at the start of the
model run how many particles there will be at any given time, or even
the maximum number of particles.
With discussion, we had come to something of a consensus that a "ragged
array" approach would accommodate these needs: a 2-d table of sorts, with
one row for each time step, where each row might be any length. There
appears to be something of a standard for this in CF already, and we have
attempted to use that (more later).
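To make the ragged-array idea concrete, here is a minimal pure-Python sketch of a contiguous layout: one flat array of values, plus a per-time-step count saying how many particles belong to each step. The function and variable names here are ours for illustration, not taken from the CF text.

```python
# Contiguous ragged-array layout: all particle values for every time step
# live in one flat "obs" array; a per-step "counts" array records how many
# particles belong to each time step. Names are illustrative only.
import itertools


def row_slices(counts):
    """Return (start, stop) into the flat obs array for each time step."""
    offsets = [0] + list(itertools.accumulate(counts))
    return [(offsets[i], offsets[i + 1]) for i in range(len(counts))]


def particles_at(obs, counts, t):
    """All particle values at time step t -- the cheap query."""
    start, stop = row_slices(counts)[t]
    return obs[start:stop]


# Three time steps with 2, 4, and 3 particles respectively:
counts = [2, 4, 3]
obs = [10.0, 11.0,              # t = 0
       20.0, 21.0, 22.0, 23.0,  # t = 1
       30.0, 31.0, 32.0]        # t = 2

print(particles_at(obs, counts, 1))  # -> [20.0, 21.0, 22.0, 23.0]
```

Note that answering "where are all the particles at time T?" is a single contiguous slice, whereas following one particle across time would require scanning every row -- which matches the access pattern described above.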
We've got a version of this working now in our software, but...
The trick that Ute has brought up is that you may know neither how many
particles there will be, nor how many time steps. Thus you would like to
have two "unlimited" dimensions, which netcdf3 does not support. We've
managed so far only because we know how many time steps will be run before
we start.
My first thought is that we could use exactly the same format as has been
discussed already, but make it optional to use netcdf4 and an unlimited
time dimension. Presumably these files could easily be converted, after
the fact, to a netcdf3 format, as the number of time steps would then be
known.
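The append-then-convert idea can be sketched in plain Python, with a list standing in for a netcdf4 unlimited dimension (this is only a model of the access pattern, not real netCDF I/O; the class and names are invented for illustration):

```python
# Append pattern for a run where neither the number of time steps nor the
# particle count per step is known in advance. Each step just appends its
# count and values; once the run ends, both total sizes are known, which
# is the point at which a fixed-dimension netcdf3 file could be written.
# "RaggedRun" is a made-up stand-in, not a real netCDF API.


class RaggedRun:
    def __init__(self):
        self.counts = []  # particles alive at each step (grows per step)
        self.obs = []     # flat value array, contiguous per step

    def append_step(self, values):
        """Record one model time step with however many particles exist."""
        self.counts.append(len(values))
        self.obs.extend(values)


run = RaggedRun()
run.append_step([1.0, 2.0])       # two particles created
run.append_step([1.5, 2.5, 3.5])  # a third particle added
run.append_step([2.0])            # two particles destroyed

# After the run, both dimensions are known, so fixed sizes fall out:
n_steps, n_obs = len(run.counts), len(run.obs)
print(n_steps, n_obs)  # -> 3 6
```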
About netcdf3 vs. 4 -- it seems netcdf4 has some nice features; after
all, it was developed for a reason. However, it doesn't appear to
have been widely adopted yet. Then again, maybe we really shouldn't bend
over backwards to fit a data model to netcdf3 anymore -- it's a
chicken-and-egg problem, and maybe it's time to make some eggs.
For our part, we use the netcdf4 lib with Python anyway, though our
C/C++ code is all using netcdf3 -- the burden of compiling the hdf libs
is something we choose to avoid, though it's not that big a deal.
Anyway -- more soon, I hope.
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov
Received on Tue Nov 29 2011 - 14:02:52 GMT