Hello List,
Since I'm also dealing a bit with data sets holding "particle
collections" through time, I'd like to contribute some thoughts on
this. Our primary use here at SINTEF is for inputs to and results from
oil drift simulations, so we're working mostly at the sea surface and
below, although similar data sets seem just as applicable up in the
atmosphere. (Particle collections and bounding polygons for the ash
cloud from the Eyjafjallajökull eruption spring to mind as a fairly
recent example.)
On 04.11.2010 03:19, John Caron wrote:
> 1) It seems clear that at each time step, you need to write out the
> data for whatever particles currently exist.
This is a very fair assessment in our case. One could generalize a bit
more: we have data organized as a series of time steps (with time as the
primary dimension). At each time step we have a number of items to
store, of various sizes and types. Particles are only one of these kinds
of objects. Most of them can probably be treated much like particles,
though, where a fixed set of properties describing each object is simply
represented by one netCDF variable per property.
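To illustrate, here is a minimal sketch of that kind of layout using
the netCDF4-python module (the file name, property names and the
20000-particle limit are only made up for the example):

    from netCDF4 import Dataset

    nc = Dataset("oil_particles.nc", "w")   # netCDF-4 format by default
    nc.createDimension("time", None)        # unlimited; grows per time step
    nc.createDimension("particle", 20000)   # assumed per-run upper bound

    # one variable per particle property, dimensioned (time, particle)
    t    = nc.createVariable("time", "f8", ("time",))
    lon  = nc.createVariable("lon", "f4", ("time", "particle"))
    lat  = nc.createVariable("lat", "f4", ("time", "particle"))
    dep  = nc.createVariable("depth", "f4", ("time", "particle"))
    mass = nc.createVariable("oil_mass", "f4", ("time", "particle"))
    nc.close()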
A nastier example would be representing an oil slick's shape and
position with a polygon, since the number of vertices of that polygon
can vary greatly through time. (This is a typical GIS-like
representation.)
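One conceivable way to store that in a netCDF-4 file would be a
variable-length (VLEN) type, roughly as in this netCDF4-python sketch
(names again made up; a ragged array with a per-step vertex count would
be another option):

    import numpy as np
    from netCDF4 import Dataset

    nc = Dataset("slick_outline.nc", "w")
    nc.createDimension("time", None)

    coords  = nc.createVLType(np.float64, "coord_list")
    out_lon = nc.createVariable("outline_lon", coords, ("time",))
    out_lat = nc.createVariable("outline_lat", coords, ("time",))

    # e.g. a 5-vertex outline at step 0; later steps may have any count
    out_lon[0] = np.array([4.10, 4.30, 4.40, 4.20, 4.10])
    out_lat[0] = np.array([60.0, 60.0, 60.2, 60.3, 60.0])
    nc.close()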
> I assume that if you wanted to break up the data for a very
> long run, you would partition by time, ie time steps 1-1000,
> 1001-2000, etc. would be in separate files.
How one decides to partition can, I think, depend a lot on the
application. Sometimes splitting by data type is more appropriate. In a
recent case of mine, the data were to be transferred to a client
computer over the Internet for viewing locally. There, reducing the file
content to the absolute minimum set of properties (those the client
needed in order to visualize) became paramount. Even a fast Internet
connection has bandwidth limitations... :-)
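For what it's worth, producing such a stripped-down "viewing" file is
not much work, e.g. with a small netCDF4-python script along these
lines (variable names only illustrative; tools like NCO can do the
same):

    from netCDF4 import Dataset

    keep = ("time", "lon", "lat")            # illustrative minimal set
    src  = Dataset("oil_particles.nc", "r")
    dst  = Dataset("oil_particles_view.nc", "w")

    for name, dim in src.dimensions.items():
        dst.createDimension(name, None if dim.isunlimited() else len(dim))
    for name in keep:                        # attributes omitted for brevity
        var = src.variables[name]
        out = dst.createVariable(name, var.dtype, var.dimensions, zlib=True)
        n = var.shape[0]
        out[0:n] = var[:]                    # explicit range so unlimited dims grow
    src.close()
    dst.close()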
> 2) Apparently the common read pattern is to retrieve the set of
> particles at a given time step. If so, that makes things easier.
Yes, often sequentially by time as well.
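With a fixed (time, particle) layout like the one sketched above, that
read pattern is a single slice per property, roughly:

    from netCDF4 import Dataset

    nc = Dataset("oil_particles.nc", "r")
    t = 42                                # index of the wanted time step
    lon_t = nc.variables["lon"][t, :]     # all particles at that step
    lat_t = nc.variables["lat"][t, :]
    nc.close()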
> 3) I assume that you want to be able to figure out an individual
> particle's trajectory, even if that doesn't need to be optimized
> for speed.
Not my primary need, but if an object is "tracked" like that, it could
well be that the trajectory needs to be accessed "interactively", e.g.
while a user is viewing a visualization of the data directly on screen.
Does that count as "optimized for speed"?
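In the same made-up (time, particle) layout, such a track is just a
slice along the time dimension at a fixed particle index, although that
access pattern touches every record, so it is correct rather than fast:

    from netCDF4 import Dataset

    nc = Dataset("oil_particles.nc", "r")
    p = 123                                  # index of the tracked particle
    track_lon = nc.variables["lon"][:, p]    # that particle at every step
    track_lat = nc.variables["lat"][:, p]
    nc.close()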
> 1) is the avg number (Navg) of particles that exist much smaller,
> or approx the same as the max number (Nmax) that exist at one time
> step?
This varies a lot. Sometimes Navg is close to Nmax, as you suggest, but
sometimes only a few particles exist at a typical time step. Sometimes
there isn't any defined Nmax at all (dynamic implementations), or such a
limit can be difficult to know beforehand. Even where an Nmax is set,
would it be unreasonable to require the _same_ value to be used every
time if the netCDF dataset is accumulated through _multiple_ simulation
runs?
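One way around a fixed Nmax that I could imagine (a rough
netCDF4-python sketch, names made up, relying on the netCDF-4 format to
allow two unlimited dimensions) is a "ragged" layout: the particles
alive at each time step are appended along one long observation
dimension, and a per-step count says how many rows belong to that step:

    from netCDF4 import Dataset

    nc = Dataset("particles_ragged.nc", "w")
    nc.createDimension("time", None)
    nc.createDimension("obs", None)          # one row per particle per step

    t_var  = nc.createVariable("time", "f8", ("time",))
    counts = nc.createVariable("row_count", "i4", ("time",))
    pid    = nc.createVariable("particle_id", "i4", ("obs",))
    lon    = nc.createVariable("lon", "f4", ("obs",))
    lat    = nc.createVariable("lat", "f4", ("obs",))

    # two tiny example steps: (time, ids, lons, lats); step 2 gains a particle
    steps = [
        (0.0, [1, 2],    [4.1, 4.2],      [60.0, 60.1]),
        (1.0, [1, 2, 3], [4.2, 4.3, 4.1], [60.1, 60.2, 60.0]),
    ]
    start = 0
    for i, (when, ids, xs, ys) in enumerate(steps):
        n = len(ids)
        t_var[i]  = when
        counts[i] = n
        pid[start:start + n] = ids
        lon[start:start + n] = xs
        lat[start:start + n] = ys
        start += n
    nc.close()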
> 2) How much data is associated with each particle at a given
> time step (just an estimate needed here - 10 bytes? 1000 bytes?)
In our case this varies a lot with the type of particle and how the
simulation was set up. A quick assessment indicates that some need only
16 bytes per particle, while others may currently require up to 824
bytes. (This does not account for shared info like the time itself,
which we don't store per particle.) It also wouldn't be very atypical
for this amount to then be multiplied by, say, 20 000 particles per time
step.
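(As a rough upper bound, 824 bytes x 20 000 particles comes to about
16 MB per time step, before any compression.)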
Hope that provides some useful idea of the real-life needs!
:-)
--
Regards, -+- Ben Hetland <ben.a.hetland at sintef.no> -+-
Opinions expressed are my own, not necessarily those of my employer.