
[CF-metadata] point observation data in CF 1.4

From: CJ Beegle-Krause <CJ.Beegle-Krause>
Date: Thu, 04 Nov 2010 12:57:44 -0700

Christopher Barker wrote:
> On 11/3/10 7:19 PM, John Caron wrote:
>> 1) It seems clear that at each time step, you need to write out the data
>> for whatever particles currently exist. I assume that if you wanted to
>> break up the data for a very long run, you would partition by time, i.e.
>> time steps 1-1000, 1001-2000, etc. would be in separate files.
>
> probably, yes.
>
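A minimal sketch of that time-partitioned layout, assuming the Python
netCDF4 package; the file names, variable names, and fixed particle
count are illustrative assumptions, not a proposal:

    import numpy as np
    from netCDF4 import Dataset

    def write_partition(filename, times, positions):
        # positions: (n_times, n_particles, 2) lon/lat array; a real
        # layout would also have to handle particles appearing and
        # disappearing, which this sketch ignores
        ds = Dataset(filename, "w")
        ds.createDimension("time", len(times))
        ds.createDimension("particle", positions.shape[1])
        t = ds.createVariable("time", "f8", ("time",))
        lon = ds.createVariable("lon", "f8", ("time", "particle"))
        lat = ds.createVariable("lat", "f8", ("time", "particle"))
        t[:] = times
        lon[:] = positions[:, :, 0]
        lat[:] = positions[:, :, 1]
        ds.close()

    # e.g. steps 1-1000 in one file, 1001-2000 in the next
    all_times = np.arange(1, 2001, dtype=float)
    all_pos = np.zeros((2000, 100, 2))
    for i, start in enumerate(range(0, 2000, 1000)):
        write_partition("traj_part%02d.nc" % i,
                        all_times[start:start + 1000],
                        all_pos[start:start + 1000])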
>> 2) Apparently the common read pattern is to retrieve the set of
>> particles at a given time step. If so, that makes things easier.
>
> yes -- or a sequence of time steps, but also easy.
>
>> 3) I assume that you want to be able to figure out an individual
>> particle's trajectory, even if that doesn't need to be optimized for
>> speed.
>
> It should be possible, yes, but you're correct: that is the less
> common use case, and thus less important for good performance. And
> with particles coming and going, it's a mess anyway.
There are a number of access patterns besides time slices; none of them
is more common than getting a time slice, but they should be considered
(a sketch follows the list):
-Particles that have passed through a particular space (area or volume)
-Particles that have changed status at some time during their pathway
-Particles that have a particular age
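A minimal sketch of those three selections, assuming the per-particle
fields are already in memory as NumPy arrays shaped (time, particle);
all names and status codes here are hypothetical:

    import numpy as np

    # toy data standing in for a real run
    lon = np.random.uniform(-90.0, -80.0, (50, 200))
    lat = np.random.uniform(25.0, 30.0, (50, 200))
    status = np.random.randint(0, 3, (50, 200))  # e.g. 0=floating, 1=beached
    age = np.arange(50)[:, None] * np.ones((1, 200))

    # 1) particles that ever passed through a given area
    in_box = (lon > -85) & (lon < -84) & (lat > 27) & (lat < 28)
    passed_through = np.where(in_box.any(axis=0))[0]

    # 2) particles whose status changed somewhere along their pathway
    changed = np.where((np.diff(status, axis=0) != 0).any(axis=0))[0]

    # 3) particles that reach a particular age (e.g. 24 steps)
    of_age = np.where((age >= 24).any(axis=0))[0]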
>
>> 1) Is the avg number (Navg) of particles that exist much smaller, or
>> approx the same as the max number (Nmax) that exist at one time step?
>
> I think this is very use-case dependent -- how would that change what
> we might want to do?
Examples (a rough sketch follows the list):
1. Single release trajectory - all particles start at the same time
2. Continuous release trajectory - particles start at the same or a
moving location, and then the particles are followed
3. Variation in release points - looking at grouping by end points or
pathway (e.g. evaluation of particle receptors, and where the particles
that contact those receptors come from)
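One way to encode those three patterns is a per-particle release time
and release location, sketched below; the numbers and names are made up
for illustration:

    import numpy as np

    n = 1000
    # 1) single release: every particle starts at the same time
    t0_single = np.zeros(n)

    # 2) continuous release: staggered start times from a fixed or
    #    moving source
    t0_continuous = np.linspace(0.0, 48.0, n)  # hours over a 2-day release

    # 3) varied release points: tag each particle with its source, then
    #    group by source or by end point (e.g. which receptor it hits)
    source_id = np.random.randint(0, 5, n)     # one of 5 release sites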
>
>> 2) How much data is associated with each particle at a given time step
>> (just an estimate needed here - 10 bytes? 1000 bytes?)
>
> For us, it's currently about 40 bytes, but we'll be adding an unknown
> amount in the future -- maybe up to a couple hundred.
>
>> A nastier example could be to represent an oil slick's shape and
>> position with a polygon. The number of vertices of that polygon would be
>> highly variable through time. (This is a typical GIS-like
>> representation.)
>
> I think that is an entirely different use-case -- and one probably
> better handled with a GIS format -- although GIS does time (and 3-d)
> really badly.
Agreed.
>
>> How one decides to partition I think can depend a lot on the
>> application. Sometimes splitting them on data type can be more
>> appropriate. In a recent case I had, the data were to be transferred to
>> a client computer over the Internet for viewing locally. In that case
>> reducing the content of the file to the absolute minimum set of
>> properties (that the client needed in order to visualize) became
>> paramount. Even a fast Internet connection does have bandwidth
>> limitations... :-)
>
> I think that's a little different than partitioning -- it's more
> subsetting the data, and yes, I think we would often want to only put
> the relevant data for a given application in a given file.
>
> My thought is that there would be very few required variables (but
> hopefully some standard names for many more!)
It seems like we're missing something in how we generalize the
coordinates: (x,y) or (x,y,z) should now be subordinate to time. With
trajectories, time is the prime coordinate, and in geophysical
applications, depth/height z seems like the next most important - in
many applications data are more likely to stay on the same or similar
z levels than to change z levels. Then the x,y coordinates come third.
A rough sketch of a layout reflecting that ordering follows.
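A minimal sketch of such a file layout, using the Python netCDF4
package; the names and attributes are illustrative, not proposed
standard names:

    from netCDF4 import Dataset

    ds = Dataset("particles.nc", "w")
    ds.createDimension("time", None)       # unlimited: the prime coordinate
    ds.createDimension("particle", 10000)

    time = ds.createVariable("time", "f8", ("time",))
    time.units = "hours since 2010-01-01 00:00:00"

    # z next: in many runs particles stay on the same or similar
    # levels, so z might even collapse to a per-particle scalar
    z = ds.createVariable("z", "f4", ("time", "particle"))
    z.positive = "down"

    # x,y last: the most rapidly varying coordinates
    lon = ds.createVariable("lon", "f4", ("time", "particle"))
    lat = ds.createVariable("lat", "f4", ("time", "particle"))
    ds.close()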
>
>>> 3) I assume that you want to be able to figure out an individual
>>> particle's trajectory, even if that doesn't need to be optimized
>>> for speed.
>>
>> Not my primary need, but if an object is "tracked" like that it would
>> not be unlikely that the trajectory might need to be accessed
>> "interactively", eg. while a user is viewing a visualization of the data
>> directly on screen. Does that count as "optimized for speed"?
>
> well, yes. IIUC, if we group by time step, then you'd essentially
> have to read all the data to follow a single particle through time. If
> that was a common need, then the data should be arranged differently.
> But then, the already-proposed standard for "trajectories" would work
> well for that already.
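To make that cost concrete, a sketch of pulling one particle's track
out of a time-partitioned run (reusing the hypothetical traj_part
files from the earlier sketch): every file has to be opened even
though only one column of data is wanted.

    import glob
    import numpy as np
    from netCDF4 import Dataset

    def single_trajectory(k, pattern="traj_part*.nc"):
        lons, lats = [], []
        for fname in sorted(glob.glob(pattern)):
            ds = Dataset(fname)
            lons.append(ds.variables["lon"][:, k])
            lats.append(ds.variables["lat"][:, k])
            ds.close()
        return np.concatenate(lons), np.concatenate(lats)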
>
>> It also wouldn't be atypical for
>> this amount to be multiplied by, say, 20,000 particles per time
>> step.
>
> or more -- we use 1,000 typically for surface trajectories, and 10,000
> for 3-d at a minimum -- if it's a long-term spill, it could be a lot
> more (e.g. the event in the Gulf this summer).
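For a rough sense of scale, combining the figures mentioned in the
thread (about 40 bytes per particle, 20,000 particles, 1,000 time
steps):

    bytes_per_particle = 40
    particles = 20000         # per time step
    time_steps = 1000
    total = bytes_per_particle * particles * time_steps
    print(total / 1e6, "MB")  # -> 800.0 MB for one run, before overhead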
>
> I think we're converging on something here.
>
> I just got some sample Sintef files sent to me, and I've been working
> on my own sample -- I'll see if I can reconcile those, and post
> something here.
>
> By the way -- it would be nice to have something that could
> accommodate ASA's models, too -- is there anyone from ASA on this list
> and paying attention?
>
> -Chris