
[CF-metadata] FW: netcdf for particle trajectories

From: John Caron <caron>
Date: Sat, 26 Nov 2011 10:14:57 -0700

Hi Ute:

On 11/25/2011 6:01 AM, Ute Brönner wrote:
> Hi folks,
>
> I kind of lost track of our latest discussions and had the feeling that this was partly outside the mailing group; so I will try to sum up what we were discussing.
> My latest attempt was to produce NetCDF for particle trajectories by writing out the concentration grid, which resulted in an 11 GB netCDF-3 file :-(
>
> So we have different motivations for discussing particle trajectories and netCDF-4.
>
> First question:
> Does anybody know if, and if yes when, writing netCDF-4 will be incorporated into the NetCDF Java library? Or will we use Python with the help of Jython etc. (http://www.slideshare.net/onyame/mixing-python-and-java) to write netCDF-4?

I'm intending to incorporate the netcdf-4 C library into the netcdf-java
library using JNI. I'm hoping to have something working in the next few
months, but we'll see. This will be an optional component, and will
obviously make portability an issue. If you want to use Python, probably
the one to use is Jeff Whitaker's at
http://code.google.com/p/netcdf4-python/, which is also an interface to
the netcdf-4 C library.

> Second question:
> Is there a de facto standard / proposal for writing Particle Trajectory Data which could be CF:featureType:<whatever we agree on>? The suggestion below is not suitable because:
> 1) We don't track a particle the whole time; it may disappear and show up again later. If I have 1000 particles in time step 1 and 1000 in time step 2, we cannot be sure these 1000 are the same as before.
> 2) I cannot know the number of time steps in advance.


I think it's time to start using netcdf-4 for large collections of point
data which need to be compressed. Instead of first making a standard, we
need to try out the possibilities and see how it performs. I think you
want to use Structures, as well as multiple unlimited dimensions. With
netcdf-4, we don't need the ragged array mechanism - that's only needed to
overcome the limitations of the classic model.

Has anyone started down this path? If so, can you post example netcdf-4
files?
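For concreteness, a rough, untested sketch of what such a file might look like via Jeff Whitaker's netcdf4-python module, using a compound ("Structure") type and two unlimited dimensions (the file, type, and variable names are made up for illustration):

import numpy as np
from netCDF4 import Dataset

nc = Dataset("particles_struct.nc", "w", format="NETCDF4")

# netCDF-4 allows more than one unlimited dimension
nc.createDimension("time", None)
nc.createDimension("particle", None)

# a compound ("Structure") type holding one particle record
rec_dtype = np.dtype([("id",  np.int32),
                      ("lat", np.float32),
                      ("lon", np.float32),
                      ("z",   np.float32)])
rec_t = nc.createCompoundType(rec_dtype, "particle_record")

records = nc.createVariable("records", rec_t, ("time", "particle"))
time_var = nc.createVariable("time", "f8", ("time",))
time_var.units = "hours since 2011-01-01 00:00:00"
nc.featureType = "particleTrack"

# write one time step holding three particles
step = np.zeros(3, dtype=rec_dtype)
step["id"]  = [1, 2, 3]
step["lat"] = [60.1, 60.2, 60.3]
step["lon"] = [5.0, 5.1, 5.2]
step["z"]   = [0.0, -1.0, -2.5]
records[0, 0:3] = step
time_var[0] = 0.0

nc.close()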

> I would like something like
> dimensions:
>   particle = UNLIMITED ; // because it may change each time step
>   time = UNLIMITED ;     // because I don't know
>
> then every variable is like
>   latitude (particle, time)
>   longitude (particle, time)
>
> and I might have
>   int number_particles_per_timestep(time) ;
>     :units = "1" ;
>     :long_name = "number of particles per time step" ;
>     :CF:ragged_row_count = "particle" ;
>
> That some of you need to know which spill a particle came from may be solved with a third dimension, spill:
> dimensions:
>   spill = 3 ;            // or however many one has
>   particle = UNLIMITED ; // because it may change each time step
>   time = UNLIMITED ;     // because I don't know
>
>   particle (spill, time)
>
> then every variable is like
>   latitude (particle)
>   longitude (particle)
>
> How would one write this? With coordinates or as a hierarchical data structure?
> At least we need the ability to use several unlimited dimensions and the ragged-array feature.
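For illustration, a rough sketch of writing that layout with the netcdf4-python module (untested; the file name and the number of particles are made up):

import numpy as np
from netCDF4 import Dataset

nc = Dataset("trajectories.nc", "w", format="NETCDF4")
nc.createDimension("particle", None)  # unlimited: may change each time step
nc.createDimension("time", None)      # unlimited: number of steps unknown

lat = nc.createVariable("latitude", "f4", ("particle", "time"),
                        zlib=True, complevel=4)
lon = nc.createVariable("longitude", "f4", ("particle", "time"),
                        zlib=True, complevel=4)
npart = nc.createVariable("number_particles_per_timestep", "i4", ("time",))
npart.units = "1"
npart.long_name = "number of particles per time step"

# write one time step with 1000 particles
n = 1000
lat[0:n, 0] = np.random.uniform(59.0, 61.0, n)
lon[0:n, 0] = np.random.uniform(4.0, 6.0, n)
npart[0] = n

nc.close()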
>
> Third question:
> How can we compress big netCDF3 files? Or is it smarter to go for netCDF4 directly with hierarchical data. As in my example above I would need to write out a 11 GB file and then deflate it like described here http://www.unidata.ucar.edu/mailing_lists/archives/netcdf-java/2010/msg00095.html or with Rich's script; but is that really necessary?
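For what it is worth, one way to do the deflation is a straight copy with the netcdf4-python module, roughly as sketched below (untested, and not necessarily what Rich's script does; recent versions of the nccopy utility can also deflate in one step with its -d option):

from netCDF4 import Dataset

src = Dataset("particles_netcdf3.nc", "r")                 # existing classic-format file
dst = Dataset("particles_deflated.nc", "w", format="NETCDF4_CLASSIC")

# copy global attributes
dst.setncatts({a: src.getncattr(a) for a in src.ncattrs()})

# copy dimensions, keeping unlimited ones unlimited
for name, dim in src.dimensions.items():
    dst.createDimension(name, None if dim.isunlimited() else len(dim))

# copy variables, turning on deflation
for name, var in src.variables.items():
    out = dst.createVariable(name, var.dtype, var.dimensions,
                             zlib=True, complevel=4)
    # _FillValue can only be set at creation time, so skip it here
    out.setncatts({a: var.getncattr(a) for a in var.ncattrs()
                   if a != "_FillValue"})
    out[:] = var[:]

src.close()
dst.close()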
>
>
> Hoping to get the discussion going again, and that we agree on a standard quite soon!
> Have a nice weekend!
>
> Best,
> Ute
>
> -------- Original Message --------
> Subject: [CF-metadata] Particle Track Feature Type (was: Re: point observation data in CF 1.4)
> Date: Fri, 19 Nov 2010 04:15:35 +0100
> From: John Caron<caron at unidata.ucar.edu>
> To: cf-metadata at cgd.ucar.edu<cf-metadata at cgd.ucar.edu>
>
> I'm thinking that we need a new feature type for this. I'm calling it "particleTrack" but there's probably a better name.
>
> My reasoning is that the nested table representation of trajectories is:
>
> Table {
>   traj_id;
>   Table {
>     time;
>     lat, lon, z;
>     data;
>   }
> }
>
> but this case has the inner and outer table inverted:
>
> Table {
>   time;
>   Table {
>     particle_id;
>     lat, lon, z;
>     data;
>     data2;
>   }
> }
>
> So, following that line of thought, the possibilities in CDL are:
>
> 1) If the average number of particles is close to the max number of particles at any time step, then one could use multidimensional arrays:
>
> dimensions:
>   maxParticles = 1000 ;
>   time = 7777 ;  // may be UNLIMITED
>
> variables:
>   double time(time) ;
>
>   int particle_id(time, maxParticles) ;
>   float lon(time, maxParticles) ;
>   float lat(time, maxParticles) ;
>   float z(time, maxParticles) ;
>   float data(time, maxParticles) ;
>
> attributes:
>   :featureType = "particleTrack" ;
>
> Note that maxParticles is the max number of particles at any one time step, not the total number of particle tracks. The particle trajectories have to be found by examining the values of particle_id(time, maxParticles).
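For example, pulling one particle's trajectory out of this layout might look roughly like this (a sketch; the file name and particle id are made up):

import numpy as np
from netCDF4 import Dataset

nc = Dataset("particle_track_multidim.nc", "r")  # hypothetical file using layout 1
pid = nc.variables["particle_id"][:]             # shape (time, maxParticles)
lon = nc.variables["lon"][:]
lat = nc.variables["lat"][:]

target = 42                                      # the particle we want to follow
t_idx, p_idx = np.where(pid == target)           # every (time step, slot) it occupies
track_lon = lon[t_idx, p_idx]
track_lat = lat[t_idx, p_idx]
nc.close()

Note that this scans the whole particle_id array, which is why trajectory access is the slow direction for this layout.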
>
> 2) The CDL of the ragged case would look like:
>
> dimensions:
>   obs = 500000 ;  // UNLIMITED
>   time = 7777 ;
>
> variables:
>   int time(time) ;
>   int rowSize(time) ;
>
>   int particle_id(obs) ;
>   float lon(obs) ;
>   float lat(obs) ;
>   float z(obs) ;
>   float data(obs) ;
>
> attributes:
>   :featureType = "particleTrack" ;
>
> In this case, you don't have to know the max number of particles at any one time step, but you do need to know the number of time steps beforehand. The particle trajectories have to be found by examining the values of particle_id(obs). The particles at time step i are contained in the obs variables from start(i) to start(i) + rowSize(i), where start(i) is the cumulative sum of the preceding rowSize values.
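A small sketch of that bookkeeping, where start is just the cumulative sum of rowSize (the file name is made up):

import numpy as np
from netCDF4 import Dataset

nc = Dataset("particle_track_ragged.nc", "r")    # hypothetical file using layout 2
row_size = nc.variables["rowSize"][:]            # number of particles per time step

# start(i): offset into the obs dimension where time step i begins
start = np.concatenate(([0], np.cumsum(row_size)[:-1]))

i = 10                                           # some time step of interest
sl = slice(int(start[i]), int(start[i] + row_size[i]))
ids_i = nc.variables["particle_id"][sl]
lon_i = nc.variables["lon"][sl]
lat_i = nc.variables["lat"][sl]
nc.close()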
>
> These layouts are optimized for processing all particles at a given time, and for sequentially processing time steps. Processing individual particle trajectories will be much slower; if you needed to do that a lot, you might want to rewrite the file. A more sophisticated application, possibly a server, could write an index to speed it up.
>
>
> _______________________________________________
> CF-metadata mailing list
> CF-metadata at cgd.ucar.edu
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>
>
> -----Original Message-----
> From: rsignell at gmail.com [mailto:rsignell at gmail.com] On Behalf Of Rich Signell
> Sent: Thursday, 18 August 2011 19:04
> To: Christopher Barker
> Cc: Ute Brönner; Ben Hetland; Mark Reed; Nils Rune Bodsberg; CJ Beegle-Krause; Caitlin O'Connor; Alex Hadjilambris; Rob Hetland
> Subject: Re: netcdf for particle trajectories
>
> Chris,
>
>
>>>> so I'll make part of my homework to deliver you a Python script
>>>> using Whitaker's NetCDF4 that writes a sample file.
>> How did this go, Rich?
> Yes, I took Rob Hetland's Python short course, and yes, I wrote a small example showing how to take NetCDF3 particle tracking output and create a compressed NetCDF4 file with chunking. I just forgot to send it. ;-)
>
> Note: You can get an OpenDAP-enabled NetCDF4 Python module for both 32- and 64-bit Windows from:
> http://www.lfd.uci.edu/~gohlke/pythonlibs/
>
> -Rich
>> We're getting closer to a prototype file (i.e. we've got GNOME writing
>> something, but it still needs some tweaking). I'll send out an example
>> when I think we're close.
>>
>> One new issue:
>>
>> In GNOME, we have the concept of any number of "spills" -- each spill
>> is a set of particles that usually share some properties.
>>
>> So we're trying to figure out how to capture that. Two ideas:
>>
>> 1) each spill is a unique set of data -- but I think that it would only
>> be possible to do this by using a convention on the variable names:
>>
>> data_1
>> particle_count_1
>> longitude_1
>> latitude_1
>> ...
>>
>> data_2
>> particle_count_2
>> longitude_2
>> latitude_2
>> ...
>>
>> That seems pretty ugly. Could netcdf4's "hierarchical data" help us here?
>> Maybe this provides the motivation to use it.
>>
>> Option two:
>>
>> put all the particles in one big array, but identify the different "spills"
>> by particle ID:
>>
>> ID_range_1 = 0-1000
>> ID_range_2 = 1000-2000
>> ...
>>
>> then they could get split up by the client software, if desired, or
>> the separate spills could be ignored, and it could all be treated as one.
>>
>> -- thoughts?
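For what it's worth, netCDF-4 groups could express option 1 without the _1/_2 name suffixes; a rough, untested sketch with the netcdf4-python module (group and variable names are made up):

import numpy as np
from netCDF4 import Dataset

nc = Dataset("gnome_spills.nc", "w", format="NETCDF4")
nc.createDimension("time", None)                 # shared by all spills

for spill_name, n_particles in [("spill_1", 1000), ("spill_2", 500)]:
    grp = nc.createGroup(spill_name)
    grp.createDimension("particle", n_particles)
    # child groups can use the parent group's "time" dimension
    grp.createVariable("longitude", "f4", ("time", "particle"), zlib=True)
    grp.createVariable("latitude", "f4", ("time", "particle"), zlib=True)
    grp.createVariable("particle_count", "i4", ("time",))

# each group then holds one spill's arrays
spill1 = nc.groups["spill_1"]
spill1.variables["longitude"][0, :] = np.random.uniform(-125.0, -124.0, 1000)
spill1.variables["latitude"][0, :] = np.random.uniform(47.0, 48.0, 1000)
spill1.variables["particle_count"][0] = 1000

nc.close()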
>>
>>
>> --
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R (206) 526-6959 voice
>> 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115
>> (206) 526-6317 main reception
>>
>> Chris.Barker at noaa.gov
>>
>
>
> --
> Dr. Richard P. Signell (508) 457-2229
> USGS, 384 Woods Hole Rd.
> Woods Hole, MA 02543-1598
> _______________________________________________
> CF-metadata mailing list
> CF-metadata at cgd.ucar.edu
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata