
[CF-metadata] FW: netcdf for particle trajectories

From: Rodrigo Fernandes <rodrigo.maretec>
Date: Thu, 12 Jan 2012 12:53:44 -0000

Hi again,

The decision to make the MOHID model handle HDF files instead of NetCDF was
taken by MARETEC probably 12 years ago (before I joined MARETEC), and I
think it was due to NetCDF limitations at the time: HDF offered compression
and a hierarchical structure.
Meanwhile, NetCDF became the de facto standard among modellers, and instead
of changing MOHID inputs or outputs, we developed tools to convert HDF files
to NetCDF and vice versa.
MOHID is now entering a new stage: it is being prepared to handle either
NetCDF or HDF files, as an end-user option.
I have put an example of an old Lagrangian output at the following temporary
link:
http://ge.tt/8KzvhDC (6.51 MB)
Our Lagrangian model is being reformulated, because some outputs (like
weathering processes) are not included in the Lagrangian files, only in the
ASCII outputs.
If you need further details, just ask.

On another issue, let me correct some information I sent in the Word
document attached to my last email: I proposed a time coordinate based on an
array with 6 columns, but I know that's not standard in the CF Conventions;
the standard is time (in seconds, for example) since an initial reference. I
forgot to mention this. I think both time formats could be included,
because, for graphical user interfaces that handle NetCDF files, the
"seconds since 1992-10-8 15:15:42.5 -6:00" specification is annoying and
complex to handle: the initial time reference is variable, as are the units.
(In fact, one of the problems with the CF conventions is that they are so
comprehensive, and include so many options, that we can do almost anything.
This is an obstacle to developing new software tools to handle NetCDF files
from THREDDS catalogues, for example, as we are doing in the ARCOPOL and
EASYCO projects.) I suppose that for particle files, we should be more
strict...
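For illustration, here is a minimal Python sketch of how a GUI could derive the 6-column (year, month, day, hour, minute, second) form from a CF-style "<units> since <reference>" time coordinate, so that a file only needs to store the CF form. It assumes a proleptic Gregorian calendar (fine for modern dates), only seconds/minutes/hours/days as units, and a reference time written like the example above. The function names are hypothetical; libraries such as cftime handle the general case.

```python
import re
from datetime import datetime, timedelta, timezone

# Seconds per supported unit (assumption: only these four units are handled).
_UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}

# Matches strings such as "seconds since 1992-10-8 15:15:42.5 -6:00".
_UNITS_RE = re.compile(
    r"^\s*(?P<unit>seconds|minutes|hours|days)\s+since\s+"
    r"(?P<y>\d{1,4})-(?P<mo>\d{1,2})-(?P<d>\d{1,2})"
    r"(?:[ T](?P<h>\d{1,2}):(?P<mi>\d{1,2})(?::(?P<s>\d{1,2}(?:\.\d*)?))?)?"
    r"(?:\s*(?P<tz>[+-]\d{1,2}(?::\d{2})?))?\s*$"
)


def parse_cf_units(units):
    """Return (seconds per unit, reference time converted to UTC)."""
    m = _UNITS_RE.match(units)
    if m is None:
        raise ValueError(f"unsupported time units: {units!r}")
    offset = timedelta(0)
    if m.group("tz"):
        sign = -1 if m.group("tz").startswith("-") else 1
        hh, _, mm = m.group("tz")[1:].partition(":")
        offset = sign * timedelta(hours=int(hh), minutes=int(mm or 0))
    ref = datetime(int(m.group("y")), int(m.group("mo")), int(m.group("d")),
                   int(m.group("h") or 0), int(m.group("mi") or 0),
                   tzinfo=timezone(offset))
    # Fractional seconds are added as a timedelta to keep sub-second precision.
    ref += timedelta(seconds=float(m.group("s") or 0))
    return _UNIT_SECONDS[m.group("unit")], ref.astimezone(timezone.utc)


def to_six_columns(values, units):
    """Convert CF time values to (year, month, day, hour, minute, second) rows in UTC."""
    scale, ref = parse_cf_units(units)
    rows = []
    for v in values:
        t = ref + timedelta(seconds=v * scale)
        rows.append((t.year, t.month, t.day, t.hour, t.minute,
                     t.second + t.microsecond / 1e6))
    return rows
```

With this split, the file stays CF-compliant and the "annoying" reference/unit handling lives in one small client-side function.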

Best Regards
Rodrigo

-----Original Message-----
From: Ute Brønner [mailto:Ute.Broenner at sintef.no]
Sent: Thursday, 12 January 2012 10:35
To: Rodrigo Fernandes; 'Chris Barker'
Cc: CF-metadata at cgd.ucar.edu; Ben Hetland; Mark Reed; Nils Rune Bodsberg;
'CJ Beegle-Krause'; 'Caitlin O'Connor'; 'Alex Hadjilambris'; 'Rob Hetland';
rsignell at gmail.com
Subject: RE: FW: netcdf for particle trajectories

Rodrigo,

Thank you for your mail and suggestions!
While I agree that it might be useful to have grid information as well as
particle trajectory information in the same file, I would suggest keeping
them in separate files, especially as we are aiming to agree on a standard
for particle trajectories that might not always be related to oil. Also, I
am not sure whether the others want to provide grid information at all.

It is interesting that you use HDF5 directly instead of NetCDF. Is that due
to a lack of functionality in the NetCDF libraries, or is there another
reason? Would you be willing to share an example?

Best,
Ute

-----Original Message-----
From: Rodrigo Fernandes [mailto:rodrigo.maretec at ist.utl.pt]
Sent: Tuesday, 10 January 2012 20:09
To: 'Chris Barker'; Ute Brønner
Cc: CF-metadata at cgd.ucar.edu; Ben Hetland; Mark Reed; Nils Rune Bodsberg;
'CJ Beegle-Krause'; 'Caitlin O'Connor'; 'Alex Hadjilambris'; 'Rob Hetland';
rsignell at gmail.com
Subject: RE: FW: netcdf for particle trajectories

Hi everyone,
I suppose most of you (except Mark) don't know me; I was introduced to this
discussion group through Mark, whom I met in the Kuwait oil spill modelling
working group, where I was presenting our group's work on oil spill
modelling with MOHID (www.mohid.com) and risk management in the Atlantic
Area, through some EU projects. I'm an oil spill modeller at MARETEC
(www.maretec.org) at IST university (Portugal), and it is a pleasure to
participate in your discussion. Sorry for the late feedback.



I hope not to increase the entropy of the discussion, but I feel that you
are probably more focused on some technical details than on the properties
needed (it's as if you are more focused on "how" than on "what").
I have some trouble following some points of your discussion, because we
produce our outputs in HDF5, although we can easily convert our input and
output files from/to NetCDF.
I just hope you don't limit the outputs and standards because of technical
details or limitations of the file formats (like hierarchical structures).
To avoid this, I think NetCDF4 should be adopted.
I'm not a specialist in file formats and their specifics; however, I think
I have a clear idea of what I need as output from a particle tracking model
for oil spills.
I'm sending my idea in a general format, as a table in the attached Word
document. I propose a hierarchical structure, which I think is definitely
more convenient.

Additionally, I think some other properties could also be included in the
particle tracking outputs: the surface wind velocity, water temperature and
current velocity used for each Lagrangian particle would also be of
interest, and probably the particle velocity too. This was discussed and
adopted as a common standard output in a European project (ECOOP), and I
think some oil spill models, like MOTHY (from Météo-France), already output
this natively.

Best regards
Rodrigo Fernandes

________________________________________________________________
Rodrigo Fernandes
MARETEC - Instituto Superior Técnico
Secção de Ambiente e Energia - Departamento de Engenharia Mecânica
Avenida Rovisco Pais
1049 - 001 Lisboa - Portugal
Tel. +351 218 419 434 - Fax: +351 218 419 423
www.mohid.com www.maretec.org

-----Original Message-----
From: Chris Barker [mailto:Chris.Barker at noaa.gov]
Sent: Monday, 28 November 2011 19:41
To: Ute Brønner
Cc: CF-metadata at cgd.ucar.edu; Ben Hetland; Mark Reed; Nils Rune Bodsberg; CJ
Beegle-Krause (CJ.Beegle-Krause at noaa.gov); Caitlin O'Connor
(Caitlin.OConnor at noaa.gov); Alex Hadjilambris (Alex.Hadjilambris at noaa.gov);
Rob Hetland (hetland at tamu.edu); Rodrigo Fernandes; rsignell at gmail.com
Subject: Re: FW: netcdf for particle trajectories

On 11/25/2011 5:01 AM, Ute Brønner wrote:
> Hi folks,
>
> I kind of lost track of our latest discussions and had the feeling
> that this was partly outside the mailing group;

yes, it was -- we had some discussion among a subset of the CF list that was
interested in particle model output.

> so I will try to sum up what we were discussing.

In our group, we've settled on a format for the GNOME model (at least for
now; we needed to use something) based on the discussion -- I've been remiss
in posting about it to the larger group; I was waiting for the time to write
it up a bit more clearly. More on that soon...

> My latest attempt was to produce NetCDF for particle trajectories that
> also wrote out the concentration grid, which resulted in an 11GB netCDF3
> file :-(

when you say "grid" I'm wondering what you mean -- particle tracks don't
produce a grid of data -- maybe we're mixing issues here?

> So we have different motivations for discussing particle trajectories
> and netCDF4.
>
> First question: does anybody know whether, and if so when, writing
> netCDF4 will be incorporated into the NetCDF Java library? Or will we use
> Python with the help of Jython etc.
> (http://www.slideshare.net/onyame/mixing-python-and-java) to write
> netCDF4?

I'm not sure mixing Python and Java is going to help here -- the Python libs
use the C libs -- so mixing C and Java would probably be a better bet, if
you need Java. Jython isn't going to get you C-based Python packages. (JEPP
might, as mentioned in that talk -- though if the goal is functionality that
really comes from C, straight JNI might make more sense.)

> Second question: is there a de facto standard / proposal for writing
> particle trajectory data that could be CF:featureType:<whatever we
> agree on>? The suggestion below is not suitable because: 1) we don't
> track a particle the whole time; it may disappear and show up again
> later, so if I have 1000 particles at time step 1 and 1000 at time
> step 2, we cannot be sure these 1000 are the same as before.

This was the whole point of the "ragged array" approach -- so that's
covered.

> 2) I cannot know the number of time steps in advance.

OK -- that is a challenge -- if we know neither the number of time steps
nor the number of particles in advance, then we by definition need two
unlimited dimensions. I understand netcdf4 allows this -- that may be a good
reason to go that route.
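A minimal sketch, assuming the contiguous ragged layout John Caron proposes further down in this thread (a per-step rowSize plus obs-dimensioned particle variables): each time step appends its particles to the obs arrays and one count to rowSize, so neither the number of steps nor the total number of particles has to be known up front. Plain Python lists stand in for the two growing dimensions; RaggedParticleWriter is a hypothetical name, not part of any netCDF library.

```python
class RaggedParticleWriter:
    """In-memory stand-in for a file with two growing dimensions: time and obs."""

    def __init__(self):
        self.time = []         # one entry per time step (time dimension)
        self.row_size = []     # number of particles written at each time step
        self.particle_id = []  # one entry per observation (obs dimension)
        self.lon = []
        self.lat = []

    def append_step(self, t, particles):
        """Append one time step; particles is an iterable of (id, lon, lat)."""
        n = 0
        for pid, lon, lat in particles:
            self.particle_id.append(pid)
            self.lon.append(lon)
            self.lat.append(lat)
            n += 1
        # Only after the step is complete do we know its particle count.
        self.time.append(t)
        self.row_size.append(n)


w = RaggedParticleWriter()
w.append_step(0.0, [(1, -9.20, 38.60), (2, -9.30, 38.70)])
w.append_step(3600.0, [(2, -9.31, 38.71)])  # particle 1 has disappeared
```

Note that particles can vanish and reappear freely, since identity is carried by particle_id rather than by array position.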

One question, though -- with the proposed ragged-array format, the time
dimension is only used in one place: the int rowSize(time) variable (or
"particleCount", or whatever we want to call it).

Is it possible, in netcdf3, to write the big array with the UNLIMITED
dimension, then define the time dimension and associated variable at the
end? Or does it all need to be defined at the start?

> and I might have int number_particles_per_timestep(time); :units =
> "1"; :long_name = "number particles per current timestep";
> :CF:ragged_row_count = "particle";

> The fact that some of you need to know which spill a particle came from
> may be solved with a 3rd dimension: dimensions: spill = 3;

unless the spills all have the same number of particles at any given time,
that's not going to work.

Our solution is to have an "ID" variable for each particle, so they can be
isolated -- this can be used to track a given particle over time, and also
be mapped to other data, like which spill it came from, etc.
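As a sketch of that idea, assuming a per-particle lookup from ID to spill (spill_of below is hypothetical; in a file it might be a small variable indexed by particle), a client can group observations by spill without any extra dimension:

```python
from collections import defaultdict


def group_obs_by_spill(particle_ids, spill_of):
    """Return {spill: [obs indices]} given a per-particle spill lookup."""
    groups = defaultdict(list)
    for obs_index, pid in enumerate(particle_ids):
        groups[spill_of[pid]].append(obs_index)
    return dict(groups)


particle_ids = [10, 11, 20, 10, 20]          # obs-dimensioned ID variable
spill_of = {10: "A", 11: "A", 20: "B"}       # hypothetical ID -> spill mapping
groups = group_obs_by_spill(particle_ids, spill_of)
```

The spills can then have any number of particles at any time, which is exactly what a fixed spill dimension cannot express.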


>     // or however many one has
> particle = UNLIMITED; // because it may change each time step

actually, UNLIMITED doesn't help if it's going to change each time step
(hence the ragged array solution) -- but it is required, as we often don't
know how many particles are going to be used in the end.

> how would one write this? With coordinates or as a hierarchical data
> structure? At least we need the ability to use several unlimited
> dimensions and the ragged-array feature.

apparently, yes.

> Third question: how can we compress big netCDF3 files? Or is it
> smarter to go for netCDF4 directly, with hierarchical data?

I do think compression and hierarchical data structure are separate issues.
netcdf4 is certainly the easy way to get compression; IIUC, to compress
netcdf3, you need to do it before/after file reading/writing
-- so it's helpful for storing and transmitting the data, but you still need
to deal with the big files at some stage.

(or has anyone adapted a netcdf lib to use on-the-fly compression (like with
libz)? -- that would be cool)
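Compressing a netCDF3 file after writing amounts to running a general-purpose compressor over its bytes. The sketch below shows the idea on an in-memory float32 buffer using Python's standard zlib module (zlib's deflate is also what netCDF4 uses for its per-variable compression). The helper names are hypothetical, and real particle data will usually compress less well than this deliberately repetitive example.

```python
import zlib
from array import array


def compress_floats(values, level=6):
    """Pack values as 32-bit floats and deflate them; returns (blob, raw size in bytes)."""
    raw = array("f", values).tobytes()
    return zlib.compress(raw, level), len(raw)


def decompress_floats(blob):
    """Inverse of compress_floats: inflate and unpack back to a list of floats."""
    out = array("f")
    out.frombytes(zlib.decompress(blob))
    return out.tolist()


# A field with long constant runs (e.g. a mostly empty grid) shrinks a lot.
values = [0.0] * 1000 + [1.5] * 1000
blob, raw_size = compress_floats(values)
```

As Chris says, the full uncompressed size still has to exist at read/write time; netCDF4 avoids that by compressing chunk by chunk inside the file.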

> Hoping to get the discussion going again, and that we agree on a standard
> quite soon!

yes, thanks for reviving it!

-Chris



> Have a nice weekend!
>
> Best, Ute
>
> -------- Original Message --------
> Subject: [CF-metadata] Particle Track Feature Type (was: Re: point
> observation data in CF 1.4)
> Date: Fri, 19 Nov 2010 04:15:35 +0100
> From: John Caron <caron at unidata.ucar.edu>
> To: cf-metadata at cgd.ucar.edu <cf-metadata at cgd.ucar.edu>
>
> I'm thinking that we need a new feature type for this. I'm calling it
> "particleTrack" but there's probably a better name.
>
> My reasoning is that the nested table representation of trajectories
> is:
>
> Table { traj_id; Table { time; lat, lon, z; data; } }
>
> but this case has the inner and outer table inverted:
>
> Table { time; Table { particle_id; lat, lon, z; data; data2; } }
>
> So, following that line of thought, the possibilities in CDL are:
>
> 1) If the average number of particles ~ the max number of particles at
> any time step, then one could use multidimensional arrays:
>
> dimensions:
>   maxParticles = 1000 ;
>   time = 7777 ; // may be UNLIMITED
>
> variables:
>   double time(time) ;
>
>   int particle_id(time, maxParticles) ;
>   float lon(time, maxParticles) ;
>   float lat(time, maxParticles) ;
>   float z(time, maxParticles) ;
>   float data(time, maxParticles) ;
>
> attributes:
>   :featureType = "particleTrack" ;
>
> note maxParticles is the max number of particles at any one time step,
> not total particle tracks. The particle trajectories have to be found
> by examining the values of particle_id(time, maxParticles).
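To make "examining the values of particle_id(time, maxParticles)" concrete, here is a minimal sketch using nested Python lists in place of the 2D variables. It assumes unused slots hold a fill value of -1 (an assumption; the CDL above does not specify one) and that a particle's slot may change from one time step to the next; trajectory_from_2d is a hypothetical name.

```python
FILL = -1  # assumed fill value marking unused particle slots


def trajectory_from_2d(times, particle_id, lon, lat, target):
    """Collect (time, lon, lat) for one particle from (time, maxParticles) arrays."""
    if target == FILL:
        raise ValueError("target must be a real particle ID, not the fill value")
    track = []
    for it, t in enumerate(times):
        for slot, pid in enumerate(particle_id[it]):
            if pid == target:  # the particle's slot can differ between steps
                track.append((t, lon[it][slot], lat[it][slot]))
                break
        # If the particle is absent at this step, it is simply skipped.
    return track


times = [0, 1, 2]
particle_id = [[5, 7, FILL], [7, 5, FILL], [7, FILL, FILL]]
lon = [[1.0, 2.0, 0.0], [2.1, 1.1, 0.0], [2.2, 0.0, 0.0]]
lat = [[10.0, 20.0, 0.0], [20.1, 10.1, 0.0], [20.2, 0.0, 0.0]]
```

Every trajectory lookup scans the whole array, which is why John notes below that these layouts favour per-time-step access.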
>
> 2) The CDL of the ragged case would look like:
>
> dimensions:
>   obs = 500000 ; // UNLIMITED
>   time = 7777 ;
>
> variables:
>   int time(time) ;
>   int rowSize(time) ;
>
>   int particle_id(obs) ;
>   float lon(obs) ;
>   float lat(obs) ;
>   float z(obs) ;
>   float data(obs) ;
>
> attributes:
>   :featureType = "particleTrack" ;
>
> in this case, you don't have to know the max number of particles at any
> one time step, but you do need to know the number of time steps
> beforehand. The particle trajectories have to be found by examining
> the values of particle_id(obs). The particles at time step i are
> contained in the obs variables from start(i) up to, but not including,
> start(i) + rowSize(i).
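Here start(i) is not stored in the file; it is the sum of rowSize over all earlier time steps (an exclusive prefix sum). A minimal sketch of that bookkeeping, with plain Python lists standing in for the variables and hypothetical function names:

```python
from itertools import accumulate


def row_starts(row_size):
    """start(i) = rowSize(0) + ... + rowSize(i-1), i.e. an exclusive prefix sum."""
    return list(accumulate(row_size, initial=0))[:-1]


def particles_at_step(i, row_size, particle_id, lon, lat):
    """Return (particle_id, lon, lat) for every particle at time step i."""
    start = row_starts(row_size)[i]
    stop = start + row_size[i]
    return list(zip(particle_id[start:stop], lon[start:stop], lat[start:stop]))


row_size = [2, 1, 3]
particle_id = [1, 2, 2, 1, 2, 3]
lon = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
lat = [5.0, 5.1, 5.2, 5.3, 5.4, 5.5]
```

A reader that processes many steps would compute row_starts once and reuse it rather than calling it per step.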
>
> these layouts are optimized for processing all particles at a given
> time, and for sequentially processing time steps. If one wanted to
> process particle trajectories, that would be much slower. If you needed
> to do it a lot, you might want to rewrite the file. A more
> sophisticated application, possibly a server, could write an index to
> speed it up.
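The index mentioned above could be as simple as one pass over rowSize and particle_id that records, for each particle, the (time index, obs index) pairs where it appears. A sketch under that assumption, with a hypothetical function name:

```python
from collections import defaultdict


def build_trajectory_index(row_size, particle_id):
    """Map each particle ID to its (time_index, obs_index) pairs, in time order."""
    index = defaultdict(list)
    obs = 0
    for it, n in enumerate(row_size):
        for _ in range(n):
            index[particle_id[obs]].append((it, obs))
            obs += 1
    return dict(index)


# Step 0: particles 1, 2; step 1: particle 2; step 2: particles 1, 3.
index = build_trajectory_index([2, 1, 2], [1, 2, 2, 1, 3])
```

Once built, one particle's positions are just lon[obs] for each obs in index[pid], without rescanning the whole file.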
>
>
> _______________________________________________ CF-metadata mailing
> list CF-metadata at cgd.ucar.edu
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>
>
> -----Original Message-----
> From: rsignell at gmail.com [mailto:rsignell at gmail.com] On Behalf Of
> Rich Signell
> Sent: Thursday, 18 August 2011 19:04
> To: Christopher Barker
> Cc: Ute Brønner; Ben Hetland; Mark Reed; Nils Rune Bodsberg;
> CJ Beegle-Krause; Caitlin O'Connor; Alex Hadjilambris; Rob Hetland
> Subject: Re: netcdf for particle trajectories
>
> Chris,
>
>
>>>> so I'll make part of my homework to deliver you a Python script
>>>> using Whitaker's NetCDF4 that writes a sample file.
>>
>> How did this go, Rich?
>
> Yes, I took Rob Hetland's Python short course, and yes, I wrote a
> small example showing how to take NetCDF3 particle tracking output and
> create a compressed NetCDF4 file with chunking. I just forgot to send
> it. ;-)
>
> Note: You can get an OPeNDAP-enabled NetCDF4 Python module for both 32-
> and 64-bit Windows from: http://www.lfd.uci.edu/~gohlke/pythonlibs/
>
> -Rich
>>
>> We're getting closer to a prototype file (i.e. we've got GNOME
>> writing something, but it still needs some tweaking). I'll send out
>> an example when I think we're close.
>>
>> One new issue:
>>
>> In GNOME, we have the concept of any number of "spills" -- each spill
>> is a set of particles that usually share some properties.
>>
>> So we're trying to figure out how to capture that. Two ideas:
>>
>> 1) each spill is a unique set of data -- but I think that it would
>> only be possible to do this by using a convention on the variable
>> names:
>>
>> data_1 particle_count_1 longitude_1 latitude_1 ...
>>
>> data_2 particle_count_2 longitude_2 latitude_2 ...
>>
>> That seems pretty ugly. Could netcdf4's "hierarchical data" help us
>> here? Maybe this provides the motivation to use it.
>>
>> Option two:
>>
>> put all the particles in one big array, but identify the different
>> "spills" by particle ID:
>>
>> ID_range_1 = 0-1000 ID_range_2 = 1000-2000 ...
>>
>> then they could get split up by the client software, if desired, or
>> the separate spills could be ignored, and it could all be treated as
>> one.
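A sketch of option two, reading "0-1000" and "1000-2000" as half-open ranges [0, 1000) and [1000, 2000) (an assumption; otherwise ID 1000 would belong to both spills) and storing only the sorted start of each range. spill_for_id is a hypothetical helper; spills are numbered from 1 as in ID_range_1.

```python
from bisect import bisect_right


def spill_for_id(pid, range_starts):
    """Spill k (1-based) covers [range_starts[k-1], range_starts[k]); the last is open-ended."""
    k = bisect_right(range_starts, pid)
    if k == 0:
        raise ValueError(f"particle ID {pid} is below the first range")
    return k


range_starts = [0, 1000, 2000]  # ID_range_1, ID_range_2, ID_range_3
```

Clients that don't care about spills can ignore range_starts entirely and treat all particles as one set.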
>>
>> -- thoughts?
>>
>>
>> --
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R            (206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115       (206) 526-6317   main reception
>>
>> Chris.Barker at noaa.gov
>>
>
>
>
> --
> Dr. Richard P. Signell   (508) 457-2229
> USGS, 384 Woods Hole Rd.
> Woods Hole, MA 02543-1598



Received on Thu Jan 12 2012 - 05:53:44 GMT

