
[CF-metadata] high sample rate (seismic) data conventions

From: Maccarthy, Jonathan K <jkmacc>
Date: Mon, 10 Apr 2017 20:33:38 +0000

Seth & Roy,

Technically CF-compliant but "unconventional" is probably not the way to go, as I'd miss out on the tools that use the convention, which is the point of using the standard. I think I just needed someone to help me navigate the CF documents, as they're rather dense and unfamiliar :-) Originally, I came across this EarthCube page (https://www.earthcube.org/group/advancing-netcdf-cf) about expanding CF conventions, and I thought I'd read that a seismologist was involved. Seismology formats are a zoo, so I'm always on the hunt for a well-documented standard, especially one with a community already behind it :-)

Thanks again, all!

Best,
Jon

On Apr 10, 2017, at 11:54 AM, Seth McGinnis <mcginnis at ucar.edu> wrote:

Hi Jonathan,

Oh, climate model outputs are also supposed to have a uniform sample
rate for the whole time series -- emphasis on *SUPPOSED TO*. To my
dismay, I have encountered multiple cases where something went wrong
with the generation of the data files, resulting in missing or repeated
or weirdly-spaced timesteps, and sorting out the resulting problems is
how I came to appreciate the value of the explicit coordinate...
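
A minimal Python sketch of the kind of check an explicit time coordinate makes possible: scan the stored times and report every step that is not the expected one. The function name `find_irregular_steps`, the tolerance, and the sample values are hypothetical and chosen only for illustration.

```python
def find_irregular_steps(times, expected_step, tol=1e-9):
    """Return (index, actual_step) for each step that differs from expected_step."""
    problems = []
    for i in range(1, len(times)):
        step = times[i] - times[i - 1]
        if abs(step - expected_step) > tol:
            problems.append((i, step))
    return problems


# Hypothetical daily time axis (days since some epoch) with one repeated
# timestep (index 3) and one missing timestep (between index 4 and 5).
times = [0.0, 1.0, 2.0, 2.0, 3.0, 5.0]
irregular = find_irregular_steps(times, expected_step=1.0)
print(irregular)
```

With only a start time and a sampling rate stored, there is nothing in the file to run such a check against.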

As far as I know, you are correct that CF does not have a standardized
way to represent a coordinate solely in terms of a formula without
reference to a corresponding coordinate variable.

However, that doesn't mean you couldn't do it and still have the file be
CF-compliant. As far as I am aware (and somebody correct me if I'm
wrong), coordinate variables are not actually mandatory.

So if, for reasons of feasibility, you found it necessary to do
something like the following, I believe that strictly speaking it would
be not just allowed but fully CF-compliant:

dimensions:
 time = UNLIMITED; // (1892160000 currently)
variables:
 double acceleration(time);
   acceleration:long_name = "ground acceleration";
   acceleration:units = "m s-2";
   acceleration:start_time = "2017-01-01 00:00:00.01667";
   acceleration:sampling_rate = "60 Hz";
data:
   acceleration = 1.324145e-6, ...


I actually have some files without any coordinate variables sitting
around from the intermediate stage of some processing I did; I checked
one with Rosalyn Hatcher's cf-checker, and it didn't complain, so I
think it is technically legal. It's kind of a letter-of-the-law rather
than spirit-of-the-law thing, but it's at least theoretically compliant.
Up to you whether that would count as sufficiently suitable for your
use case.
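
For a reader of such a file, the time of any sample would have to be rebuilt from the two attributes. The following Python sketch shows one way to do that, assuming the attribute values have already been read as strings in the formats used in the example above. The `start_time` and `sampling_rate` attributes are not defined by CF, and the helper names here are hypothetical.

```python
from datetime import datetime, timedelta


def parse_sampling_rate(text):
    """Parse a string such as "60 Hz" into samples per second."""
    value, unit = text.split()
    if unit.lower() != "hz":
        raise ValueError(f"unsupported sampling-rate unit: {unit!r}")
    return float(value)


def sample_time(start_time, sampling_rate, index):
    """Return the datetime of sample `index`, assuming a perfectly uniform rate."""
    start = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S.%f")
    rate_hz = parse_sampling_rate(sampling_rate)
    return start + timedelta(seconds=index / rate_hz)


# Attribute values as they appear in the example; sample 60 is one second in.
first = sample_time("2017-01-01 00:00:00.01667", "60 Hz", 0)
later = sample_time("2017-01-01 00:00:00.01667", "60 Hz", 60)
print(first, later)
```

Generic CF-aware tools would not know to do this, which is the practical cost of the letter-of-the-law approach.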

Cheers,

--Seth



On 4/10/17 10:54 AM, Maccarthy, Jonathan K wrote:
Hi Seth,

Thanks for the very helpful response. I can understand the argument for
explicit coordinates, as opposed to using formulae; I think it solves
several problems. The assumption of a uniform sample rate for the
length of a continuous time series is deeply engrained in most seismic
software, however. Changing that assumption may lead to other problems
(but maybe not!). Data volumes for a single channel can be 40-100
4-byte samples per second, which is something like 5-12 GB per channel
per year uncompressed. Commonly, dozens of channels are used at once,
though some of them may share time coordinates. It sounds like this
use-case is similar in volume to what you've used, and may be worth
trying out.
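
The volume estimate can be checked with a few lines of Python. This is only the arithmetic behind the figures above, assuming a 365-day year and decimal gigabytes; the function name `yearly_bytes` is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year


def yearly_bytes(samples_per_second, bytes_per_sample=4):
    """Uncompressed bytes per channel per year at a constant sample rate."""
    return samples_per_second * bytes_per_sample * SECONDS_PER_YEAR


low_gb = yearly_bytes(40) / 1e9    # 40 samples per second
high_gb = yearly_bytes(100) / 1e9  # 100 samples per second
print(f"{low_gb:.2f} GB to {high_gb:.2f} GB per channel per year")
```

The two ends come out at roughly 5 GB and 12.6 GB, consistent with the range quoted above.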

Just to be clear, however, would I be correct in saying that CF has no
accepted way of representing the data as I've described?

Thanks again,
Jonathan

On Apr 7, 2017, at 4:43 PM, Seth McGinnis <mcginnis at ucar.edu> wrote:

Hi Jonathan,

I would interpret the CF stance as being that the value in having
explicit coordinate variables and other ancillary data to accompany the
data outweighs the cost of increased storage.

There are some cases where CF bends away from that for the sake of
practicality (see, e.g., the discussion about external file references
for cell_bounds in CMIP5), but overall, my sense is that the community
feels that it's better to have things explicitly written out in the file
than it is to provide them implicitly via a formula to calculate them.

Based on my personal experiences, I think this is the right approach.
(In fact, I take it even further: I prefer to avoid data compression
entirely and to keep like data with like as much as possible, rather
than splitting big files into smaller pieces.)

I have endured far, far more suffering and toil from (a) trying to
figure out what's wrong with a file that violates some implicit
assumption (like "there are never gaps in the time coordinate") and (b)
dealing with the complications of various tactics for keeping file sizes
small than I ever have from storing and working with very large files.

YMMV, of course. What are your data volumes like? I'm working at the
terabyte scale, and as long as my file sizes stay under a few dozen GB,
I don't really even bother thinking about anything that affects the file
size by less than an order of magnitude.

Cheers,

Seth McGinnis

----
NARCCAP / NA-CORDEX Data Manager
RISC - IMAGe - CISL - NCAR
----
On 4/7/17 9:55 AM, Maccarthy, Jonathan K wrote:
Hi all,
I'm curious about the suitability of CF metadata conventions for
seismic sensor data.  I've done a bit of searching, but can't find
any mention of how CF conventions would store high sample-rate
sensor data.  I do see descriptions of time series conventions, where
hourly or daily sensor data samples are stored along with their
timestamps, but storing individual timestamps for each sample of a
high sample rate sensor would unnecessarily double the storage.
Seismic formats typically don't store time vectors, but instead just
store vectors of samples with an associated start time and sampling
rate.
Could someone please point me towards a discussion or existing
conventions on this topic?  Any help or suggestion is appreciated.
Best,
Jon
_______________________________________________
CF-metadata mailing list
CF-metadata at cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
Received on Mon Apr 10 2017 - 14:33:38 BST

This archive was generated by hypermail 2.3.0 : Tue Sep 13 2022 - 23:02:42 BST
