
[CF-metadata] high sample rate (seismic) data conventions

From: Jim Biard <jbiard>
Date: Fri, 14 Apr 2017 11:10:51 -0400

Jonathan,

There is an associated convention, the Attribute Convention for Data
Discovery (ACDD)
<http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery_1-3>
that defines file-level attributes detailing things like
time_coverage_resolution. The issue of high-rate time data arises with
satellite data as well. The files often end up with a time variable
containing lower-rate start times for groups of measurements. Some also
have a second coordinate variable containing fixed relative times that
are intended to be interpreted as offsets to the start times, but this
is not a standardized practice.
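
A minimal Python sketch of that layout, assuming hypothetical variable names (`group_start`, `sample_offset`) and made-up values. It only illustrates how a reader might combine the two variables; it is not a standardized practice:

```python
# Hypothetical layout: one start time per group of measurements, plus
# fixed offsets (in seconds) that apply within every group.
group_start = [0.0, 1.0, 2.0]            # seconds since some reference epoch
sample_offset = [0.0, 0.25, 0.5, 0.75]   # seconds relative to each group start


def expand_times(starts, offsets):
    """Rebuild the full-rate sample times as one flat list."""
    return [start + offset for start in starts for offset in offsets]


sample_times = expand_times(group_start, sample_offset)
print(len(sample_times))   # number of full-rate time values rebuilt
print(sample_times[:5])
```

Here 3 start times and 4 offsets are stored instead of 12 full-rate time values; the saving grows with the number of samples per group.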

Grace and peace,

Jim


On 4/10/17 4:33 PM, Maccarthy, Jonathan K wrote:
> Seth & Roy,
>
> Technically CF-compliant but "unconventional" is probably not the way
> to go, as I'd miss out on the tools that use the convention, which is
> the point of using the standard. I think I just needed someone to
> help me navigate the CF documents, as they're rather dense and
> unfamiliar :-) Originally, I came across this EarthCube page
> (https://www.earthcube.org/group/advancing-netcdf-cf) about expanding
> CF conventions, and I thought I'd read that a seismologist was
> involved. Seismology formats are a zoo, so I'm always on the hunt for
> a well-documented standard, especially one with community already
> behind it :-)
>
> Thanks again, all!
>
> Best,
> Jon
>
>> On Apr 10, 2017, at 11:54 AM, Seth McGinnis <mcginnis at ucar.edu> wrote:
>>
>> Hi Jonathan,
>>
>> Oh, climate model outputs are also supposed to have a uniform sample
>> rate for the whole time series -- emphasis on *SUPPOSED TO*. To my
>> dismay, I have encountered multiple cases where something went wrong
>> with the generation of the data files, resulting in missing or repeated
>> or weirdly-spaced timesteps, and sorting out the resulting problems is
>> how I came to appreciate the value of the explicit coordinate...
>>
>> As far as I know, you are correct that CF does not have a standardized
>> way to represent a coordinate solely in terms of a formula without
>> reference to a corresponding coordinate variable.
>>
>> However, that doesn't mean you couldn't do it and still have the file be
>> CF-compliant. As far as I am aware (and somebody correct me if I'm
>> wrong), coordinate variables are not actually mandatory.
>>
>> So if, for reasons of feasibility, you found it necessary to do
>> something like the following, I believe that strictly speaking it would
>> be not just allowed but fully CF-compliant:
>>
>> dimensions:
>>     time = UNLIMITED; // (1892160000 currently)
>> variables:
>>     double acceleration(time);
>>         acceleration:long_name = "ground acceleration";
>>         acceleration:units = "m s-2";
>>         acceleration:start_time = "2017-01-01 00:00:00.01667";
>>         acceleration:sampling_rate = "60 Hz";
>> data:
>>     acceleration = 1.324145e-6, ...
>>
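
One way a reader could recover sample times from the two attributes in that CDL fragment, sketched in Python. `start_time` and `sampling_rate` are the illustrative attribute names from the example, not attributes defined by CF, and `sample_time` is a hypothetical helper:

```python
from datetime import datetime, timedelta

# Values copied from the illustrative attributes; neither is defined by CF.
START_TIME = "2017-01-01 00:00:00.01667"
SAMPLING_RATE_HZ = 60.0


def sample_time(index, start=START_TIME, rate_hz=SAMPLING_RATE_HZ):
    """Time of the sample at a zero-based index, assuming a uniform rate."""
    t0 = datetime.strptime(start, "%Y-%m-%d %H:%M:%S.%f")
    # timedelta has microsecond resolution, so results are rounded to 1 us.
    return t0 + timedelta(seconds=index / rate_hz)


print(sample_time(0))
print(sample_time(60))  # one second after the first sample
```

Any gap or repeated sample in the series breaks this formula, which is the failure mode an explicit coordinate variable guards against.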
>>
>> I actually have some files without any coordinate variables sitting
>> around from the intermediate stage of some processing I did; I checked
>> one with Rosalyn Hatcher's cf-checker, and it didn't complain, so I
>> think it is technically legal. It's kind of a letter-of-the-law rather
>> than spirit-of-the-law thing, but it's at least theoretically compliant.
>> Up to you whether that would count as sufficiently suitable for your
>> use case.
>>
>> Cheers,
>>
>> --Seth
>>
>>
>>
>> On 4/10/17 10:54 AM, Maccarthy, Jonathan K wrote:
>>> Hi Seth,
>>>
>>> Thanks for the very helpful response. I can understand the argument for
>>> explicit coordinates, as opposed to using formulae; I think it solves
>>> several problems. The assumption of a uniform sample rate for the
>>> length of a continuous time series is deeply engrained in most seismic
>>> software, however. Changing that assumption may lead to other problems
>>> (but maybe not!). Data volumes for a single channel can be 40-100
>>> 4-byte samples per second, which is something like 5-12 GB per channel
>>> per year uncompressed. Commonly, dozens of channels are used at once,
>>> though some of them may share time coordinates. It sounds like this
>>> use-case is similar in volume to what you've used, and may be worth
>>> trying out.
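
As a rough check of those figures, assuming a 365-day year and decimal gigabytes (both assumptions, not stated in the message), the arithmetic can be sketched in Python:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year
BYTES_PER_SAMPLE = 4                   # 4-byte samples, as stated above


def bytes_per_year(samples_per_second):
    """Uncompressed bytes for one channel over one year."""
    return samples_per_second * BYTES_PER_SAMPLE * SECONDS_PER_YEAR


for rate in (40, 100):
    gigabytes = bytes_per_year(rate) / 1e9  # decimal gigabytes
    print(f"{rate} samples/s: {gigabytes:.1f} GB per channel per year")
```

This gives roughly 5.0 GB and 12.6 GB, consistent with the 5-12 GB range quoted above.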
>>>
>>> Just to be clear, however, would I be correct in saying that CF has no
>>> accepted way of representing the data as I've described?
>>>
>>> Thanks again,
>>> Jonathan
>>>
>>>> On Apr 7, 2017, at 4:43 PM, Seth McGinnis <mcginnis at ucar.edu> wrote:
>>>>
>>>> Hi Jonathan,
>>>>
>>>> I would interpret the CF stance as being that the value in having
>>>> explicit coordinate variables and other ancillary data to accompany the
>>>> data outweighs the cost of increased storage.
>>>>
>>>> There are some cases where CF bends away from that for the sake of
>>>> practicality (see, e.g., the discussion about external file references
>>>> for cell_bounds in CMIP5), but overall, my sense is that the community
>>>> feels that it's better to have things explicitly written out in the file
>>>> than it is to provide them implicitly via a formula to calculate them.
>>>>
>>>> Based on my personal experiences, I think this is the right approach.
>>>> (In fact, I take it even further: I prefer to avoid data compression
>>>> entirely and to keep like data with like as much as possible, rather
>>>> than splitting big files into smaller pieces.)
>>>>
>>>> I have endured far, far more suffering and toil from (a) trying to
>>>> figure out what's wrong with a file that violates some implicit
>>>> assumption (like "there are never gaps in the time coordinate") and (b)
>>>> dealing with the complications of various tactics for keeping file sizes
>>>> small than I ever have from storing and working with very large files.
>>>>
>>>> YMMV, of course. What are your data volumes like? I'm working at the
>>>> terabyte scale, and as long as my file sizes stay under a few dozen GB,
>>>> I don't really even bother thinking about anything that affects the file
>>>> size by less than an order of magnitude.
>>>>
>>>> Cheers,
>>>>
>>>> Seth McGinnis
>>>>
>>>> ----
>>>> NARCCAP / NA-CORDEX Data Manager
>>>> RISC - IMAGe - CISL - NCAR
>>>> ----
>>>>
>>>>
>>>> On 4/7/17 9:55 AM, Maccarthy, Jonathan K wrote:
>>>>> Hi all,
>>>>>
>>>>> I'm curious about the suitability of CF metadata conventions for
>>>>> seismic sensor data. I've done a bit of searching, but can't find
>>>>> any mention of how CF conventions would store high sample-rate
>>>>> sensor data. I do see descriptions of time series conventions, where
>>>>> hourly or daily sensor data samples are stored along with their
>>>>> timestamps, but storing individual timestamps for each sample of a
>>>>> high sample rate sensor would unnecessarily double the storage.
>>>>> Seismic formats typically don't store time vectors, but instead just
>>>>> store vectors of samples with an associated start time and sampling
>>>>> rate.
>>>>>
>>>>> Could someone please point me towards a discussion or existing
>>>>> conventions on this topic? Any help or suggestion is appreciated.
>>>>>
>>>>> Best,
>>>>> Jon
>>>>>
>>>>> _______________________________________________
>>>>> CF-metadata mailing list
>>>>> CF-metadata at cgd.ucar.edu
>>>>> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>>>>>

-- 
*Jim Biard*
*Research Scholar*
Cooperative Institute for Climate and Satellites NC <http://cicsnc.org/>
North Carolina State University <http://ncsu.edu/>
NOAA National Centers for Environmental Information <http://ncdc.noaa.gov/>
/formerly NOAA's National Climatic Data Center/
151 Patton Ave, Asheville, NC 28801
e: jbiard at cicsnc.org <mailto:jbiard at cicsnc.org>
o: +1 828 271 4900
Received on Fri Apr 14 2017 - 09:10:51 BST
