[CF-metadata] Indicating data lineage or provenance

From: Roy Lowry <rkl>
Date: Tue, 16 Jan 2007 16:34:23 +0000

Hi Nan,

Although the vocabulary issue is a thorny one, it is not intractable and has been tackled in other domains such as biology and chemistry where dataset generation is undertaken by a workflow system, which is driven by a control file that is a perfect description of the dataset provenance....

Cheers, Roy.

>>> Nan Galbraith <ngalbraith at whoi.edu> 1/16/2007 4:18 pm >>>

> One point being, this is a more general problem than just model
> provenance. Observation and processing provenance is also desirable
> to represent in netCDF files.

We've muddled along using the "history" attribute to record processing
and other sensor info for mooring data. It breaks down pretty severely,
as John said, with 2d mooring data. The problem for us is not so much
with CF as it is with NetCDF; if there were "bin level" attributes and
not just parameter and global levels, we could use these to record the
origin and processing of each instrument. This is being addressed in
the thread "Getting back to ensembles" - along with several unrelated
ideas - so we should include some of that discussion if we're moving
this to a forum elsewhere.

> the notion I have devolves to separate files. (Yes I do hate that,
> but provenance on a whole mooring system is pretty complicated to put
> into a netCDF file).

The problem with separate files is that they can so easily get out of
sync. A solution that allows all the info to be carried somewhere in
the NetCDF file, as well as being broken out into an xml file, is
really ideal, and I think Jonathan and others are addressing this on
the other thread.

Developing any kind of useful (never mind standard) vocabulary for the
steps we take in processing observational data will be a real bear,
because we create new steps with just about every data set we work on -
there's no real standard process, even within my small group. I guess
this is why Michael calls this a nightmare in the original post on this
thread.

I'll look for the discussion on the MMI and CF sites; this is an
interesting problem that's going to need to be solved fairly soon.

http://www.cgd.ucar.edu/pipermail/cf-metadata/2006/001397.html

> We propose to allow auxiliary coordinate variables with the ensemble
> index dimension to contain metadata identifying the institution,
> source, experiment_id and realization of the data.
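As a rough illustration of how that ensemble proposal might carry over
to the "bin level" problem above: the sketch below is only an
assumption about what such a layout could look like, not something
agreed on either thread. It renders CDL in which character-valued
auxiliary variables along the depth dimension hold one provenance
string per bin. The variable names (instrument, processing, temp), the
function name and the sample values are all hypothetical.

```python
# Hypothetical sketch: the names "instrument", "processing" and "temp"
# are illustrative only; they are not defined by CF or by this thread.

def per_bin_provenance_cdl(depths, instruments, processing):
    """Render CDL where each depth bin carries its own provenance strings."""
    if not (len(depths) == len(instruments) == len(processing)):
        raise ValueError("need one instrument and one processing note per bin")

    # Character variables need a string-length dimension wide enough
    # for the longest label.
    strlen = max(len(s) for s in list(instruments) + list(processing))

    def quoted(values):
        # Quote each label, escaping embedded double quotes.
        return ", ".join('"%s"' % v.replace('"', '\\"') for v in values)

    lines = [
        "netcdf mooring {",
        "dimensions:",
        "  time = UNLIMITED ;",
        "  depth = %d ;" % len(depths),
        "  strlen = %d ;" % strlen,
        "variables:",
        "  float depth(depth) ;",
        "  char instrument(depth, strlen) ;",
        "  char processing(depth, strlen) ;",
        "  float temp(time, depth) ;",
        # The CF "coordinates" attribute links the data variable to its
        # auxiliary (label) variables.
        '    temp:coordinates = "instrument processing" ;',
        "data:",
        "  depth = %s ;" % ", ".join(str(d) for d in depths),
        "  instrument = %s ;" % quoted(instruments),
        "  processing = %s ;" % quoted(processing),
        "}",
    ]
    return "\n".join(lines)


cdl = per_bin_provenance_cdl(
    [10.0, 25.0],
    ["logger-A", "logger-B"],
    ["despiked", "despiked, drift-corrected"],
)
print(cdl)
```

Because the labels share the depth dimension with the data, they
travel in the same file and cannot drift out of sync with it the way a
separate file can; free-text processing notes would still need an
agreed vocabulary to be machine-readable.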

Cheers - Nan

-- 
**************************************************************
* Nan Galbraith            Upper Ocean Processes Group       *
* Woods Hole Oceanographic Institution Woods Hole, MA 02540  *
* http://uop.whoi.edu      (508) 289-2444                    *
**************************************************************
Received on Tue Jan 16 2007 - 09:34:23 GMT
