
[CF-metadata] Indicating data lineage or provenance

From: John Graybeal <graybeal>
Date: Thu, 4 Jan 2007 16:47:17 -0800

To provide some data in response to Mike's question, and then a question of my own:

I, along with Maureen Edwards of the UK, am tasked by OceanSITES with presenting a nominal solution to provenance in netCDF. How far we can get, and how quickly, is definitely TBD, but the notion I have devolves to separate files. (Yes, I do hate that, but provenance for a whole mooring system is pretty complicated to put into a netCDF file.) So I'd probably suggest a link (URL) from the netCDF file to a registered SensorML instance (registration of such instances is being pursued on another project I'm involved with). This is similar to Mike's solution, but with important differences.
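
As a rough illustration of the link idea, here is a minimal Python sketch. The attribute names (`provenance_link`, `provenance_format`) and the example URL are hypothetical, not part of CF or OceanSITES; the sketch only shows a netCDF global attribute carrying a checked URL to an external SensorML document.

```python
from urllib.parse import urlparse


def sensorml_link_attributes(sensorml_url: str) -> dict:
    """Build global attributes pointing at an external SensorML record.

    The attribute names are assumptions made for this sketch only.
    """
    parts = urlparse(sensorml_url)
    # Only accept absolute http(s) URLs so the link can be resolved later.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {sensorml_url!r}")
    return {
        "provenance_link": sensorml_url,
        "provenance_format": "SensorML",
    }


# Hypothetical registered SensorML instance for a mooring.
attrs = sensorml_link_attributes(
    "http://example.org/registry/sensorml/mooring-m1.xml"
)
```

The resulting dictionary could then be written as global attributes by whatever netCDF library is in use; the provenance detail itself stays in the separate SensorML file.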

One point is that this is a more general problem than just model provenance: observation and processing provenance are also desirable to represent in netCDF files.

So the question is, how much of this does the CF standard want to take on directly, and how much does it want to defer to other standards or efforts?

(No I really didn't put Mike up to this, and he really is only 8 doors from me. But neither of us knew...)

John

At 4:31 PM -0800 1/4/07, Godin, Michael wrote:
>I am heartened by all the work this group has put into standardizing the metadata for representing multiple models as an ensemble. However, a particularly thorny issue has been for the most part ignored (I think it has been called a "nightmare"), so I'd like to see if some of the list participants would be willing to work together to form a proposal for indicating the provenance of derived data (for example, initial conditions, larger nested grids, and assimilated data that go into models).
>
>So here are the (draft) requirements that I believe need to be addressed:
>- derived data users need to be provided the information they need to understand the differences between data (covering the same temporal/spatial region) from different models and different realizations of the same model.
>- skeptics (public, governmental, other modelers, observationalists) should be able to request specific observational data that went into a model realization (granted, the request may be for data that would not otherwise be made publicly available).
>- the specification of source data should not only indicate the source data files (or URLs) and variables, but also the temporal/spatial/realization bounds on the supplied data.
>
>I don't know if such a set of requirements can be addressed in a netCDF file, or if it would require a link to an external XML (or other format) file. I am also unsure if any other community has solved the above set of requirements - both the OGC's Layer definition within their Web Map Context Document standard, and the FGDC's Lineage definition within their Content Standard for Digital Geospatial Metadata allow one to specify a lot of metadata about lineage and provenance, but neither really meets the requirements above.
>
>My initial thought for doing this within a netCDF file would be to specify a global multi-line string attribute called something like "lineage" or "provenance" and populate it with a series of DAP2.0-like URIs (of course, this would not be global in the case of ensembles -- it would have to be a 3D set of strings!). The DAP2.0 URIs would not have to be publicly accessible, and the syntax would have to allow combinations of hyperslab operators and queries -- which I do not believe any DAP server actually allows -- but would allow one to specify precise data ranges.
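
To make the multi-line attribute idea concrete, here is a minimal Python sketch under stated assumptions. The `SourceRef` class, its fields, the example URLs, and the bounds are all hypothetical. The URI form (a hyperslab projection followed by `&` selection clauses) is only "DAP2.0-like", and as noted above it may not be accepted by any actual DAP server.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SourceRef:
    """One source dataset behind a derived product (hypothetical structure)."""
    url: str                      # source data file or service URL
    variable: str                 # variable drawn from that source
    # One (start, stride, stop) index triple per dimension.
    hyperslab: List[Tuple[int, int, int]] = field(default_factory=list)
    # Value-based bounds, e.g. "time>=1167868800" or "realization=3".
    selections: List[str] = field(default_factory=list)

    def to_uri(self) -> str:
        """Render as URL?variable[start:stride:stop]...&selection..."""
        slab = "".join(f"[{a}:{s}:{b}]" for a, s, b in self.hyperslab)
        uri = f"{self.url}?{self.variable}{slab}"
        for sel in self.selections:
            uri += f"&{sel}"
        return uri


def provenance_attribute(sources: List[SourceRef]) -> str:
    """Join one URI per line into a single multi-line string attribute."""
    return "\n".join(src.to_uri() for src in sources)


sources = [
    SourceRef("http://example.org/dap/obs/ctd.nc", "temperature",
              hyperslab=[(0, 1, 23), (0, 1, 9)],
              selections=["time>=1167868800"]),
    SourceRef("http://example.org/dap/model/outer_grid.nc", "salinity",
              selections=["realization=3"]),
]
provenance = provenance_attribute(sources)
```

Each line of `provenance` names a source, a variable, and the bounds on the supplied data, so a reader can split the attribute on newlines to recover the individual references. For ensembles, one such string would be needed per realization rather than a single global attribute.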
>
>Thanks for your consideration,
>Mike
>
>_____________________________________________
>
>Michael A. Godin
>
>Software Engineer
>
>Monterey Bay Aquarium Research Institute
>
>Phone: 831-775-2063 http://www.mbari.org
>
>
>
>_______________________________________________
>CF-metadata mailing list
>CF-metadata at cgd.ucar.edu
>http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata


-- 
----------
John Graybeal   <mailto:graybeal at mbari.org>  -- 831-775-1956
Monterey Bay Aquarium Research Institute
Marine Metadata Initiative: http://marinemetadata.org   ||  Shore Side Data System: http://www.mbari.org/ssds
Received on Thu Jan 04 2007 - 17:47:17 GMT
