
[CF-metadata] Indicating data lineage or provenance

From: Godin, Michael <Godin>
Date: Thu, 4 Jan 2007 16:31:58 -0800

I am heartened by all the work this group has put into standardizing the metadata for representing multiple models as an ensemble. However, a particularly thorny issue has been for the most part ignored (I think it has been called a "nightmare"), so I'd like to see if some of the list participants would be willing to work together to form a proposal for indicating the provenance of derived data (for example, initial conditions, larger nested grids, and assimilated data that go into models).
 
So here are the (draft) requirements that I believe need to be addressed:
- users of derived data need enough information to understand the differences between data covering the same temporal/spatial region from different models, and from different realizations of the same model.
- skeptics (public, governmental, other modelers, observationalists) should be able to request specific observational data that went into a model realization (granted, the request may be for data that would not otherwise be made publicly available).
- the specification of source data should not only indicate the source data files (or URLs) and variables, but also the temporal/spatial/realization bounds on the supplied data.
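As a rough illustration of the third requirement, the record below captures one source-data reference together with its variable and its temporal, spatial and realization bounds. It is a minimal Python sketch only. The `SourceSpec` name, its fields, and the example URL are hypothetical and are not part of any CF or netCDF convention.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SourceSpec:
    """Hypothetical record for one piece of source data that went into a model run."""

    url: str                         # source file path or DAP URL
    variable: str                    # variable name inside that source
    time_bounds: Tuple[str, str]     # ISO 8601 start/end, inclusive
    lat_bounds: Tuple[float, float]  # degrees north, (south, north)
    lon_bounds: Tuple[float, float]  # degrees east; west > east allowed (dateline crossing)
    realizations: Optional[Tuple[int, int]] = None  # inclusive member range; None if not an ensemble

    def __post_init__(self) -> None:
        south, north = self.lat_bounds
        if not -90.0 <= south <= north <= 90.0:
            raise ValueError(f"invalid lat_bounds {self.lat_bounds}")
        # Assumes both timestamps use the same ISO 8601 layout and time zone,
        # so plain string comparison orders them correctly.
        if self.time_bounds[0] > self.time_bounds[1]:
            raise ValueError(f"time_bounds start after end: {self.time_bounds}")
        if self.realizations is not None:
            first, last = self.realizations
            if first < 0 or first > last:
                raise ValueError(f"invalid realizations {self.realizations}")


if __name__ == "__main__":
    spec = SourceSpec(
        url="http://example.org/dods/sst.nc",  # hypothetical source URL
        variable="sst",
        time_bounds=("2006-12-01T00:00Z", "2006-12-31T18:00Z"),
        lat_bounds=(35.0, 38.0),
        lon_bounds=(-124.0, -121.0),
        realizations=(0, 9),
    )
    print(spec)
```

Longitude bounds are deliberately not ordered, so a region crossing the dateline can still be described.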
 
I don't know whether such a set of requirements can be addressed within a netCDF file, or whether it would require a link to an external XML (or other format) file. I am also unsure whether any other community has met the above requirements: both the OGC's Layer definition within their Web Map Context Document standard and the FGDC's Lineage definition within their Content Standard for Digital Geospatial Metadata allow one to specify a lot of metadata about lineage and provenance, but neither really meets the requirements above.
 
My initial thought for doing this within a netCDF file would be to specify a global multi-line string attribute called something like "lineage" or "provenance" and populate it with a series of DAP2.0-like URIs (of course, for ensembles this could not be a single global attribute; it would have to be a 3D set of strings!). The DAP2.0 URIs would not have to be publicly accessible. Their syntax would have to allow hyperslab operators to be combined with queries, which I do not believe any DAP server actually supports, but in return it would allow one to specify precise data ranges.
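A minimal Python sketch of the "lineage" attribute idea, assuming index-based DAP2 hyperslab constraints of the form `var[start:stride:stop]` (stop inclusive). It builds one URI per source and joins them into a single newline-separated string that could be stored as a global attribute. The function names and example URLs are hypothetical, nothing here contacts a server, and it covers only the hyperslab part, not the hyperslab-plus-query combinations mentioned above. For ensembles it simply returns one string per realization; how those strings would be laid out in the file is left open.

```python
from typing import Dict, Sequence, Tuple

# (start, stride, stop) as zero-based indices; DAP2 treats stop as inclusive.
Slice = Tuple[int, int, int]


def dap2_hyperslab_uri(base_url: str, variable: str, slices: Sequence[Slice]) -> str:
    """Build a DAP2-style URI selecting an index hyperslab of one variable."""
    parts = []
    for start, stride, stop in slices:
        if start < 0 or stride < 1 or stop < start:
            raise ValueError(f"invalid slice {(start, stride, stop)}")
        parts.append(f"[{start}:{stride}:{stop}]")
    return f"{base_url}?{variable}{''.join(parts)}"


def lineage_attribute(uris: Sequence[str]) -> str:
    """Join URIs into one multi-line string value, one URI per line."""
    for uri in uris:
        if "\n" in uri:
            raise ValueError(f"URI contains a newline: {uri!r}")
    return "\n".join(uris)


def ensemble_lineage(members: Dict[int, Sequence[str]]) -> Dict[int, str]:
    """Build one lineage string per ensemble member (realization index -> text)."""
    return {member: lineage_attribute(uris) for member, uris in sorted(members.items())}


if __name__ == "__main__":
    uri = dap2_hyperslab_uri(
        "http://example.org/dods/sst.nc",  # hypothetical source
        "sst",
        [(0, 1, 30), (100, 2, 140), (200, 2, 260)],  # time, lat, lon index ranges
    )
    print(uri)  # http://example.org/dods/sst.nc?sst[0:1:30][100:2:140][200:2:260]
    print(lineage_attribute([uri, "http://example.org/dods/winds.nc?u[0:1:30][0:1:50][0:1:50]"]))
```

Keeping one URI per line means a reader can split the attribute on newlines to recover the individual sources.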
 
Thanks for your consideration,
Mike

_____________________________________________

Michael A. Godin

Software Engineer

Monterey Bay Aquarium Research Institute

Phone: 831-775-2063 http://www.mbari.org

 
Received on Thu Jan 04 2007 - 17:31:58 GMT

This archive was generated by hypermail 2.3.0 : Tue Sep 13 2022 - 23:02:40 BST
