[CF-metadata] Indicating data lineage or provenance

From: John Graybeal <graybeal>
Date: Wed, 10 Jan 2007 16:47:31 -0800

Your suggestion is fine with me, but I have the following amendment I'd like you to consider.

MMI tries to reference collections of metadata standards relevant to the metadata community. You've started a great list with your email. Do you think we could combine forces to make MMI the repository for "provenance content standards" references, and then have the CF wiki hold the CF discussion of how to decide which one to adopt? I'd be happy for contributions from anyone on this list; becoming a contributing member of MMI is pretty easy.

Of course, I'd also plan for MMI to reference the CF discussions on the topic, in case others want to get involved.

john

At 3:24 PM -0800 1/10/07, Godin, Michael wrote:
>Roy, John, and Bryan,
>
>I have been talking with a couple of colleagues who are very excited that we may make an attempt at making some standard recommendations for indicating provenance, either by adopting an external standard (SensorML? NumSim? Pstruct? DublinCore? BPEL? Karma? KPI?) and pointing to it in a (new) CF-metadata standard way, or by creating a completely new CF-Metadata provenance standard. While my primary interest is in indicating provenance of derived data, I think that provenance of measurements is just as important, and should be recorded in a similar manner.
>
>In any case, I'd be willing to make the first cut at populating a wiki page (on the CF Trac Twiki). Does this sound reasonable?
>
>Kind regards,
>Mike
>
>-----Original Message-----
>From: cf-metadata-bounces at cgd.ucar.edu [mailto:cf-metadata-bounces at cgd.ucar.edu] On Behalf Of Roy Lowry
>Sent: Sunday, January 07, 2007 8:37 AM
>To: cf-metadata at cgd.ucar.edu; sdn-tech at seadatanet.org
>Subject: Re: [CF-metadata] Indicating data lineage or provenance
>
>Hi John,
>
>My primary concern is that there is communication so we get a single solution, not yet another set of 'near duplicates'. Documentation and evaluation of what exists in an open forum like the CF Trac Twiki or an area on the MMI site would seem an excellent way to achieve this.
>
>As far as SeaDataNet is concerned, a model that could be implemented either within NetCDF or as an XML document would be required, as the project uses multiple protocols.
>
>Cheers, Roy.
>
>>>> John Graybeal <graybeal at mbari.org> 01/07/07 2:04 AM >>>
>Roy,
>
>Based on our experience so far with provenance-aware data systems, I suspect it is a very good (read: powerful) solution to this problem.
>
>There are multiple standards that encode provenance information. Would the CF project support an evaluation of the application of those standards; or are you looking for an embedded (into netCDF) solution; or is that a question to be discussed on the TWiki?
>
>Note also that one aspect of Mike's requirement, namely referencing a subset of a data set, is not so fully addressed (that I know of); participants of a recent AGU session hope to kick off a discussion on this topic. But we have imagined some reasonably effective approaches using existing encoding standards.
>
>John
>
>
>At 4:20 PM +0000 1/6/07, Roy Lowry wrote:
>>Dear All,
>>
>>This issue is also of great concern to the SeaDataNet project, particularly in the case where multiple operational centres have grabbed a common raw dataset off the GTS and processed it independently, creating 'near duplicates' which are difficult to identify. Standardised encoded provenance metadata has occurred to me as a possible solution to this problem.
>>
>>We all seem to need the same thing, so I think collaboration is the order of the day. Could this be a candidate for a CF Twiki project advertised to other interested communities?
>>
>>Cheers, Roy.
>>
>>>>> John Graybeal <graybeal at mbari.org> 01/05/07 12:47 AM >>>
>>To provide some data in response to Mike's question, and then a question of my own:
>>
>>I, along with Maureen Edwards of the UK, am tasked by OceanSITES with presenting a nominal solution to provenance in netCDF. How far we can get, and how quickly, is definitely TBD, but the notion I have devolves to separate files. (Yes I do hate that, but provenance on a whole mooring system is pretty complicated to put into a netCDF file.) So I'd probably suggest a link (URL) from netCDF to a registered SensorML instance (registrations of which are being pursued on another project I'm involved with). Similar to Mike's solution but with important differences.
>>
>>One point being, this is a more general problem than just model provenance. Observation and processing provenance is also desirable to represent in netCDF files.
>>
>>So the question is, how much of this does the CF standard want to take on directly, and how much does it want to defer to other standards or efforts?
>>
>>(No I really didn't put Mike up to this, and he really is only 8 doors
>>from me. But neither of us knew...)
>>
>>John
>>
>>At 4:31 PM -0800 1/4/07, Godin, Michael wrote:
>>>I am heartened by all the work this group has put into standardizing the metadata for representing multiple models as an ensemble. However, a particularly thorny issue has been for the most part ignored (I think it has been called a "nightmare"), so I'd like to see if some of the list participants would be willing to work together to form a proposal for indicating the provenance of derived data (for example, initial conditions, larger nested grids, and assimilated data that go into models).
>>>
>>>So here are the (draft) requirements that I believe need to be addressed:
>>>- derived data users need to be provided the information they need to understand the differences between data (covering the same temporal/spatial region) from different models and different realizations of the same model.
>>>- skeptics (public, governmental, other modelers, observationalists) should be able to request specific observational data that went into a model realization (granted, the request may be for data that would not otherwise be made publicly available).
>>>- the specification of source data should not only indicate the source data files (or URLs) and variables, but also the temporal/spatial/realization bounds on the supplied data.
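
The third requirement above (source files or URLs, variables, and temporal/spatial/realization bounds) could be captured in a small record. The following is only a sketch under stated assumptions: the class name, every field name, and the example URL are hypothetical and are not drawn from CF or from any standard named in this thread.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class SourceReference:
    """One input to a derived product (all field names are hypothetical)."""
    url: str                                  # source file or service URL
    variables: Tuple[str, ...]                # variables actually used
    time_bounds: Tuple[datetime, datetime]    # inclusive temporal extent
    lat_bounds: Tuple[float, float]           # degrees north, (south, north)
    lon_bounds: Tuple[float, float]           # degrees east, (west, east)
    realization: Optional[int] = None         # ensemble member, if any

    def covers(self, when: datetime, lat: float, lon: float) -> bool:
        """True if the point falls inside the declared bounds."""
        return (self.time_bounds[0] <= when <= self.time_bounds[1]
                and self.lat_bounds[0] <= lat <= self.lat_bounds[1]
                and self.lon_bounds[0] <= lon <= self.lon_bounds[1])


# Illustrative values only; example.org is a placeholder host.
ref = SourceReference(
    url="http://example.org/data/sst.nc",
    variables=("sst",),
    time_bounds=(datetime(2006, 8, 1), datetime(2006, 8, 31)),
    lat_bounds=(35.0, 38.0),
    lon_bounds=(-124.0, -121.0),
    realization=3,
)
```

A record like this could be serialized either into a netCDF attribute or into an external XML file, which is the open question raised in the next paragraph.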
>>>
>>>I don't know if such a set of requirements can be addressed in a netCDF file, or if it would require a link to an external XML (or other format) file. I am also unsure if any other community has solved the above set of requirements - both the OGC's Layer definition within their Web Map Context Document standard, and the FGDC's Lineage definition within their Content Standard for Digital Geospatial Metadata allow one to specify a lot of metadata about lineage and provenance, but neither really meets the requirements above.
>>>
>>>My initial thought for doing this within a netCDF file would be to specify a global multi-line string attribute called something like "lineage" or "provenance" and populate it with a series of DAP2.0-like URIs (of course, this would not be global in the case of ensembles -- it would have to be a 3D set of strings!). The DAP2.0 URIs would not have to be publicly accessible, and the syntax would have to allow combinations of hyperslab operators and queries -- which I do not believe any DAP server actually allows -- but would allow one to specify precise data ranges.
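
The idea in the paragraph above might look roughly like the sketch below, which builds DAP2-style URIs (a projection with `[start:stride:stop]` hyperslabs, optionally followed by `&` selection clauses) and joins them into one newline-separated attribute value. The helper names, the attribute name "provenance", the hosts, and the variable names are all assumptions made for illustration. The URIs are treated purely as identifiers, since, as noted above, a real DAP server may not accept every such combination.

```python
from typing import Dict, Sequence, Tuple

# (start, stride, stop) index triple for one dimension, as in DAP2 hyperslabs
Slab = Tuple[int, int, int]


def dap_uri(base_url: str, var: str, slabs: Sequence[Slab],
            selections: Sequence[str] = ()) -> str:
    """Build a DAP2-style URI: base?var[start:stride:stop]...&selection..."""
    projection = var + "".join(f"[{a}:{s}:{b}]" for a, s, b in slabs)
    return base_url + "?" + "&".join([projection, *selections])


def provenance_attribute(uris: Sequence[str]) -> Dict[str, str]:
    """Pack the URIs into one newline-separated global attribute value."""
    return {"provenance": "\n".join(uris)}


# Hypothetical inputs to a model run; example.org is a placeholder host.
attrs = provenance_attribute([
    dap_uri("http://example.org/dap/obs_sst.nc", "sst",
            [(0, 1, 30), (100, 1, 140), (200, 1, 260)]),
    dap_uri("http://example.org/dap/glider.nc", "temperature",
            [(0, 1, 999)], selections=["depth<200"]),
])
```

The resulting dictionary could be written as a global attribute by whatever netCDF library is in use; for ensembles, one such string per realization would be needed rather than a single global value.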
>>>
>>>Thanks for your consideration,
>>>Mike
>>>
>>>_____________________________________________
>>>
>>>Michael A. Godin
>>>
>>>Software Engineer
>>>
>>>Monterey Bay Aquarium Research Institute
>>>
>>>Phone: 831-775-2063 http://www.mbari.org
>>>
>>>
>>>
>>>_______________________________________________
>>>CF-metadata mailing list
>>>CF-metadata at cgd.ucar.edu
>>>http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata
>>
>>
>>--
>>----------
>>John Graybeal <mailto:graybeal at mbari.org> -- 831-775-1956
>>Monterey Bay Aquarium Research Institute
>>Marine Metadata Initiative: http://marinemetadata.org || Shore Side Data System: http://www.mbari.org/ssds
>>
>>--
>>This message (and any attachments) is for the recipient only. NERC is
>>subject to the Freedom of Information Act 2000 and the contents of this
>>email and any reply you make may be disclosed by NERC unless it is
>>exempt from release under the Act. Any material supplied to NERC may be
>>stored in an electronic records management system.


-- 
----------
John Graybeal   <mailto:graybeal at mbari.org>  -- 831-775-1956
Monterey Bay Aquarium Research Institute
Marine Metadata Initiative: http://marinemetadata.org   ||  Shore Side Data System: http://www.mbari.org/ssds
Received on Wed Jan 10 2007 - 17:47:31 GMT
