[CF-metadata] Indicating data lineage or provenance

From: Bryan Lawrence <b.n.lawrence>
Date: Fri, 05 Jan 2007 08:20:15 +0000

I think the most important thing to get right is the content model (i.e.
what one wants to know and record); only then should we worry about
whether it lives in the netCDF file or in an accompanying file (however
serialised).

For the record, our (BADC) attempts at dealing with this are at
http://proj.badc.rl.ac.uk/ndg/wiki/NumSim

(and despite what I said, the schema is an XML XSD rather than being
serialisation-independent).

The NumSim project has been somewhat in abeyance due to other
priorities, but it's about to be revamped in the context of a new
funding line (to support both research use and public engagement with
climate simulations), and we expect to be documenting all our simulation
data in the next twelve months using it.

As part of that activity we'll be giving it a makeover in partnership
with the Met Office and looking at how it relates to the NMM work coming
out of Reading (which is being looked at in a number of contexts).
Ideally NumSim ought to be a human readable/generated subset/component
of NMM, and our vision is that it ought to meet the requirement you're
outlining here.

In any case, you will see that NumSim allows explicit links to the
datasets used as boundary conditions and initial conditions for
simulations.

It'd be great if you wanted to make some specific criticisms of what we
have now in terms of the content model. Then we should worry about how
we use it :-)

Cheers
Bryan


On Thu, 2007-01-04 at 16:31 -0800, Godin, Michael wrote:
> I am heartened by all the work this group has put into standardizing
> the metadata for representing multiple models as an ensemble.
> However, a particularly thorny issue has been for the most part
> ignored (I think it has been called a "nightmare"), so I'd like to see
> if some of the list participants would be willing to work together to
> form a proposal for indicating the provenance of derived data (for
> example, initial conditions, larger nested grids, and assimilated data
> that go into models).
>
> So here are the (draft) requirements that I believe need to be
> addressed:
> - derived data users need to be provided the information they need to
> understand the differences between data (covering the same
> temporal/spatial region) from different models and different
> realizations of the same model.
> - skeptics (public, governmental, other modelers,
> observationalists) should be able to request specific observational
> data that went into a model realization (granted, the request may be
> for data that would not otherwise be made publicly available).
> - the specification of source data should not only indicate the source
> data files (or URLs) and variables, but also the
> temporal/spatial/realization bounds on the supplied data.
>
> I don't know if such a set of requirements can be addressed in a
> netCDF file, or if it would require a link to an external XML (or
> other format) file. I am also unsure if any other community has
> solved the above set of requirements - both the OGC's Layer
> definition within their Web Map Context Document standard, and the
> FGDC's Lineage definition within their Content Standard for Digital
> Geospatial Metadata allow one to specify a lot of metadata about
> lineage and provenance, but neither really meets the requirements
> above.
>
> My initial thought for doing this within a netCDF file would be to
> specify a global multi-line string attribute called something like
> "lineage" or "provenance" and populate it with a series of DAP2.0-like
> URIs (of course, this would not be global in the case of ensembles --
> it would have to be a 3D set of strings!). The DAP2.0 URIs would not
> have to be publicly accessible, and the syntax would have to allow
> combinations of hyperslab operators and queries -- which I do not
> believe any DAP server actually allows -- but would allow one to
> specify precise data ranges.
>
> Thanks for your consideration,
> Mike
> _____________________________________________
>
> Michael A. Godin
>
> Software Engineer
>
> Monterey Bay Aquarium Research Institute
>
> Phone: 831-775-2063 http://www.mbari.org
>
>
> _______________________________________________
> CF-metadata mailing list
> CF-metadata at cgd.ucar.edu
> http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata
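
The "lineage" attribute idea in the quoted message can be sketched roughly
as follows. This is only an illustration under stated assumptions, not a
proposed CF convention: the SourceSlice class, the lineage_attribute
helper, the example.org URLs and the variable names are all hypothetical,
and the index ranges use the DAP2 hyperslab form var[start:stop] only.
Combining hyperslabs with queries, which the message notes servers may not
support, is not attempted here, and neither is writing the string into a
netCDF file.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SourceSlice:
    """One piece of source data: where it lives, which variable, which index ranges.

    Hypothetical helper for illustration; not part of any CF or DAP library.
    """
    url: str                              # location of the source dataset (need not be public)
    variable: str                         # variable name within that dataset
    bounds: Tuple[Tuple[int, int], ...]   # one inclusive (start, stop) index pair per dimension

    def to_uri(self) -> str:
        # DAP2-style hyperslab: one [start:stop] group per dimension, in order.
        hyperslab = "".join(f"[{start}:{stop}]" for start, stop in self.bounds)
        return f"{self.url}?{self.variable}{hyperslab}"


def lineage_attribute(sources: List[SourceSlice]) -> str:
    """Join the source URIs into one multi-line string, one URI per line."""
    return "\n".join(source.to_uri() for source in sources)


# Made-up example sources: an observational field and an outer nested grid.
sources = [
    SourceSlice("http://example.org/dap/obs_sst.nc", "sst",
                ((0, 23), (10, 49), (100, 179))),
    SourceSlice("http://example.org/dap/outer_grid.nc", "temp",
                ((0, 23), (0, 9), (10, 49), (100, 179))),
]

lineage = lineage_attribute(sources)
print(lineage)
```

The resulting string could then be stored as a global text attribute by
whatever netCDF library is in use. For an ensemble, one such string per
realization would be needed, which is the complication the message points
out.
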
Received on Fri Jan 05 2007 - 01:20:15 GMT
