[CF-metadata] Multiple file datasets (was: Swath observational data) from John Caron on 2009-11-20 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: John Caron <caron>
Date: Fri, 20 Nov 2009 05:29:37 -0700

This topic deserves its own heading, so here it is.

Perhaps we should gather current practices and ideas. I think Balaji's gridspec has a proposal about this. Can anyone summarize what SAFE does?

I'm imagining how this would actually be used, e.g.:

float data(y,x);
data:coordinates = "lat@file1 lon@file2";

????
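
To make the question concrete, here is a rough sketch of how a reader might split such an attribute into (variable, file) pairs. It assumes each entry is a variable name joined to a file name by `at` or `@`, and that a bare name means "same file". None of this is CF syntax; the names `CoordRef` and `parse_coordinates` are made up for illustration.

```python
import re
from typing import List, NamedTuple, Optional


class CoordRef(NamedTuple):
    variable: str
    dataset: Optional[str]  # None means the variable lives in the same file


# Hypothetical syntax, not part of CF: "<var>@<file>" or "<var> at <file>".
# A bare "<var>" is treated as a reference within the current file.
_REF = re.compile(r"(\w+)(?:\s*@\s*|\s+at\s+)(\S+)|(\w+)")


def parse_coordinates(attr: str) -> List[CoordRef]:
    """Split a coordinates-style attribute into (variable, dataset) pairs."""
    refs = []
    for m in _REF.finditer(attr):
        if m.group(1):
            refs.append(CoordRef(m.group(1), m.group(2)))
        else:
            refs.append(CoordRef(m.group(3), None))
    return refs


for ref in parse_coordinates("lat at file1 lon at file2"):
    print(ref.variable, "->", ref.dataset)
```

How "file1" then gets turned into something openable is exactly the open question.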

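On Bryan's suggestion (quoted below) of a relative link that also carries the identifier of its intended target, a minimal sketch of the two uses he describes might look like this. It is only an illustration under stated assumptions: files are stood in for by plain dicts, the `uuid` key and the functions `make_link` and `resolve` are hypothetical, and no real netCDF library is involved.

```python
import uuid
from pathlib import PurePosixPath


def make_link(rel_path, target_uuid, variable):
    """Build a link: OS-agnostic relative path plus the target's identifier."""
    return {
        "path": PurePosixPath(rel_path).as_posix(),
        "uuid": str(target_uuid),
        "variable": variable,
    }


def resolve(link, open_by_path, resolver=None):
    """Follow a link and check that we reached the intended container.

    open_by_path: callable mapping a relative path to a file (or None).
    resolver:     optional callable mapping an identifier to a file (or None),
                  standing in for an identifier resolver service.
    """
    # a) follow the relative path, then validate the container identifier
    target = open_by_path(link["path"])
    if target is not None and target.get("uuid") == link["uuid"]:
        return target
    # b) packaging was broken: fall back to resolving by identifier
    if resolver is not None:
        target = resolver(link["uuid"])
        if target is not None and target.get("uuid") == link["uuid"]:
            return target
    raise LookupError("no file with identifier " + link["uuid"])


# Hypothetical usage: one file holding 'lat', linked from elsewhere.
file_id = str(uuid.uuid4())
grid = {"uuid": file_id, "lat": [10.0, 20.0]}
link = make_link("coords/grid.nc", file_id, "lat")

packaged = {"coords/grid.nc": grid}
print(resolve(link, packaged.get)[link["variable"]])
```

If the file is moved, the path lookup fails or finds a file with a different identifier, and only the resolver can recover the link.
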
John Graybeal wrote:
> I like Bryan's recommendation for a UUID or similar.
>
> Now I'm going to be annoying and suggest the UUID *could* be a URI, or
> these days, an IRI (International ..).
>
> And I think the way of 'locating' the file should be neither in
> packaging nor in local resolution; it should be in global namespace
> resolution. This is the way of the future, and is already more
> 'permanent' than either packaging or local resolution, IMHO.
>
> There is one form of URI in particular that is already resolvable: a
> URL. OK, that's an old song, but I'm gonna stick to it for a while
> longer. That form meets all the other requirements: it can be
> registered in a resolver, it can be guaranteed unique (to the same
> authority level as a UUID, anyway), and it is a unique string that can
> be used to validate the link). And it has the obvious benefit of being
> resolvable right now, for as long as the domain is held and properly
> maintained (Good URLs don't die).
>
> Since the last paragraph risks starting another unique identifier war, I
> promise not to re-engage unless someone asks me to. Meanwhile, I like
>
> John
>
>
> On Nov 19, 2009, at 22:23, Bryan Lawrence wrote:
>
>> On Thursday 19 November 2009 19:40:08 Jonathan Gregory wrote:
>>>> ... In some cases, referencing attributes such as
>>>> "coordinates" and "ancillary_variables" would, ideally, point to a
>>>> variable in a different dataset.
>>>
>>> This is a general problem to which CF doesn't have a solution because it was
>>> conceived as a convention for single netCDF files. However we need a solution
>>> as often several files should be treated as a single dataset.
>>>
>>> If the files don't overlap i.e. their contents are complementary, I think it
>>> should be satisfactory to allow variables in one file to be pointed to by name
>>> from another file, with no other mechanism being required within the file. I
>>> don't like the idea of naming one file within another file, as that would be
>>> very fragile. Instead, I think the file aggregation should be implied by
>>> simply defining the group of files which are to be treated as one file e.g.
>>> by putting them in one directory.
>>
>> It's the old ones that are the best ones :-) :-) this issue keeps on
>> coming back ... :-) :-) and we keep trying to ignore it ...
>>
>> I think we agree that an actual physical filename including path is
>> useless. We need both a relative link which relies on the
>> preservation of a group of files in a particular arrangement ... AND
>> an internal identifier so more robust linking mechanisms can be used
>> when (if) the data ends up in a managed environment.
>>
>> I think it's crucial in this situation to ensure that each file has a
>> unique identifier within it (created, for example, with uuid), because
>> all solutions which rely on packaging are fragile (SAFE is probably
>> better than most), but the bottom line is that users move files around
>> ... and we need some way of ensuring that we/they can validate the
>> links that are in place are the ones that were originally intended.
>>
>> So relative links would also include the identifier of the intended
>> target as well as the relative path in operating system agnostic terms.
>>
>> That identifier can be used in two ways: a) to validate the link (my
>> software can always check that the variable that I just opened
>> following a link from another one is the one that was expected by
>> checking the container identifier), and b) to produce an identifier
>> resolver service for the situation where the packaging has had to be
>> broken (which might occur for performance reasons or ...)
>>
>> CF could recommend something like this ...
>>
>> Bryan
>>
>> --
>> Bryan Lawrence
>> Director of Environmental Archival and Associated Research
>> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
>> STFC, Rutherford Appleton Laboratory
>> Phone +44 1235 445012; Fax ... 5848;
>> Web: home.badc.rl.ac.uk/lawrence
>> _______________________________________________
>> CF-metadata mailing list
>> CF-metadata at cgd.ucar.edu
>> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>
>
> --------------
> I have my new work email address: jgraybeal at ucsd.edu
> --------------
>
> John Graybeal <mailto:jgraybeal at ucsd.edu>
> phone: 858-534-2162
> Development Manager
> Ocean Observatories Initiative Cyberinfrastructure Project:
> http://ci.oceanobservatories.org
> Marine Metadata Interoperability Project: http://marinemetadata.org
Received on Fri Nov 20 2009 - 05:29:37 GMT
