
[CF-metadata] Multiple file datasets (was: Swath observational data)

From: V. Balaji <V.Balaji>
Date: Fri, 20 Nov 2009 08:51:15 -0500 (EST)

The gridspec indeed had a proposal about this. Clearly it was a bit
off-topic, but some mechanism of referring to other files was needed. It
consists of an attribute called a link_spec, which has attributes of a
baseURL, a relative pathname, and a checksum for verifying whether the
external file being referenced is indeed the one you're looking for.
There wasn't a special var@link syntax, but I don't see why it couldn't
have had one.
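
To make the link_spec idea concrete, here is a minimal Python sketch of
resolving such a link and checking that a fetched file is the intended one.
The class name, field names, and the choice of SHA-256 are assumptions made
for illustration; the gridspec proposal is not quoted here and may specify
these details differently.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path
from urllib.parse import urljoin


@dataclass(frozen=True)
class LinkSpec:
    """Hypothetical in-memory form of a link_spec (names are illustrative)."""

    base_url: str
    relative_path: str
    checksum: str  # hex digest; SHA-256 is an assumption, not from the proposal

    def resolve(self) -> str:
        """Join the base URL and the relative pathname into one location."""
        # urljoin only appends when the base ends with a slash.
        base = self.base_url if self.base_url.endswith("/") else self.base_url + "/"
        return urljoin(base, self.relative_path)

    def matches(self, local_file: Path) -> bool:
        """Return True if the local copy has the checksum the link expects."""
        digest = hashlib.sha256(local_file.read_bytes()).hexdigest()
        return digest == self.checksum


if __name__ == "__main__":
    # Stand-in bytes for an external file; a real target would be a netCDF file.
    payload = b"stand-in for the referenced file"
    link = LinkSpec(
        base_url="http://example.org/data",
        relative_path="grids/ocean_grid.nc",
        checksum=hashlib.sha256(payload).hexdigest(),
    )
    with tempfile.TemporaryDirectory() as tmp:
        copy = Path(tmp) / "ocean_grid.nc"
        copy.write_bytes(payload)
        print(link.resolve(), link.matches(copy))
```

The checksum is what lets software notice when a file of the same name has
been swapped for a different one after the link was written.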

CMIP5 is proposing a simplified variant on the link_spec. A file
can have a global attribute "associated_files", which is likewise
formed out of a baseURL and relative pathnames. The only permitted
associated_files are the gridspec, and the cell areas and volumes that
may be used in cell_methods.
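
As a rough illustration, the sketch below expands an associated_files
string into full URLs. The "key: value" token layout, the example keys, and
the file names are assumptions made for this example; only the baseURL plus
relative pathnames structure comes from the description above.

```python
from urllib.parse import urljoin


def parse_associated_files(attr: str) -> dict:
    """Map each associated-file key to a full URL.

    Assumes whitespace-separated "key: value" pairs, one of which is baseURL.
    This layout is a guess for illustration, not a quoted specification.
    """
    tokens = attr.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating 'key:' and value tokens")

    entries = {}
    for key, value in zip(tokens[0::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError(f"expected a key ending in ':', got {key!r}")
        entries[key[:-1]] = value

    base = entries.pop("baseURL", None)
    if base is None:
        raise ValueError("no baseURL entry found")
    if not base.endswith("/"):
        base += "/"  # urljoin only appends when the base ends with a slash

    return {name: urljoin(base, rel) for name, rel in entries.items()}


if __name__ == "__main__":
    # Hypothetical attribute value; host and file names are placeholders.
    attr = (
        "baseURL: http://example.org/CMIP5 "
        "gridspecFile: gridspec_fx.nc "
        "areacello: areacello_fx.nc"
    )
    for name, url in parse_associated_files(attr).items():
        print(name, url)
```

Unlike the link_spec, this variant as described carries no checksum, so a
reader can locate the associated file but cannot verify it is the intended
one.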

Other approaches have been proposed in this forum, most notably on Trac
#24 and #27, the common_concept thread and Benno's namespace thread.

SAFE has been explained already in this thread.

I agree with John, it would be good to consider this problem in
isolation, without the baggage of gridspecs or common concepts or
namespaces.

John Caron writes:

> This topic deserves its own heading, so here it is.
>
> Perhaps we should gather current practices and ideas. I think Balaji's
> gridspec has a proposal about this. Can anyone summarize what SAFE does?
>
> I'm imagining how this is actually used, e.g.:
>
> float data(y,x);
> data:coordinates = "lat@file1 lon@file2";
>
> ????
>
>
>
> John Graybeal wrote:
>> I like Bryan's recommendation for a UUID or similar.
>>
>> Now I'm going to be annoying and suggest the UUID *could* be a URI, or
>> these days, an IRI (International ..).
>>
>> And I think the way of 'locating' the file should be neither in packaging
>> nor in local resolution; it should be in global namespace resolution. This
>> is the way of the future, and is already more 'permanent' than either
>> packaging or local resolution, IMHO.
>>
>> There is one form of URI in particular that is already resolvable: a URL.
>> OK, that's an old song, but I'm gonna stick to it for a while longer. That
>> form meets all the other requirements: it can be registered in a resolver,
>> it can be guaranteed unique (to the same authority level as a UUID,
>> anyway), and it is a unique string that can be used to validate the link.
>> And it has the obvious benefit of being resolvable right now, for as long
>> as the domain is held and properly maintained (Good URLs don't die).
>>
>> Since the last paragraph risks starting another unique identifier war, I
>> promise not to re-engage unless someone asks me to. Meanwhile, I like
>>
>> John
>>
>>
>> On Nov 19, 2009, at 22:23, Bryan Lawrence wrote:
>>
>>> On Thursday 19 November 2009 19:40:08 Jonathan Gregory wrote:
>>>>> ... In some cases, referencing attributes such as
>>>>> "coordinates" and "ancillary_variables" would, ideally, point to a
>>>>> variable in a different dataset.
>>>>
>>>> This is a general problem to which CF doesn't have a solution because it
>>>> was conceived as a convention for single netCDF files. However we need a
>>>> solution as often several files should be treated as a single dataset.
>>>>
>>>> If the files don't overlap i.e. their contents are complementary, I think
>>>> it should be satisfactory to allow variables in one file to be pointed to
>>>> by name from another file, with no other mechanism being required within
>>>> the file. I don't like the idea of naming one file within another file, as
>>>> that would be very fragile. Instead, I think the file aggregation should
>>>> be implied by simply defining the group of files which are to be treated
>>>> as one file e.g. by putting them in one directory.
>>>
>>> It's the old ones that are the best ones :-) :-) this issue keeps on
>>> coming back ... :-) :-) and we keep trying to ignore it ...
>>>
>>> I think we agree that an actual physical filename including path is
>>> useless. We need both a relative link which relies on the preservation of
>>> a group of files in a particular arrangement ... AND an internal
>>> identifier so more robust linking mechanisms can be used when (if) the
>>> data ends up in a managed environment.
>>>
>>> I think it's crucial in this situation to ensure that each file has a
>>> unique identifier within it (created, for example, with uuid), because all
>>> solutions which rely on packaging are fragile (SAFE is probably better
>>> than most), but the bottom line is that users move files around ... and we
>>> need some way of ensuring that we/they can validate the links that are in
>>> place are the ones that were originally intended.
>>>
>>> So relative links would also include the identifier of the intended target
>>> as well as the relative path in operating system agnostic terms.
>>>
>>> That identifier can be used in two ways: a) to validate the link (my
>>> software can always check that the variable that I just opened following a
>>> link from another one is the one that was expected by checking the
>>> container identifier), and b) to produce an identifier resolver service for
>>> the situation where the packaging has had to be broken (which might occur
>>> for performance reasons or ...)
>>>
>>> CF could recommend something like this ...
>>>
>>> Bryan
>>>
>>> --
>>> Bryan Lawrence
>>> Director of Environmental Archival and Associated Research
>>> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
>>> STFC, Rutherford Appleton Laboratory
>>> Phone +44 1235 445012; Fax ... 5848;
>>> Web: home.badc.rl.ac.uk/lawrence
>>> _______________________________________________
>>> CF-metadata mailing list
>>> CF-metadata at cgd.ucar.edu
>>> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>>
>>
>> --------------
>> I have my new work email address: jgraybeal at ucsd.edu
>> --------------
>>
>> John Graybeal <mailto:jgraybeal at ucsd.edu>
>> phone: 858-534-2162
>> Development Manager
>> Ocean Observatories Initiative Cyberinfrastructure Project:
>> http://ci.oceanobservatories.org
>> Marine Metadata Interoperability Project: http://marinemetadata.org
>>
>

-- 
V. Balaji                               Office:  +1-609-452-6516
Head, Modeling Systems Group, GFDL      Home:    +1-212-253-6662
Princeton University                    Email: v.balaji at noaa.gov
Received on Fri Nov 20 2009 - 06:51:15 GMT
