[CF-metadata] Multiple file datasets (was: Swath observational data)

From: Stephen Emsley <SEmsley> Date: Fri, 20 Nov 2009 13:35:48 -0000

---
Dr Stephen Emsley    Tel: +44 (0)1752 764 289
ARGANS Limited       Mobile: +44 (0)7912 515 418
-----Original Message-----
From: cf-metadata-bounces at cgd.ucar.edu [mailto:cf-metadata-bounces at cgd.ucar.edu] On Behalf Of John Caron
Sent: 20 November 2009 12:30
To: cf-metadata at cgd.ucar.edu
Subject: [CF-metadata] Multiple file datasets (was: Swath observational data)
This topic deserves its own heading, so here it is.
Perhaps we should gather current practices and ideas. I think Balaji's gridspec has a proposal about this. Can anyone summarize what SAFE does?
I'm imagining how this is actually used, e.g.:
float data(y,x);
  data:coordinates = "lat@file1 lon@file2";
????
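
A minimal sketch of how a reader might split such a coordinates value into (variable, file) pairs. It assumes a hypothetical "variable@file" token syntax; neither that syntax nor the names VarRef and parse_coordinates are part of CF.

```python
from typing import List, NamedTuple, Optional


class VarRef(NamedTuple):
    """A reference to a variable, optionally living in another file."""
    variable: str
    file: Optional[str]  # None means "in the same file"


def parse_coordinates(attr: str) -> List[VarRef]:
    """Split a whitespace-separated attribute into variable references.

    Assumption: a token is either "name" or "name@file".
    """
    refs = []
    for token in attr.split():
        name, sep, target = token.partition("@")
        # Reject "@file" (no variable) and "name@" (no file).
        if not name or (sep and not target):
            raise ValueError(f"malformed reference: {token!r}")
        refs.append(VarRef(name, target or None))
    return refs
```

A plain token such as "time" would come back with file set to None, so existing single-file attributes would still parse unchanged.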
John Graybeal wrote:
> I like Bryan's recommendation for a UUID or similar.
> 
> Now I'm going to be annoying and suggest the UUID *could* be a URI, or 
> these days, an IRI (International ..).
> 
> And I think the way of 'locating' the file should be neither in 
> packaging nor in local resolution; it should be in global namespace 
> resolution.  This is the way of the future, and is already more 
> 'permanent' than either packaging or local resolution, IMHO.
> 
> There is one form of URI in particular that is already resolvable: a 
> URL.  OK, that's an old song, but I'm gonna stick to it for a while 
> longer.  That form meets all the other requirements: it can be 
> registered in a resolver, it can be guaranteed unique (to the same 
> authority level as a UUID, anyway), and it is a unique string that can 
> be used to validate the link.  And it has the obvious benefit of being 
> resolvable right now, for as long as the domain is held and properly 
> maintained (Good URLs don't die).
> 
> Since the last paragraph risks starting another unique identifier war, I 
> promise not to re-engage unless someone asks me to. Meanwhile, I like
> 
> John
> 
> 
> On Nov 19, 2009, at 22:23, Bryan Lawrence wrote:
> 
>> On Thursday 19 November 2009 19:40:08 Jonathan Gregory wrote:
>>>>     ...  In  some cases, referencing attributes such as
>>>>      "coordinates" and "ancillary_variables" would, ideally, point to a
>>>>      variable in a different dataset.
>>>
>>> This is a general problem to which CF doesn't have a solution because it was
>>> conceived as a convention for single netCDF files. However we need a solution
>>> as often several files should be treated as a single dataset.
>>>
>>> If the files don't overlap i.e. their contents are complementary, I think it
>>> should be satisfactory to allow variables in one file to be pointed to by name
>>> from another file, with no other mechanism being required within the file. I
>>> don't like the idea of naming one file within another file, as that would be
>>> very fragile. Instead, I think the file aggregation should be implied by
>>> simply defining the group of files which are to be treated as one file e.g.
>>> by putting them in one directory.
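
One way to picture the by-name lookup described above, as a rough sketch only. It assumes the variable names of each file in the group have already been read (the dict argument stands in for scanning a directory of netCDF headers), and the function name build_variable_index is hypothetical.

```python
from typing import Dict, Iterable


def build_variable_index(files: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Map each variable name to the one file in the group that holds it.

    `files` maps a file name to the variable names it contains.
    The files are assumed to be complementary, so a name that appears
    in two files is treated as an error rather than silently resolved.
    """
    index: Dict[str, str] = {}
    for filename in sorted(files):  # sorted for deterministic errors
        for var in files[filename]:
            if var in index:
                raise ValueError(
                    f"{var!r} found in both {index[var]!r} and {filename!r}"
                )
            index[var] = filename
    return index
```

With such an index, a coordinates value like "lat lon" in one file could be resolved against the whole group without any file being named inside another.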
>>
>> It's the old ones that are the best ones :-) :-)  this issue keeps on 
>> coming back ... :-) :-) and we keep trying to ignore it ...
>>
>> I think we agree that an actual physical filename including path is 
>> useless. We need both  a relative link which relies on the 
>> preservation of a group of files in a particular arrangement ...  AND 
>> an internal identifier so more robust linking mechanisms can be used 
>> when (if) the data ends up in a managed environment.
>>
>> I think it's crucial in this situation to ensure that each file has a 
>> unique identifier within it (created, for example, with uuid), because 
>> all solutions which rely on packaging are fragile (SAFE is probably 
>> better than most), but the bottom line is that users move files around 
>> ... and we need some way of ensuring that we/they can validate the 
>> links that are in place are the ones that were originally intended.
>>
>> So relative links would also include the identifier of the intended 
>> target as well as the relative path in operating system agnostic terms.
>>
>> That identifier can be used in two ways: a) to validate the link (my 
>> software can always check that the variable that I just opened 
>> following a link from another one is the one that was expected by 
>> checking the container identifier), and b) to produce an identifier 
>> resolver service for the situation where the packaging has had to be 
>> broken (which might occur for performance reasons or ...)
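
The two uses of the identifier might be sketched as follows. This is an illustration under assumptions, not a CF mechanism: the Link record, the read_id callable (standing in for reading an identifier attribute from a file) and the resolver table are all hypothetical.

```python
import uuid
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Link:
    relative_path: PurePosixPath  # OS-agnostic: forward slashes only
    target_id: uuid.UUID          # identifier stored inside the target file
    variable: str                 # variable expected in the target


ReadId = Callable[[PurePosixPath], Optional[uuid.UUID]]


def validate_link(link: Link, read_id: ReadId) -> bool:
    """a) Check that the file at the relative path is the intended one."""
    return read_id(link.relative_path) == link.target_id


def resolve(link: Link, read_id: ReadId,
            resolver: Dict[uuid.UUID, PurePosixPath]) -> Optional[PurePosixPath]:
    """b) Fall back to an identifier resolver if the packaging is broken."""
    if validate_link(link, read_id):
        return link.relative_path
    return resolver.get(link.target_id)
```

Here read_id returns None when nothing is found at the path, so a moved file fails validation and the lookup falls through to the resolver.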
>>
>> CF could recommend something like this ...
>>
>> Bryan
>>
>> -- 
>> Bryan Lawrence
>> Director of Environmental Archival and Associated Research
>> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
>> STFC, Rutherford Appleton Laboratory
>> Phone +44 1235 445012; Fax ... 5848;
>> Web: home.badc.rl.ac.uk/lawrence
>> _______________________________________________
>> CF-metadata mailing list
>> CF-metadata at cgd.ucar.edu
>> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
> 
> 
> --------------
> I have my new work email address: jgraybeal at ucsd.edu
> --------------
> 
> John Graybeal   <mailto:jgraybeal at ucsd.edu>
> phone: 858-534-2162
> Development Manager
> Ocean Observatories Initiative Cyberinfrastructure Project: 
> http://ci.oceanobservatories.org
> Marine Metadata Interoperability Project: http://marinemetadata.org
> 