[CF-metadata] Swath observational data

From: tjn98 <tim.nightingale>
Date: Fri, 20 Nov 2009 14:30:46 +0000

Dear All,

I think there may be two distinct cases here:

1) Local cross-referencing, where it is only necessary to establish
   a relationship within a well-defined grouping of files,

2) Referencing to a universal resource, such as a specific file
   held on a server.

For the former, it should only be necessary that every NetCDF file
within the grouping holds the same unique identifier (this could
be the product or group name, or an ID string from a managed source).
Satellite swath products, where they have this sort of structure,
almost always fall in the first category. In general, a user would
want to use his or her local copy of a file, rather than re-download
a remote file.

This may be redundant by now, but my thoughts were that:

1) We only consider whether we can extend cross-referencing within
   a local scope,

2) All related files within the scope should contain the same unique
   identifier, perhaps a global attribute named something like
   "cross_reference_ID".

3) Referenced variable names within the scope should be unique and so
   do not need modifiers. An alternative is that modifiers are not
   needed in references by default, but could be included to
   disambiguate variables - perhaps in a form like "geo:latitude"
   where geo.nc is the file containing the required latitude variable.
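
As a rough illustration of point 3, a reference string with an optional
modifier could be split as below. This is only a sketch: the function name
parse_reference is hypothetical, and the rule that the modifier is the file
name without its ".nc" extension is read off the "geo:latitude" example
rather than defined anywhere.

```python
def parse_reference(ref):
    """Split a reference into (file_stem, variable_name).

    'geo:latitude' gives ('geo', 'latitude'); a bare name such as
    'latitude' gives (None, 'latitude'), meaning no modifier was used.
    """
    stem, sep, name = ref.partition(":")
    if not sep:
        # No ':' present, so the whole string is the variable name.
        return None, ref
    return stem, name
```

A reader would then look for the variable in geo.nc only when a stem is
returned, and fall back to the default search otherwise.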

If the attribute contains an empty string or is absent, CF compliant
systems only look for referenced variables within the same file, as
at the moment. If present, the system is allowed to search other files
within a limited scope, containing the same ID.
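
The lookup rule above might be sketched as follows. Files are modelled as
plain dictionaries with "attrs" and "variables" entries purely for
illustration (a real reader would open NetCDF files instead), the name
resolve_variable is hypothetical, and searching the referencing file first
is an assumption rather than something stated above.

```python
def resolve_variable(name, current, candidates):
    """Return (file_name, variable) for `name`, or None if not found.

    current    -- (file_name, file_dict) of the file holding the reference
    candidates -- dict of file_name -> file_dict for files in the scope
    """
    cur_name, cur = current
    # Assumption: the referencing file itself is always searched first.
    if name in cur["variables"]:
        return cur_name, cur["variables"][name]
    ref_id = cur["attrs"].get("cross_reference_ID", "")
    if not ref_id:
        # Attribute absent or empty: do not look outside the file.
        return None
    for fname, f in sorted(candidates.items()):
        if fname == cur_name:
            continue
        same_id = f["attrs"].get("cross_reference_ID") == ref_id
        if same_id and name in f["variables"]:
            return fname, f["variables"][name]
    return None
```

Files carrying a different ID, or none at all, are skipped even if they
happen to contain a variable with the right name.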

One possibility is that scope could be modified with, perhaps,
unix-like relative directory prefixes to the ID, so that

  :cross_reference_ID = "my_unique_id";

refers just to files in the same directory, whereas

  :cross_reference_ID = "../*/my_unique_id";

refers to all files held under the parent directory and its
subdirectories, and so on.
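
One way the prefixed form could be handled is sketched below. The helper
names split_scope and candidate_files are hypothetical, the ".nc" suffix
filter is an assumption, and the prefix is expanded as a literal glob
pattern, which is only one possible reading: under it, "../*" selects the
subdirectories of the parent directory but not files sitting directly in
the parent, so a real rule would need to be pinned down more carefully.

```python
import glob
import os


def split_scope(value):
    """Split '../*/my_unique_id' into ('../*', 'my_unique_id').

    A value with no directory prefix gives ('.', value), i.e. the
    directory of the referencing file.
    """
    prefix, _, ref_id = value.rpartition("/")
    return (prefix or "."), ref_id


def candidate_files(value, base_dir):
    """List the .nc files in the directories selected by the prefix.

    base_dir is the directory holding the referencing file. The files
    returned still need their own ID checked against the one in `value`.
    """
    prefix, _ = split_scope(value)
    pattern = os.path.join(base_dir, prefix, "*.nc")
    return sorted(os.path.normpath(p) for p in glob.glob(pattern))
```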

If the purpose of the ID is only to disambiguate local files, then the
form and integrity of the ID string itself could probably be left
to the discretion of the data provider, since it would only need to
be checked within a defined scope. More rigorous implementations
are a bit beyond my experience.

Anybody who's interested can find the SAFE format definition at
earth.esa.int/SAFE. You should probably enjoy UML diagrams to
appreciate it fully. Note that the format doesn't discuss NetCDF
in particular - this is just the format that Sentinel-3 is adopting
for its data containers.

  Tim.




On 20/11/2009 06:23, "Bryan Lawrence" <bryan.lawrence at stfc.ac.uk> wrote:

> On Thursday 19 November 2009 19:40:08 Jonathan Gregory wrote:
>>> ... In some cases, referencing attributes such as
>>> "coordinates" and "ancillary_variables" would, ideally, point to a
>>> variable in a different dataset.
>>
>> This is a general problem to which CF doesn't have a solution because it
>> was conceived as a convention for single netCDF files. However we need a
>> solution as often several files should be treated as a single dataset.
>>
>> If the files don't overlap i.e. their contents are complementary, I think
>> it should be satisfactory to allow variables in one file to be pointed to
>> by name from another file, with no other mechanism being required within
>> the file. I don't like the idea of naming one file within another file, as
>> that would be very fragile. Instead, I think the file aggregation should be
>> implied by simply defining the group of files which are to be treated as
>> one file e.g. by putting them in one directory.
>
> It's the old ones that are the best ones :-) :-) this issue keeps on coming
> back ... :-) :-) and we keep trying to ignore it ...
>
> I think we agree that an actual physical filename including path is useless.
> We need both a relative link which relies on the preservation of a group of
> files in a particular arrangement ... AND an internal identifier so more
> robust linking mechanisms can be used when (if) the data ends up in a managed
> environment.
>
> I think it's crucial in this situation to ensure that each file has a unique
> identifier within it (created, for example, with uuid), because all solutions
> which rely on packaging are fragile (SAFE is probably better than most), but
> the bottom line is that users move files around ... and we need some way of
> ensuring that we/they can validate the links that are in place are the ones
> that were originally intended.
>
> So relative links would also include the identifier of the intended target as
> well as the relative path in operating system agnostic terms.
>
> That identifier can be used in two ways: a) to validate the link (my
> software can always check that the variable that I just opened following a
> link from another one is the one that was expected by checking the container
> identifier), and b) to produce an identifier resolver service for the
> situation where the packaging has had to be broken (which might occur for
> performance reasons or ...)
>
> CF could recommend something like this ...
>
> Bryan


Received on Fri Nov 20 2009 - 07:30:46 GMT