Discovery metadata, as Ken pointed out, allows automated population of catalogs. The importance of catalogs has not caught on yet. I see data analysis as consisting of these steps, among others: 1) search for the required data, 2) download the data, 3) analyze the data.
NetCDF, the CF conventions and THREDDS/OPeNDAP together make the data machine-readable without having it locally. The data can be local or remote and can now be treated the same way, so the download step is automated. But the user still has to know the remote location of the dataset, so the search step is still there. With catalogs, searching is automated too.
This is only the beginning. The catalogs could hold not just data metadata but also metadata for processing services. This means that one could potentially combine the search and download steps with the processing step. If I have a service that finds the average of the values within a rectangular array of data, then instead of the query "Find all the GHRSST data corresponding to this variable for this region and this time period created by so and so", one could potentially do much more powerful searches like "Find all the GHRSST data for this region and this time period such that the average value is greater than a given value." So now simple search has become search plus processing. To go even further, one could create complete computational models using the data and processing service metadata present in the catalog.
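To make "search plus processing" a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the catalog is an in-memory list of dictionaries, the field names (`variable`, `bbox`, `time`, `values`) and the helpers `overlaps` and `search` are hypothetical, and a real catalog would call a remote processing service rather than average a few local numbers.

```python
from statistics import mean

# Hypothetical in-memory catalog. A real one would be harvested from
# THREDDS catalogs and would point at remote data instead of holding values.
CATALOG = [
    {"id": "a", "variable": "sst", "bbox": (-10, 10, 40, 60),
     "time": "2010-07-01", "values": [285.0, 287.0, 289.0]},
    {"id": "b", "variable": "sst", "bbox": (-10, 10, 40, 60),
     "time": "2010-07-02", "values": [280.0, 281.0, 282.0]},
    {"id": "c", "variable": "wind_speed", "bbox": (-10, 10, 40, 60),
     "time": "2010-07-01", "values": [5.0, 6.0, 7.0]},
]

def overlaps(a, b):
    """True if two (lon_min, lon_max, lat_min, lat_max) boxes intersect."""
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]

def search(catalog, variable, bbox, start, end, predicate=None):
    """Metadata search, optionally followed by a processing step."""
    hits = [e for e in catalog
            if e["variable"] == variable
            and overlaps(e["bbox"], bbox)
            and start <= e["time"] <= end]  # ISO dates compare correctly as strings
    if predicate is not None:
        # The "processing service": keep only entries whose data pass the test.
        hits = [e for e in hits if predicate(e["values"])]
    return [e["id"] for e in hits]

region = (-5, 5, 45, 55)
# Plain search: variable, region and time period only.
plain = search(CATALOG, "sst", region, "2010-07-01", "2010-07-31")
# Search plus processing: same query, but the average must exceed a threshold.
warm = search(CATALOG, "sst", region, "2010-07-01", "2010-07-31",
              predicate=lambda v: mean(v) > 285.0)
```

With this toy catalog the plain search returns both SST entries, while the second query keeps only the one whose average exceeds the threshold.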
Upendra
-------------- next part --------------
Ken, Craig,
re: ACDD and CF
You are right, Craig, to phrase your question in terms of "guidance on
the future aspirations of CF". This topic deserves a more focussed
discussion within CF. Shooting from the hip, I'd be inclined to offer
these comments (recognizing that there is disagreement among individuals
who have discussed this):
1. Many of the ACDD attributes (history, date_created, creator_name,
...) are largely non-controversial.
2. CF generally favors attributes attached to variables over
attributes attached to files, as it reduces the potential for
conflicts. Conflicts from subsetting: What happens if you
extract a single variable from a file to make a new file?
Conflicts from editing: Suppose only a single variable in a CF
file is altered.
3. Some of the ACDD discovery attributes are redundant with respect
to information already in the CF metadata, though encoded by other
means. For example, the ACDD geospatial_lon_min/max can be
inferred from the CF coordinate system information. Redundant
information only becomes a problem through its potential to lead
to corruption. For example, a conflict arises with the global
attributes time_coverage_start/end when files are aggregated in
time by ncML (a very common situation).
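
The conflict in item 3 can be sketched with plain Python dictionaries standing in for netCDF files. The `aggregate` helper and the file layout are hypothetical and deliberately naive; this is not the ncML API. The sketch only shows that a copied global attribute can go stale while the time coordinate stays correct.

```python
# Two hypothetical daily "files": a time coordinate plus ACDD-style globals.
day1 = {"time": ["2010-07-01T00:00:00Z", "2010-07-01T12:00:00Z"],
        "attrs": {"time_coverage_start": "2010-07-01T00:00:00Z",
                  "time_coverage_end": "2010-07-01T12:00:00Z"}}
day2 = {"time": ["2010-07-02T00:00:00Z", "2010-07-02T12:00:00Z"],
        "attrs": {"time_coverage_start": "2010-07-02T00:00:00Z",
                  "time_coverage_end": "2010-07-02T12:00:00Z"}}

def aggregate(files):
    """Naive time aggregation (an assumption for illustration):
    join the time coordinates and keep the first file's global attributes."""
    return {"time": [t for f in files for t in f["time"]],
            "attrs": dict(files[0]["attrs"])}

def coverage_from_coordinate(dataset):
    """Derive the time coverage from the time coordinate itself."""
    return min(dataset["time"]), max(dataset["time"])

agg = aggregate([day1, day2])
stored = (agg["attrs"]["time_coverage_start"],
          agg["attrs"]["time_coverage_end"])
derived = coverage_from_coordinate(agg)
# The stored globals still describe day 1 only; the coordinate spans both days.
conflict = stored != derived
```

Here `stored` still ends on the first day, while `derived` ends on the second, which is the kind of disagreement a reader of the aggregated dataset cannot resolve from the attributes alone.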
My own personal slant on this debate is that much of the ACDD content
would be better placed in THREDDS metadata than in the file itself.
THREDDS servers, such as TDS and HYRAX, could "intelligently" generate
many ACDD attributes based upon contents in the file. This approach
would eliminate many of the potential issues of redundancy and
conflicting information in a file. Full use of ACDD as global
attributes tends to lock us into maintaining the integrity of a "file".
If you believe that the future growth direction of netCDF and CF lies in
subsetting and aggregation capabilities (as I do), then the ACDD runs the
risk of painting you into some corners where you would rather not be.
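
As a rough illustration of generating the attributes from the contents rather than storing them, consider the sketch below. The function `acdd_from_coordinates` and the list-based coordinates are hypothetical; nothing here is claimed to be how TDS or Hyrax actually work. Only the attribute names are taken from the ACDD.

```python
def acdd_from_coordinates(lon, lat, time):
    """Compute ACDD discovery attributes from coordinate values on request.

    Simplifying assumptions: longitudes do not wrap across the dateline,
    and times are ISO 8601 strings in UTC, so min/max give the extent.
    """
    return {
        "geospatial_lon_min": min(lon),
        "geospatial_lon_max": max(lon),
        "geospatial_lat_min": min(lat),
        "geospatial_lat_max": max(lat),
        "time_coverage_start": min(time),
        "time_coverage_end": max(time),
    }

# Hypothetical coordinate values for one dataset.
lon = [-10.0, -5.0, 0.0, 5.0, 10.0]
lat = [40.0, 45.0, 50.0]
time = ["2010-07-01T00:00:00Z", "2010-07-02T00:00:00Z"]

full = acdd_from_coordinates(lon, lat, time)
# A subset request recomputes the attributes from the subsetted coordinates,
# so there is no stored copy that could disagree with the data.
subset = acdd_from_coordinates(lon[1:4], lat[:2], time[:1])
```

Because the attributes are recomputed for every response, a subset or an aggregation carries values that match its own coordinates.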
At this point, the best (though still lame!) advice would seem to be to
use the attributes thoughtfully, rather than carelessly.
- Steve
==========================================
Kenneth Casey wrote:
> Craig - to be absolutely clear: the ACDD attributes in no way
> conflict with CF. They just provide some recommendations on what
> names to use for some attributes. Using a common set of attribute
> names enables us to build tools around those attributes that work well
> across different data sets. Within NOAA for example there is a
> project called the Unified Access Framework that has linked together
> dozens of disparate THREDDS Data Servers through a single THREDDS
> catalog. The larger the number of data sets in that catalog that use
> the ACDD, the easier it is to build and maintain a dynamic crawler to
> update that catalog on a regular interval. Also, it becomes possible
> to extract automatically ISO "discovery level" metadata and feed it
> into standard search mechanisms thereby making it possible to find
> what you want amidst that sea of information. Other groups have built
> tools to automatically crawl these attributes to assess the data in
> terms of its metadata robustness. That knowledge is useful for a
> variety of purposes.
>
> I will be interested to hear what folks on this list have to say about
> CF "taking up" the ACDD recommendations. That might be fine but I am
> not sure it is necessary. ACDD is focused purely on improving
> discovery. CF focuses on other things like usability and
> understanding, at least as far as I understand it.
>
> Ken
>
> --
> Kenneth S. Casey, Ph.D.
> Technical Director
> NOAA National Oceanographic Data Center
> 1315 East-West Highway
> Silver Spring MD 20910
> 301-713-3272 x133
> http://www.nodc.noaa.gov
>
> On Jul 10, 2010, at 5:38 AM, Craig Donlon <craig.donlon at esa.int> wrote:
>
>> Dear all:
>> CF is quite light on global metadata and metadata suitable for data
>> discovery and interoperability. Within the Group for High Resolution
>> Sea Surface Temperature (GHRSST, see http://www.ghrsst.org) we are
>> updating our product technical specifications (GDS) documentation.
>> We want to provide more flexibility and interoperability with our
>> products in a 'future proof' manner. GHRSST is handling 25Gb data
>> per day in an international context with many thousands of files in
>> NRT.
>>
>> Our latest specs. have included the NetCDF Attribute Convention for
>> Dataset Discovery
>> (ACDD http://www.unidata.ucar.edu/software/netcdf-java/formats/DataDiscoveryAttConvention.html)
>> and this has raised some questions about our CF compliance. I
>> realise that CF allows extensions, but what I am asking for is some
>> guidance on the future aspirations of CF for discovery metadata. I
>> like the ACDD recommendations and, ideally, I would like to be able to
>> write in our GHRSST data products that we are fully CF compliant.
>> Does the CF community anticipate taking up the
>> ACDD recommendations in the near future? What are people's thoughts on
>> CF and improved metadata discovery?
>>
>> I look forward to your comments and advice,
>>
>> Best regards
>> Craig Donlon (Chair of the GHRSST International Science Team)
>>
>> --
>> Dr Craig Donlon
>> Principal Scientist for Oceans and Ice
>> ESA/ESTEC (EOP-SME)
>> Keplerlaan 1, 2201 AZ
>> Noordwijk The Netherlands
>>
>> t: +31 (0)715 653687
>> f: +31 (0)715 655675
>> e: craig.donlon at esa.int
>> m:+31 (0)627 013244 (*new*)
>> Skype ID:crazit
>> altE-mail: craig.donlon at gmail.com
> ------------------------------------------------------------------------
>
> _______________________________________________
> CF-metadata mailing list
> CF-metadata at cgd.ucar.edu
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>
--
Steve Hankin, NOAA/PMEL -- Steven.C.Hankin at noaa.gov
7600 Sand Point Way NE, Seattle, WA 98115-0070
ph. (206) 526-6080, FAX (206) 526-6744
"The only thing necessary for the triumph of evil is for good men
to do nothing." -- Edmund Burke
Received on Wed Jul 14 2010 - 07:36:04 BST