> If you have hdf5 files that should be readable, then I will undertake to
> look at them and see what the problem is.
ok, thank you
> WRT to old files: We could produce a utility that would redef the file
> and insert the
> _NCProperties attribute. This would allow someone to wholesale
> mark old files.
Excellent idea, Dennis
----------------------
Pedro Vicente
pedro.vicente at space-research.org
https://twitter.com/_pedro__vicente
http://www.space-research.org/
----- Original Message -----
From: <dmh at ucar.edu>
To: "Pedro Vicente" <pedro.vicente at space-research.org>;
<cf-metadata at cgd.ucar.edu>; "Discussion forum for the NeXus data format"
<nexus at nexusformat.org>; <netcdfgroup at unidata.ucar.edu>
Sent: Thursday, April 21, 2016 5:02 PM
Subject: Re: [netcdfgroup] [Hdf-forum] Detecting netCDF versus HDF5 --
PROPOSED SOLUTIONS --REQUEST FOR COMMENTS
> If you have hdf5 files that should be readable, then I will undertake to
> look at them and see what the problem is.
> WRT to old files: We could produce a utility that would redef the file
> and insert the
> _NCProperties attribute. This would allow someone to wholesale
> mark old files.
> =Dennis Heimbigner
> Unidata
>
> On 4/21/2016 2:17 PM, Pedro Vicente wrote:
>> Dennis
>>
>>> I am in the process of adding a global attribute in the root group
>>> that captures both the netcdf library version and the hdf5 library
>>> version
>>> whenever a netcdf file is created. The current form is
>>> _NCProperties="version=...|netcdflibversion=...|hdflibversion=..."
>>
>>
>> ok, good to know, thank you
>>
>>
>>> 1. I am open to suggestions about changing the format or adding
>>> info to it.
>>
>>
>> I personally don't care, anything that uniquely identifies a netCDF file
>> (HDF5 based) as such will work
>>
>>
>>> 2. Of course this attribute will not exist in files written using
>>> older
>>> versions of the netcdf library, but at least the process will have
>>> begun.
>>
>> yes
>>
>>
>>> 3. This technically does not address the original issue because there
>>> exist
>>> hdf5 files not written by netcdf that are still compatible with
>>> and can be
>>> read by netcdf. Not sure this case is important or not.
>>
>> there will always be HDF5 files not written by netcdf that netCDF will
>> read, as it does now.
>>
>> this is not really the issue, but you just made a further issue :-)
>>
>> the issue is that I would like an application that reads a netCDF (HDF5
>> based) file to decide to use the netCDF or HDF5 API.
>> your attribute writing will do , for future files.
>> for older netCDF files there may be a way to detect the current
>> attributes and data structures to see if we can make it "identify itself"
>> as netCDF. A bit of debugging will confirm that, since Dimension Scales
>> are used, that would be an (imperfect maybe) way to do it
>>
>> regarding the "further issue " above
>>
>> you could go one step further and for any HDF5 files not written by
>> netcdf , you could make netCDF reject the file reading,
>> because it's not "netCDF compliant".
>> Since having netCDF read pure HDF5 files is not a problem (at least for
>> me), I don't know if you would want to do this, just an idea.
>> In my mind, taking complexity and ambiguities out of problems is always a
>> good thing
>>
>>
>> ah, I forgot one thing, related to this
>>
>>
>> In the past I have found several pure HDF5 files that netCDF failed in
>> reading.
>> Since netCDF is HDF5 binary compatible, one would expect that all HDF5
>> files will be read by netCDF.
>> Except if you specifically wrote something in the code that makes it
>> fail if some condition is not met.
>> This was a while ago, I'll try to find those cases and I'll send a bug
>> report to the bug report email
>>
>> ----------------------
>> Pedro Vicente
>> pedro.vicente at space-research.org
>> https://twitter.com/_pedro__vicente
>> http://www.space-research.org/
>>
>> ----- Original Message ----- From: <dmh at ucar.edu>
>> To: "Pedro Vicente" <pedro.vicente at space-research.org>; "HDF Users
>> Discussion List" <hdf-forum at lists.hdfgroup.org>;
>> <cf-metadata at cgd.ucar.edu>; "Discussion forum for the NeXus data format"
>> <nexus at nexusformat.org>; <netcdfgroup at unidata.ucar.edu>
>> Cc: "John Shalf" <jshalf at lbl.gov>; <Richard.E.Ullman at nasa.gov>;
>> "Marinelli, Daniel J. (GSFC-5810)" <daniel.j.marinelli at nasa.gov>;
>> "Miller, Mark C." <miller86 at llnl.gov>
>> Sent: Thursday, April 21, 2016 2:30 PM
>> Subject: Re: [netcdfgroup] [Hdf-forum] Detecting netCDF versus HDF5 --
>> PROPOSED SOLUTIONS --REQUEST FOR COMMENTS
>>
>>
>>> I am in the process of adding a global attribute in the root group
>>> that captures both the netcdf library version and the hdf5 library
>>> version
>>> whenever a netcdf file is created. The current form is
>>> _NCProperties="version=...|netcdflibversion=...|hdflibversion=..."
>>> Where version is the version of the _NCProperties attribute and the
>>> others
>>> are e.g. 1.8.18 or 4.4.1-rc1.
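For readers who want to consume this attribute, here is a minimal sketch in C of pulling one field out of a string in the proposed key=value|key=value form. It is not netCDF library code: the function name ncprops_get is hypothetical, and the sample value "version=1" in the usage note is an assumption (the thread only gives example values for the two library versions).

```c
#include <assert.h>
#include <string.h>

/* Look up one key in a string shaped like
 * "version=1|netcdflibversion=4.4.1-rc1|hdflibversion=1.8.18".
 * Copies the value into out (NUL-terminated, truncated if needed) and
 * returns 1 when the key is present; returns 0 otherwise.
 * Hypothetical helper, not part of the netCDF API. */
int ncprops_get(const char *props, const char *key, char *out, size_t outlen)
{
    size_t keylen;
    const char *p = props;

    assert(props != NULL && key != NULL && out != NULL && outlen > 0);
    keylen = strlen(key);

    while (*p != '\0') {
        /* Each field runs up to the next '|' or the end of the string. */
        const char *end = strchr(p, '|');
        size_t fieldlen = end ? (size_t)(end - p) : strlen(p);

        /* The field matches only if it is exactly "<key>=<value>". */
        if (fieldlen > keylen && strncmp(p, key, keylen) == 0 && p[keylen] == '=') {
            size_t vallen = fieldlen - keylen - 1;
            if (vallen >= outlen)
                vallen = outlen - 1;   /* truncate rather than overflow */
            memcpy(out, p + keylen + 1, vallen);
            out[vallen] = '\0';
            return 1;
        }
        if (end == NULL)
            break;
        p = end + 1;
    }
    return 0;
}
```

Called with the string "version=1|netcdflibversion=4.4.1-rc1|hdflibversion=1.8.18" and the key "hdflibversion", the sketch copies "1.8.18" into the output buffer. Matching on the whole key followed by '=' keeps "version" from being confused with "netcdflibversion".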
>>> Issues:
>>> 1. I am open to suggestions about changing the format or adding info to
>>> it.
>>> 2. Of course this attribute will not exist in files written using older
>>> versions
>>> of the netcdf library, but at least the process will have begun.
>>> 3. This technically does not address the original issue because there
>>> exist
>>> hdf5 files not written by netcdf that are still compatible with
>>> and can be
>>> read by netcdf. Not sure this case is important or not.
>>> =Dennis Heimbigner
>>> Unidata
>>>
>>>
>>> On 4/21/2016 9:33 AM, Pedro Vicente wrote:
>>>> DETECTING HDF5 VERSUS NETCDF GENERATED FILES
>>>> REQUEST FOR COMMENTS
>>>> AUTHOR: Pedro Vicente
>>>>
>>>> AUDIENCE:
>>>> 1) HDF, netcdf developers,
>>>> Ed Hartnett
>>>> Kent Yang
>>>> 2) HDF, netcdf users, that replied to this thread
>>>> Miller, Mark C.
>>>> John Shalf
>>>> 3 ) netcdf tools developers
>>>> Mary Haley , NCL
>>>> 4) HDF, netcdf managers and sponsors
>>>> David Pearah , CEO HDF Group
>>>> Ward Fisher, UCAR
>>>> Marinelli, Daniel J. , Richard Ullman, Christopher Lynnes, NASA
>>>> 5)
>>>> [CF-metadata] list
>>>> After this thread started 2 months ago, there was an announcement on the
>>>> [CF-metadata] mail list
>>>> about
>>>> "a meeting to discuss current and future netCDF-CF efforts and
>>>> directions.
>>>> The meeting will be held on 24-26 May 2016 in Boulder, CO, USA at the
>>>> UCAR Center Green facility."
>>>> This would be a good topic to put on the agenda, maybe?
>>>> THE PROBLEM:
>>>> Currently it is impossible to detect if an HDF5 file was generated by
>>>> the HDF5 API or by the netCDF API.
>>>> See previous email about the reasons why.
>>>> WHY THIS MATTERS:
>>>> Software applications that need to handle both netCDF and HDF5 files
>>>> cannot decide which API to use.
>>>> This includes popular visualization tools like IDL, Matlab, NCL, HDF
>>>> Explorer.
>>>> SOLUTIONS PROPOSED: 2
>>>> SOLUTION 1: Add a flag to HDF5 source
>>>> The hdf5 format specification, listed here
>>>> https://www.hdfgroup.org/HDF5/doc/H5.format.html
>>>> describes a sequence of bytes in the file layout that have special
>>>> meaning for the HDF5 API. It is common practice, when designing a data
>>>> format,
>>>> to leave some fields "reserved for future use".
>>>> This solution makes use of one of these empty "reserved for future
>>>> use" spaces to save a byte (for example) that describes an enumerator
>>>> of "HDF5 compatible formats".
>>>> An "HDF5 compatible format" is a data format that uses the HDF5 API at
>>>> a lower level (usually hidden from the user of the upper API),
>>>> and providing its own API.
>>>> This category can still be divided into 2 formats:
>>>> 1) A "pure HDF5 compatible format". Example, NeXus
>>>> http://www.nexusformat.org/
>>>> NeXus just writes some metadata (attributes) on top of the HDF5 API,
>>>> that has some special meaning for the NeXus community
>>>> 2) A "non pure HDF5 compatible format". Example, netCDF
>>>> Here, the format adds some extra feature besides HDF5. In the case of
>>>> netCDF, these are shared dimensions between variables.
>>>> This sub-division between 1) and 2) is irrelevant for the problem and
>>>> solution in question
>>>> The solution consists of writing a different enumerator value on the
>>>> "reserved for future use" space. For example
>>>> Value decimal 0 (current value): This file was generated by the HDF5
>>>> API (meaning the HDF5 only API)
>>>> Value decimal 1: This file was generated by the netCDF API (using HDF5)
>>>> Value decimal 2: This file was generated by <put here another HDF5
>>>> based format>
>>>> and so on
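Solution 1 can be sketched in C as an enumerator plus a lookup. This is only an illustration: the HDF5 format defines no such field today, the identifier names below are invented, and only the values 0, 1 and 2 come from the RFC text above.

```c
/* Hypothetical enumerator of "HDF5 compatible formats". The numeric
 * values follow the RFC text; the identifier names are invented here. */
enum h5_producer {
    H5_PRODUCER_HDF5   = 0,  /* written by the HDF5-only API (current value) */
    H5_PRODUCER_NETCDF = 1,  /* written by the netCDF API (using HDF5) */
    H5_PRODUCER_OTHER  = 2   /* placeholder for another HDF5 based format */
};

/* APIs a general purpose reader could choose between. */
enum reader_api { READER_HDF5, READER_NETCDF };

/* Map the (hypothetical) reserved byte to the API a reader would use.
 * Any value other than 1 falls back to the HDF5 API, on the assumption
 * that every such file is still a valid HDF5 file. */
enum reader_api api_for_producer(unsigned char reserved_byte)
{
    switch (reserved_byte) {
    case H5_PRODUCER_NETCDF:
        return READER_NETCDF;
    case H5_PRODUCER_HDF5:
    default:
        return READER_HDF5;
    }
}
```

The fallback to the HDF5 API for unregistered values is a design choice of this sketch, not something the RFC specifies; a reader with more plugins would add one case per registered format.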
>>>> The advantage of this solution is that this process involves 2 parties:
>>>> the HDF Group and the other format's organization.
>>>> This allows the HDF Group to "keep track" of new HDF5 based formats. It
>>>> also allows the other format to be made "HDF5 certified".
>>>> SOLUTION 2: Add some metadata to the other API on top of HDF5
>>>> This is what Nexus uses.
>>>> A Nexus file on creation writes several attributes on the root group,
>>>> like "NeXus_version" and other numeric data.
>>>> This is done using the public HDF5 API calls.
>>>> The solution for netCDF consists of the same approach, just write some
>>>> specific attributes, and a special netCDF API to write/read them.
>>>> This solution just requires the work of one party (the netCDF group)
>>>> END OF RFC
>>>> In reply to people that commented in the thread
>>>> _at_John Shalf
>>>> >>Perhaps NetCDF (and other higher-level APIs that are built on top of
>>>> HDF5) should include an attribute attached
>>>> >>to the root group that identifies the name and version of the API
>>>> that created the file? (adopt this as a convention)
>>>> yes, that's one way to do it, Solution 2 above
>>>> _at_Mark Miller
>>>> >>>Hmmm. Is there any big reason NOT to try to read a netCDF produced
>>>> HDF5 file with the native HDF5 library if someone so chooses?
>>>> It's possible to read a netCDF file using HDF5, yes.
>>>> There are 2 things that you will miss doing this:
>>>> 1) the ability to inquire about shared netCDF dimensions.
>>>> 2) the ability to read remotely with OPeNDAP.
>>>> Reading with HDF5 also exposes metadata that is supposed to be private
>>>> to netCDF. See below
>>>> >>>> And, attempting to read an HDF5 file produced by Silo using just
>>>> the HDF5 library (e.g. w/o Silo) is a major pain.
>>>> This I don't understand. Why not read the Silo file with the Silo API?
>>>> That's the whole purpose of this issue, each higher level API on top of
>>>> HDF5 should be able to detect "itself".
>>>> I am not familiar with Silo, but if Silo cannot do this, then you have
>>>> the same design flaw that netCDF has.
>>>>
>>>> >>> In a cursory look over the libsrc4 sources in netCDF distro, I see
>>>> a few things that might give a hint a file was created with netCDF. . .
>>>> >>>> First, in NC_CLASSIC_MODEL, an attribute gets attached to the
>>>> root group named "_nc3_strict". So, the existence of an attribute on
>>>> the root group by that name would suggest the HDF5 file was generated
>>>> by netCDF.
>>>> I think this is done only by the "old" netCDF3 format.
>>>> >>>>> Also, I tested a simple case of nc_open, nc_def_dim, etc.
>>>> nc_close to see what it produced.
>>>> >>>> It appears to produce datasets for each 'dimension' defined with
>>>> two attributes named "CLASS" and "NAME".
>>>> This is because netCDF uses the HDF5 Dimension Scales API internally to
>>>> keep track of shared dimensions. These are internal attributes
>>>> of Dimension Scales. This approach would not work because an HDF5 only
>>>> file with Dimension Scales would have the same attributes.
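Putting the attribute names mentioned in this thread together, a reader could apply a heuristic like the following C sketch. It assumes the application has already collected the names of the root group's attributes (for example with the HDF5 API); the function and enum names are hypothetical, and a miss does not prove the file is not netCDF, since older files may carry none of these markers.

```c
#include <stddef.h>
#include <string.h>

enum file_kind { KIND_HDF5_OR_UNKNOWN, KIND_NETCDF, KIND_NEXUS };

/* Return 1 when `name` occurs in the list of attribute names. */
static int has_attr(const char *const *names, size_t n, const char *name)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(names[i], name) == 0)
            return 1;
    return 0;
}

/* Guess the producing API from the names of the root-group attributes.
 * Hypothetical helper: the attribute names come from the thread, the
 * decision order is an assumption of this sketch. */
enum file_kind classify_root_attrs(const char *const *names, size_t n)
{
    if (has_attr(names, n, "_NCProperties") || has_attr(names, n, "_nc3_strict"))
        return KIND_NETCDF;
    if (has_attr(names, n, "NeXus_version"))
        return KIND_NEXUS;
    /* CLASS and NAME are deliberately not checked: they belong to HDF5
     * Dimension Scales, which a plain HDF5 file can also contain. */
    return KIND_HDF5_OR_UNKNOWN;
}
```

The result is deliberately named "HDF5 or unknown" rather than "HDF5": without a marker the sketch cannot tell a plain HDF5 file from an older netCDF-4 file.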
>>>>
>>>> >>>> I like John's suggestion here.
>>>> >>>>>But, any code you add to any applications now will work *only*
>>>> for files that were produced post-adoption of this convention.
>>>> yes. there are 2 actions to take here.
>>>> 1) fix the issue for the future
>>>> 2) try to retroactively have some workaround that makes it possible now to
>>>> differentiate HDF5/netCDF files made before the adopted convention
>>>> see below
>>>>
>>>> >>>> In VisIt, we support >140 format readers. Over 20 of those are
>>>> different variants of HDF5 files (H5part, Xdmf, Pixie, Silo, Samrai,
>>>> netCDF, Flash, Enzo, Chombo, etc., etc.)
>>>> >>>>When opening a file, how does VisIt figure out which plugin to
>>>> use? In particular, how do we avoid one poorly written reader plugin
>>>> (which may be the wrong one for a given file) from preventing the
>>>> correct one from being found. Its kinda a hard problem.
>>>>
>>>> Yes, that's the problem we are trying to solve. I have to say, that is
>>>> quite a list of HDF5 based formats there.
>>>> >>>> Some of our discussion is captured here. . .
>>>> http://www.visitusers.org/index.php?title=Database_Format_Detection
>>>> I'll check it out, thank you for the suggestions
>>>> _at_Ed Hartnett
>>>> >>>I must admit that when putting netCDF-4 together I never considered
>>>> that someone might want to tell the difference between a "native" HDF5
>>>> file and a netCDF-4/HDF5 file.
>>>> >>>>>Well, you can't think of everything.
>>>> This is a major design flaw.
>>>> If you are in the business of designing data file formats, one of the
>>>> things you have to do is make it possible to distinguish your format from
>>>> the other formats.
>>>>
>>>> >>> I agree that it is not possible to canonically tell the
>>>> difference. The netCDF-4 API does use some special attributes to track
>>>> named dimensions,
>>>> >>>>and to tell whether classic mode should be enforced. But it can
>>>> easily produce files without any named dimensions, etc.
>>>> >>>So I don't think there is any easy way to tell.
>>>> I remember you wrote that code together with Kent Yang from the HDF
>>>> Group.
>>>> At the time I was with the HDF Group but unfortunately I did not follow
>>>> closely what you were doing.
>>>> I don't remember any design document being circulated that explains the
>>>> internals of the "how to" make the netCDF (classic) model of shared
>>>> dimensions
>>>> use the hierarchical group model of HDF5.
>>>> I know this was done using the HDF5 Dimension Scales (that I wrote),
>>>> but is there any design document that explains it?
>>>> Maybe just some internal email exchange between you and Kent Yang?
>>>> Kent, how are you?
>>>> Do you remember having any design document that explains this?
>>>> Maybe something like a unique private attribute that is written
>>>> somewhere in the netCDF file?
>>>>
>>>> _at_Mary Haley, NCL
>>>> NCL is a widely used tool that handles both netCDF and HDF5
>>>> Mary, how are you?
>>>> How does NCL deal with the case of reading both pure HDF5 files and
>>>> netCDF files that use HDF5?
>>>> Would you be interested in joining a community based effort to deal
>>>> with this, in case this is an issue for you?
>>>>
>>>> _at_David Pearah , CEO HDF Group
>>>> I volunteer to participate in the effort of this RFC together with the
>>>> HDF Group (and netCDF Group).
>>>> Maybe we could make a "task force" between HDF Group, netCDF Group and
>>>> any volunteer (such as tools developers that happen to be in these mail
>>>> lists)?
>>>> The "task force" would have 2 tasks:
>>>> 1) make a HDF5 based convention for the future and
>>>> 2) try to retroactively salvage the current design issue of netCDF
>>>> My phone is 217-898-9356, you are welcome to call in anytime.
>>>> ----------------------
>>>> Pedro Vicente
>>>> pedro.vicente at space-research.org
>>>> <mailto:pedro.vicente at space-research.org>
>>>> https://twitter.com/_pedro__vicente
>>>> http://www.space-research.org/
>>>>
>>>> ----- Original Message -----
>>>> *From:* Miller, Mark C. <mailto:miller86 at llnl.gov>
>>>> *To:* HDF Users Discussion List
>>>> <mailto:hdf-forum at lists.hdfgroup.org>
>>>> *Cc:* netcdfgroup at unidata.ucar.edu
>>>> <mailto:netcdfgroup at unidata.ucar.edu> ; Ward Fisher
>>>> <mailto:wfisher at ucar.edu>
>>>> *Sent:* Wednesday, March 02, 2016 7:07 PM
>>>> *Subject:* Re: [Hdf-forum] Detecting netCDF versus HDF5
>>>>
>>>> I like John's suggestion here.
>>>>
>>>> But, any code you add to any applications now will work *only* for
>>>> files that were produced post-adoption of this convention.
>>>>
>>>> There are probably a bazillion files out there at this point that
>>>> don't follow that convention and you probably still want your
>>>> applications to be able to read them.
>>>>
>>>> In VisIt, we support >140 format readers. Over 20 of those are
>>>> different variants of HDF5 files (H5part, Xdmf, Pixie, Silo,
>>>> Samrai, netCDF, Flash, Enzo, Chombo, etc., etc.) When opening a
>>>> file, how does VisIt figure out which plugin to use? In
>>>> particular, how do we avoid one poorly written reader plugin
>>>> (which may be the wrong one for a given file) from preventing the
>>>> correct one from being found. Its kinda a hard problem.
>>>>
>>>> Some of our discussion is captured here. . .
>>>>
>>>> http://www.visitusers.org/index.php?title=Database_Format_Detection
>>>>
>>>> Mark
>>>>
>>>>
>>>> From: Hdf-forum <hdf-forum-bounces at lists.hdfgroup.org
>>>> <mailto:hdf-forum-bounces at lists.hdfgroup.org>> on behalf of John
>>>> Shalf <jshalf at lbl.gov <mailto:jshalf at lbl.gov>>
>>>> Reply-To: HDF Users Discussion List <hdf-forum at lists.hdfgroup.org
>>>> <mailto:hdf-forum at lists.hdfgroup.org>>
>>>> Date: Wednesday, March 2, 2016 1:02 PM
>>>> To: HDF Users Discussion List <hdf-forum at lists.hdfgroup.org
>>>> <mailto:hdf-forum at lists.hdfgroup.org>>
>>>> Cc: "netcdfgroup at unidata.ucar.edu
>>>> <mailto:netcdfgroup at unidata.ucar.edu>"
>>>> <netcdfgroup at unidata.ucar.edu
>>>> <mailto:netcdfgroup at unidata.ucar.edu>>, Ward Fisher
>>>> <wfisher at ucar.edu <mailto:wfisher at ucar.edu>>
>>>> Subject: Re: [Hdf-forum] Detecting netCDF versus HDF5
>>>>
>>>> Perhaps NetCDF (and other higher-level APIs that are built on
>>>> top of HDF5) should include an attribute attached to the root
>>>> group that identifies the name and version of the API that
>>>> created the file? (adopt this as a convention)
>>>>
>>>> -john
>>>>
>>>> On Mar 2, 2016, at 12:55 PM, Pedro Vicente
>>>> <pedro.vicente at space-research.org
>>>> <mailto:pedro.vicente at space-research.org>> wrote:
>>>> Hi Ward
>>>> As you know, Data Explorer is going to be a general
>>>> purpose data reader for many formats, including HDF5 and
>>>> netCDF.
>>>> Here
>>>> http://www.space-research.org/
>>>> Regarding the handling of both HDF5 and netCDF, it seems
>>>> there is a potential issue, which is, how to tell if any
>>>> HDF5 file was saved by the HDF5 API or by the netCDF API?
>>>> It seems to me that this is not possible. Is this correct?
>>>> netCDF uses an internal function NC_check_file_type to
>>>> examine the first few bytes of a file, and for example for
>>>> any HDF5 file the test is
>>>> /* Look at the magic number */
>>>> /* Ignore the first byte for HDF */
>>>> if(magic[1] == 'H' && magic[2] == 'D' && magic[3] == 'F') {
>>>> *filetype = FT_HDF;
>>>> *version = 5;
>>>> The problem is that this test works for any HDF5 file and
>>>> for any netCDF file, which makes it impossible to tell
>>>> which is which.
>>>> Which makes it impossible for any general purpose data
>>>> reader to decide to use the netCDF API or the HDF5 API.
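To make the limitation concrete, here is a small self-contained C sketch (not the netCDF source) that tests a buffer against the full 8-byte HDF5 signature. The function name is hypothetical, and the sketch assumes the signature sits at offset 0; HDF5 also allows it at offsets 512, 1024, and so on when a user block is present.

```c
#include <stddef.h>
#include <string.h>

/* The 8-byte HDF5 format signature: \211 H D F \r \n \032 \n. */
static const unsigned char HDF5_SIGNATURE[8] =
    { 0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n' };

/* Return 1 when the buffer starts with the HDF5 signature, 0 otherwise.
 * Hypothetical helper; only offset 0 is examined. */
int starts_with_hdf5_signature(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < sizeof HDF5_SIGNATURE)
        return 0;
    return memcmp(buf, HDF5_SIGNATURE, sizeof HDF5_SIGNATURE) == 0;
}
```

A netCDF-4 file and a plain HDF5 file both pass this check, which is why the signature alone cannot choose between the two APIs.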
>>>> I have a possible solution for this, but before going any
>>>> further, I would just like to
>>>> 1) Confirm that this is indeed not possible
>>>> 2) See if you have a solid workaround for this,
>>>> excluding the dumb ones, for example deciding on an
>>>> extension .nc or .h5, or traversing the HDF5 file to see
>>>> if it's a non netCDF conforming one. Yes, to further
>>>> complicate things, it is possible that the above test says
>>>> OK for an HDF5 file, but then the read by the netCDF API
>>>> fails because the file is an HDF5 file that is not netCDF conformant.
>>>> Thanks
>>>> ----------------------
>>>> Pedro Vicente
>>>> pedro.vicente at space-research.org
>>>> <mailto:pedro.vicente at space-research.org>
>>>> http://www.space-research.org/
>>>> _______________________________________________
>>>> Hdf-forum is for HDF software users discussion.
>>>> Hdf-forum at lists.hdfgroup.org
>>>> <mailto:Hdf-forum at lists.hdfgroup.org>
>>>>
>>>> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>>>> Twitter: https://twitter.com/hdf5
>>>> _______________________________________________
>>>> netcdfgroup mailing list
>>>> netcdfgroup at unidata.ucar.edu
>>>> For list information or to unsubscribe, visit:
>>>> http://www.unidata.ucar.edu/mailing_lists/
>>>
>>
>
Received on Thu Apr 21 2016 - 16:18:45 BST