Opened 4 years ago

Last modified 3 years ago

#166 new enhancement

Add new integer types to CF

Reported by: zender Owned by: cf-conventions@…
Priority: medium Milestone:
Component: cf-conventions Version:
Keywords: integer unsigned CDF5 Cc: zender@…

Description (last modified by zender)

This ticket would merge support for new integer types into CF 1.8. CF explicitly supports types char, byte, short, int, float, and double. The five "new" integer types it could support are: unsigned byte, unsigned short, unsigned int, int64, and unsigned int64. These new types are in netCDF3 (in the CDF5 encoding released in netCDF v. 4.4.0) and in netCDF4.
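For reference, the value ranges these five types add follow directly from their widths. The sketch below is illustrative only. It uses the CDL type names (ubyte, ushort, uint, int64, uint64) as spelled by ncgen/ncdump, and the dictionary and function names are hypothetical, not part of any netCDF API.

```python
# CDL type name -> (is_signed, width in bits) for every netCDF integer type
# named in the proposed Section 2.2 wording.
INTEGER_TYPES = {
    "byte": (True, 8),
    "ubyte": (False, 8),
    "short": (True, 16),
    "ushort": (False, 16),
    "int": (True, 32),
    "uint": (False, 32),
    "int64": (True, 64),
    "uint64": (False, 64),
}

# The five types this ticket would add to CF.
NEW_TYPES = ("ubyte", "ushort", "uint", "int64", "uint64")


def value_range(cdl_name):
    """Return (min, max) representable by a two's-complement or unsigned type."""
    signed, bits = INTEGER_TYPES[cdl_name]
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


if __name__ == "__main__":
    for name in NEW_TYPES:
        lo, hi = value_range(name)
        print(f"{name:7s} {lo} .. {hi}")
```

For example, value_range("uint") gives (0, 4294967295) and value_range("int64") gives (-9223372036854775808, 9223372036854775807).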

About 10 people deliberated on the CF list how best to do this. The original draft proposal included wording to clarify the treatment of unsigned integers for backward compatibility with CDF1 and CDF2, and to clarify whether, and which, integer types are interchangeable. These last two issues appear to need separate tickets, since their implementation drew more unresolved debate. The addition of the new types itself was not controversial and achieved consensus. This shortened proposal, suggested by Dave Allured, therefore adds new sub-sub-sections intended to aid discussion of the unresolved issues mentioned above.

The current CF 1.8 draft reads (Section 2.2):

2.2. Data Types

The netCDF data types char, byte, short, int, float or real, and double are
all acceptable. The char type is not intended for numeric data. One byte
numeric data should be stored using the byte data type. All integer types
are treated by the netCDF interface as signed. It is possible to treat the
byte type as unsigned by using the NUG convention of indicating the
unsigned range using the valid_min, valid_max, or valid_range attributes.

NetCDF does not support a character string type, so these must be 
represented as character arrays. ...
[Remainder of second paragraph on strings, not to be changed]

This ticket replaces that text with:

2.2.  Data Types

The netCDF data types char, byte, unsigned byte, short, unsigned short,
int, unsigned int, int64, unsigned int64, float or real, and double are all
acceptable.  The char type is not intended for numeric data.  One byte
numeric data should be stored using the byte or unsigned byte data types.

2.2.1  Unsigned Integers

[Original wording, unchanged]  It is possible to treat
the byte type as unsigned by using the NUG convention of
indicating the unsigned range using the valid_min,
valid_max, or valid_range attributes.

2.2.2  Character Strings

NetCDF Classic format does not support a character string type, so these 
must be represented as character arrays. ...
[Remainder of former paragraph on strings, unchanged]
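The NUG convention retained in Section 2.2.1 can be sketched as follows, assuming attribute values are already available as plain Python integers. The rule that an upper bound above the signed maximum signals unsigned data is an assumption made for illustration; exactly how valid_min, valid_max, and valid_range should signal signedness in classic files is the question later deferred to ticket #167. The names SIGNED_MAX, BITS, declares_unsigned, to_unsigned, and read_as_declared are hypothetical. Shorts are included because comment:1 notes datasets in the wild that apply the convention to them.

```python
# Largest value each classic signed type can hold, and its width in bits.
SIGNED_MAX = {"byte": 127, "short": 32767}
BITS = {"byte": 8, "short": 16}


def declares_unsigned(nc_type, attrs):
    """Guess whether valid_* attributes describe an unsigned range.

    Assumption for this sketch: only the upper bound is inspected, and a
    bound above the signed maximum is read as "values are unsigned".
    """
    if "valid_range" in attrs:
        upper = attrs["valid_range"][1]
    elif "valid_max" in attrs:
        upper = attrs["valid_max"]
    else:
        # valid_min alone cannot distinguish the two readings in this sketch.
        return False
    return upper > SIGNED_MAX[nc_type]


def to_unsigned(values, nc_type):
    """Reinterpret stored two's-complement values as unsigned integers."""
    offset = 1 << BITS[nc_type]
    return [v + offset if v < 0 else v for v in values]


def read_as_declared(values, nc_type, attrs):
    """Apply the unsigned reinterpretation only when the attributes ask for it."""
    if declares_unsigned(nc_type, attrs):
        return to_unsigned(values, nc_type)
    return list(values)
```

For example, read_as_declared([-1, 0, 127, -128], "byte", {"valid_range": [0, 255]}) returns [255, 0, 127, 128], while the same bytes with valid_range [-128, 127] are returned unchanged.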

Change History (12)

comment:1 follow-up: Changed 4 years ago by biard

I think it looks great!

I would prefer to see Section 2.2.1 above add the sentence

This usage is deprecated except when the version of netCDF being used does not support unsigned types.

I would also prefer for Section 2.2.1 to include the 'short' type for the NUG convention, as there is no reason to limit it to 'byte' type, but there is long history there. I am aware of multiple datasets in the wild that use this convention for storing unsigned shorts in netCDF-3. If this feels like too much, then worry about it later/elsewhere.
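A sketch of the storage choice the suggested deprecation sentence implies: use a native unsigned type when the file format has one, and fall back to the NUG convention only otherwise. The format strings are those accepted by the format argument of the netCDF4-python Dataset constructor; the function and table names are hypothetical. Since comment:12 moves changes to unsigned handling in CDF1/CDF2 to ticket #167, this reflects the suggestion under discussion, not adopted CF text.

```python
# Format strings as accepted by netCDF4-python's Dataset(..., format=...).
# Only CDF5 (NETCDF3_64BIT_DATA) and the enhanced netCDF-4 model have the new
# types; NETCDF3_CLASSIC (CDF1), NETCDF3_64BIT_OFFSET (CDF2) and
# NETCDF4_CLASSIC (classic data model) do not.
FORMATS_WITH_NEW_TYPES = {"NETCDF3_64BIT_DATA", "NETCDF4"}

NATIVE_UNSIGNED = {8: "ubyte", 16: "ushort", 32: "uint", 64: "uint64"}
# Signed types the NUG valid_* convention is (or, for short, would be) applied to.
NUG_FALLBACK = {8: "byte", 16: "short"}


def unsigned_storage(file_format, bits):
    """Return (CDL type, needs_unsigned_valid_range) for unsigned data of this width."""
    if file_format in FORMATS_WITH_NEW_TYPES:
        return NATIVE_UNSIGNED[bits], False
    if bits in NUG_FALLBACK:
        # The path the suggested wording would deprecate everywhere else:
        # a signed type plus an unsigned valid_min/valid_max/valid_range.
        return NUG_FALLBACK[bits], True
    raise ValueError(f"no unsigned {bits}-bit storage in {file_format}")
```

For example, unsigned_storage("NETCDF4", 16) returns ("ushort", False), while unsigned_storage("NETCDF3_CLASSIC", 8) returns ("byte", True).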

comment:2 follow-up: Changed 4 years ago by Dave.Allured

Charlie,

Thanks for starting this ticket, and narrowing the topic. I fully support this version.

I was a bit unfair in asking to break off the part about interchangeable integer types. I must admit this is very closely related to the core proposal.

Your Sept. 21 reply seems to focus only on file examples in the CF document, and how int32 examples could also represent other integer types. Can you show me an example in CF _rules_, rather than file examples, where type interchangeability needs to be clarified? This might help my confusion. Thanks.

comment:3 follow-up: Changed 4 years ago by jonathan

Dear Charlie

Thanks for this proposal. I would be happy with it as it stands, because it is strictly backwards compatible. However I think Jim's suggestions about deprecation and shorts are good. I assume you'll open another ticket about the signedness convention when this one is concluded.

Also, I note that the section about strings, which you don't propose to alter, begins "NetCDF does not support a character string type", which is no longer true. I understand that in this ticket you don't want to change that part of the convention, but maybe we could take the opportunity to correct that statement by changing it to "NetCDF Classic format does not ...". I expect that at some point soon we will want to allow the string type in NetCDF-4 files, but that's certainly not in the scope of this ticket.

Best wishes

Jonathan

comment:4 in reply to: ↑ 2 Changed 4 years ago by zender

I was surprised to see comments in this ticket since I had not been notified by email that there were any comments. It seems I have to manually set the CC: field and then it should happen automatically. I'll try that now. Anyway, I'm not used to this system so please bear with me.

Can you show me an example in CF _rules_, rather than file examples, where type interchangeability needs to be clarified?

Dave, I am only aware of instances in the examples, not the rules themselves.

comment:5 in reply to: ↑ 1 Changed 4 years ago by zender

Jim, I agree with both your suggestions. And if I understand Jonathan's comment #3 correctly, he suggests adding your sentence to this ticket rather than deferring it to a subsequent ticket. So once I figure out whether to make the change in the original ticket or post it as a comment, I'll do that. As to your second point about adding "short" to 2.2.1, I also agree with that, since it would merely codify an existing practice, and any subsequent ticket that clarifies signedness must also clarify it for shorts.

comment:6 in reply to: ↑ 3 Changed 4 years ago by zender

Jonathan, great. I will modify the text above, or in the comments below, to include Jim's suggestions. I'll also insert the word "Classic" as you suggest. I will leave it to others (Dave?) to formulate ticket(s) to clarify implementation of signedness in classic files, and integer interchangeability, if they are interested.

comment:7 Changed 4 years ago by zender

  • Description modified (diff)

comment:8 Changed 4 years ago by zender

  • Cc zender@… added

comment:9 Changed 4 years ago by biard

Charlie, you were going to add short integers to the "unsigned integers" section?

comment:10 follow-up: Changed 4 years ago by zender

  • Description modified (diff)

Jim, good catch, thanks for proofreading. I just added two words that "grandfather" shorts into the unsigned integers section.

comment:11 in reply to: ↑ 10 Changed 3 years ago by Dave.Allured

Replying to zender:

The use of valid_range to indicate unsigned integers is off topic and too complicated to discuss under the current topic "add new integer types". Please discuss this in new ticket #167. On this ticket #166, please revert this paragraph to the original CF 1.7 wording. Thank you.

2.2.1  Unsigned Integers

[Original wording, unchanged]  It is possible to treat
the byte type as unsigned by using the NUG convention of
indicating the unsigned range using the valid_min,
valid_max, or valid_range attributes.

comment:12 Changed 3 years ago by zender

  • Description modified (diff)

Done. To maintain consensus on the primary purpose of this ticket, adding new integer types, I think we must comply with Dave's suggestion to move all discussion of, and any changes to, treatment of unsigned types in CDF1 and CDF2 to new ticket #167.
