On Wed, Feb 22, 2017 at 12:08 PM, Bob Simons - NOAA Federal <
bob.simons at noaa.gov> wrote:
> I do like ISO-8859-1, because
> * It is compatible with ASCII for chars 0-127, which is all that ASCII
> specifies.
> * Any variable that has just 7bit ASCII chars can be labelled
> "charset=ISO-8859-1".
> * It is the most commonly used single-page 8bit charset for supporting the
> European languages.
> * It is widely used and supported.
>
All good. And I don't know if this is specific to the Python implementation, but
at least in Python, the 8859-1 decoder can read ANY binary data, and the result
round-trips through a "proper" unicode object to give the same bytes back.
i.e. if the data are not 8859-1 or are malformed for some reason, the
8859-1 decoder will not error out on any input, and if you re-encode the result,
you'll get back the same bytes you started with. Really nice property.
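
A quick sketch of that round-trip property in Python 3, for anyone who wants to check it. The test data (every byte value 0-255, repeated) is just something made up for illustration; the UTF-8 attempt is there only for contrast.

```python
# Every possible byte value, repeated: deliberately NOT meaningful text.
raw = bytes(range(256)) * 4

# ISO-8859-1 maps each byte 0-255 to the code point with the same number,
# so decoding never fails, whatever the input is.
text = raw.decode("iso-8859-1")

# Re-encoding gives back exactly the bytes we started with.
round_tripped = text.encode("iso-8859-1")

# For contrast: a strict UTF-8 decoder rejects this input.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(round_tripped == raw, utf8_ok)
```

So a reader that falls back to 8859-1 never loses the original bytes, even when the label on the data turns out to be wrong.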
> I do like UTF-8 because it is the only charset that supports full Unicode
> (all UTF-16/UCS-4/UTF-32 characters) in an 8bit encoding (since that is all
> we have for characters in netcdf-3 files: 8bit chars).
>
Again, I think this is a non-issue -- UTF-32 uses 4 bytes per character, i.e. 4
chars per code point. There's no reason you couldn't put UTF-32 encoded data in a
char array (C programmers do it all the time :-) )
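
To illustrate the point, here is a minimal Python 3 sketch. The sample string is made up, and the bytearray only stands in for an 8-bit char array; it is not a netCDF API call. I used "utf-32-le" so that no byte-order mark gets prepended.

```python
text = "café ☃"  # 6 code points, two of them outside ASCII

# Encode to UTF-32 (little-endian, no BOM) and hold the result in a plain
# array of 8-bit values, the way a char array would hold it.
char_array = bytearray(text.encode("utf-32-le"))

# Every code point takes exactly 4 of those 8-bit chars.
print(len(text), len(char_array))

# Reading it back only requires knowing which encoding was used.
restored = bytes(char_array).decode("utf-32-le")
print(restored == text)
```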
> And it is incredibly widely used and supported in software.
All the rest of your reasons are good -- UTF-8 is the best choice.
> So my proposal is: charset can specify any single-page (8bit) character
> set, but the two recommended charsets would be "ISO-8859-1" (for most
> simple cases) and "UTF-8" (for harder cases / full Unicode).
>
Sounds good, though part of me wants to say that "ISO-8859-1" and "UTF-8"
should be the only options!
(darn those legacy files!)
Also -- I don't think you can call UTF-8 an 8bit character set.
I'd also like the word "encoding" to be used instead of "character set"
wherever possible. "charset" comes from, and still implies, a 1-byte per
character system.
But that's really a nitpick.
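
To make the distinction concrete, a small Python 3 sketch (the sample characters are arbitrary picks of mine): UTF-8 is built out of 8-bit units, but a single character can take anywhere from 1 to 4 of them, whereas ISO-8859-1 really is one byte per character.

```python
samples = ["a", "é", "☃", "😀"]

# Number of bytes each character needs in UTF-8.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

# The same "é" in ISO-8859-1 is a single byte.
latin1_length = len("é".encode("iso-8859-1"))

print(utf8_lengths, latin1_length)
```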
-CHB
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov
Received on Mon Feb 27 2017 - 18:07:32 GMT