Unicode handling in IcePy

It seems that unicode handling in IcePy is asymmetric: a Python unicode object passed into the marshaling code is transcoded to UTF-8, while strings received come back as plain UTF-8 encoded str objects, which is not all that useful in practice. Using Ice.Plugin.Converter, as suggested in http://www.zeroc.com/forums/comments/3680-python-unicode.html, won't do the trick, since changing the encoding won't affect the type (str vs. unicode).

The workaround we're using right now is to change Python's default encoding from ascii to UTF-8, but this is kind of a hack. There are several ways to accomplish this; we're using the following approach (this is not a recommendation, but it is a workaround that does the trick and works OK in our specific environment):
import sys
# Python 2 only: site.py removes sys.setdefaultencoding at startup; reload(sys) brings it back.
reload(sys)
sys.setdefaultencoding('UTF-8')
# Remove the function again, as site.py does at startup.
del sys.setdefaultencoding

The alternative is to call the decode method all over the place (like mystruct.stringMember.decode('UTF-8')), which is really error prone.
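
One way to make the decode approach less error prone is to do it in a single place. The sketch below is only an illustration: to_unicode is a hypothetical helper name, not part of IcePy, and it only handles byte strings, lists, tuples and dicts. Slice structs and classes would still need their members visited explicitly.

```python
def to_unicode(value, encoding='utf-8'):
    """Recursively decode byte strings (Python 2 str) into unicode.

    Hypothetical helper for illustration; not part of IcePy.
    """
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    if isinstance(value, (list, tuple)):
        # Rebuild the same container type with decoded elements.
        return type(value)(to_unicode(v, encoding) for v in value)
    if isinstance(value, dict):
        return dict((to_unicode(k, encoding), to_unicode(v, encoding))
                    for k, v in value.items())
    # Numbers, booleans, None and unicode objects pass through unchanged.
    return value
```

Calling to_unicode() once on each value returned from a proxy invocation keeps the decoding out of the rest of the application code.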

All of this is happening in Ice-3.4.2/py/modules/IcePy/Types.cpp, and changing it shouldn't be too hard.

In the function "IcePy::PrimitiveInfo::marshal", when a string is encountered, writeString() is called, which detects a Python unicode object and automatically converts it to a UTF-8 string if Py_USING_UNICODE is defined.

However, in the function "IcePy::PrimitiveInfo::unmarshal" Ice strings are only returned as plain Python strings which are UTF-8 encoded, and there is no option to get a Python unicode value at all.

This behaviour is very asymmetric: if you're working in a pure unicode environment, you have to manually convert every string you receive from Ice into a Python unicode object using .decode('utf-8').
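
To make the asymmetry concrete, here is a small pure-Python simulation of the round trip. It does not call IcePy at all; simulate_marshal and simulate_unmarshal are made-up stand-ins that only mimic the behaviour described above.

```python
def simulate_marshal(value):
    """Stand-in for the send side: unicode is transcoded to UTF-8 bytes."""
    if isinstance(value, bytes):
        return value  # byte strings are passed through as-is
    return value.encode('utf-8')


def simulate_unmarshal(wire_bytes):
    """Stand-in for the receive side: the UTF-8 bytes come back undecoded."""
    return wire_bytes


sent = u'gr\xfc\xdfe'  # a unicode object goes in
received = simulate_unmarshal(simulate_marshal(sent))

# A byte string comes out, so the type is not preserved across the round trip.
round_trip_preserves_type = type(received) is type(sent)  # False

# The caller has to decode manually to get the original value back.
restored = received.decode('utf-8')
```

The value survives the round trip, but only after the explicit decode; the type does not.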

What would be much better is to have a configuration option that would instruct IcePy to always convert Ice strings to Python unicode types.

Cheers
Michael

Comments

  • mes (California)
    Hi Michael,

    I agree, the current behavior could use some improvement. This is on our list of things to fix for the next release.

    Regards,
    Mark