An error occurs when a C# client sends a Chinese string to a C++ server

Hi:

An error occurs when a C# client sends a Chinese string to the C++ server.
The server receives the string, but the unmarshaled result is wrong,
and an assertion fails when the string's memory is released. So I browsed
the source code of Ice and IceCS and found the following:

basicstream.cpp:
void
IceInternal::BasicStream::write(const string& v)
{
    Int len = static_cast<Int>(v.size());
    writeSize(len);
    if(len > 0)
    {
        Container::size_type pos = b.size();
        resize(pos + len);
        memcpy(&b[pos], v.c_str(), len);
    }
}

void
IceInternal::BasicStream::read(string& v)
{
    Int len;
    readSize(len);
    if(b.end() - i < len)
    {
        throw UnmarshalOutOfBoundsException(__FILE__, __LINE__);
    }
    if(len > 0)
    {
        v.assign(reinterpret_cast<const char*>(&(*i)), len);
        i += len;
    }
    else
    {
        v.clear();
    }
}

basicstream.cs:
public virtual void writeString(string v)
{
    if(v == null || v.Length == 0)
    {
        writeSize(0);
        return;
    }
    try
    {
        byte[] arr = utf8.GetBytes(v);
        writeSize(arr.Length);
        expand(arr.Length);
        _buf.put(arr);
    }
    catch(Exception)
    {
        Debug.Assert(false);
    }
}

public virtual string readString()
{
    int len = readSize();

    if(len == 0)
    {
        return "";
    }

    try
    {
        if(_stringBytes == null || len > _stringBytes.Length)
        {
            _stringBytes = new byte[len];
        }
        _buf.get(_stringBytes, 0, len);
        return utf8.GetString(_stringBytes, 0, len);
    }
    catch(InvalidOperationException ex)
    {
        throw new Ice.UnmarshalOutOfBoundsException(ex);
    }
    catch(System.ArgumentException ex)
    {
        throw new Ice.MarshalException("Invalid UTF8 string", ex);
    }
    catch(Exception)
    {
        Debug.Assert(false);
        return "";
    }
}

So it seems that the C# implementation supports UTF-8, while the C++ one does not support UTF-8, only MBCS?
When I use System.Text.Encoding.Default instead of UTF8Encoding in the C# client, the server
receives and prints the string correctly.
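
To make the byte-level difference concrete, here is a minimal sketch (assuming a C++11 compiler) of what a UTF-8 sender puts on the wire for a given code point. The helpers encodeUtf8 and toHex are hypothetical names for illustration, not part of Ice. Since BasicStream::write above copies the std::string bytes unchanged, whatever encoding the application stored in the string is what gets sent.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not an Ice API): encode one Unicode code point as UTF-8.
std::string encodeUtf8(std::uint32_t cp)
{
    // Precondition: a valid scalar value (no surrogates, at most U+10FFFF).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if(cp < 0x80)
    {
        out += static_cast<char>(cp);                          // 0xxxxxxx
    }
    else if(cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    }
    else if(cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));           // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 11110xxx
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: render bytes as space-separated lowercase hex.
std::string toHex(const std::string& bytes)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for(std::string::size_type i = 0; i < bytes.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        if(i > 0)
        {
            out += ' ';
        }
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}
```

For example, toHex(encodeUtf8(0x6211)) yields "e6 88 91", the three bytes a UTF-8 sender transmits for U+6211, whereas an ASCII character such as 'A' (0x41) stays a single byte.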


My platform:
Ice/IceCS 1.5.1, Windows 2000, VS.NET 2003 (7.1.3091), .NET 1.1 (1.1.4322)

Slice:
interface IPrinter
{
    void Print(string s);
};

The server:

class PrinterImpl : public IPrinter
{
public:
    PrinterImpl(void);
    virtual ~PrinterImpl(void);

    virtual void Print(const ::std::string&, const ::Ice::Current& current);
};

PrinterImpl::PrinterImpl(void)
{
}

PrinterImpl::~PrinterImpl(void)
{
}

void PrinterImpl::Print(const ::std::string& s, const ::Ice::Current& current)
{
    cout << s << endl;
}

Comments

  • marc (Florida)
    C++ also supports UTF-8, it just doesn't check whether every std::string being sent or received is really in UTF-8 format. It is the application's responsibility to make sure that UTF-8 strings are used with Ice. If your application does not use Unicode, you must convert your strings from whatever codeset you use to Unicode first.

    Ice for C++ has two utility functions that allow you to convert between a UTF-8 std::string and a UTF-16 std::wstring. Have a look at IceUtil/Unicode.h for details.
  • Of course, a string in C# is always in Unicode format, but the std::string that Ice for C++ uses is in ANSI format. UTF-8 and ANSI share the same representation only for characters 0x00 to 0x7f. MSDN says VC++ 7.x supports only MBCS and Unicode.

    e.g.
    platform Intel x86

    string a = "A"
    string w = "我" // Unicode: 0x62 0x11 (U+6211)

    Their bytes in C#:
         ANSI          Unicode (used by the C# compiler)   UTF-8
    a    [0x41]        [0x41 0x00]                         [0x41]
    w    [0xce 0xd2]   [0x11 0x62]                         [0xe6 0x88 0x91]

    In C++:
         ANSI          Unicode
    a    [0x41]        [0x41 0x00]
    w    [0xce 0xd2]   [0x11 0x62]

    When the C# client sends string w, it actually sends 0xe6 0x88 0x91, because the Ice.BasicStream class in Ice for C# uses UTF-8 encoding (System.Text.UTF8Encoding). The C++ server then receives the bytes 0xe6 0x88 0x91, and BasicStream in Ice for C++ cannot read the string correctly from them. It works when client and server are written in the same language, and fails otherwise.

    e.g.
    Sent   Client   Server   Result
    w      C#       C#       OK
    w      C++      C++      OK
    w      C#       C++      server error
    w      C++      C#       server error
  • I think you may have misunderstood Marc. The string coming from the C# side is delivered to the C++ side as a UTF-8 encoded string. You need to explicitly convert it to a wide string in your C++ application code using the functions defined in Unicode.h.

    Cheers,

    Michi.
  • marc (Florida)
    I don't understand exactly what you are asking, i.e., what you expect Ice to do. BasicStream in Ice for C++ doesn't touch the strings it receives, so it doesn't convert anything, and thus cannot have a problem reading any string.

    Ice for C++ simply expects that the strings it receives are UTF-8 encoded, and this is what it will get from Ice for C#. If you have a UTF-8 compatible application (for example, one built on a UTF-8 compatible GUI toolkit), then you can use these strings right away. Otherwise, you can convert them to UTF-16 wstrings with the conversion functions mentioned above.

    If your application doesn't use Unicode in either UTF-8 or UTF-16 encoding, then your own application code must convert between Unicode and whatever codeset you are using.
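
The conversion step described in the replies can be sketched in portable C++ as follows. This is a minimal illustration, assuming a C++11 compiler; it is a stand-in for the helpers in IceUtil/Unicode.h, not a copy of them, and the names utf8ToUtf16 and checkThreadExample are hypothetical. It also rejects malformed input, which is the check the replies say Ice for C++ leaves to the application.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not an Ice API): decode UTF-8 into UTF-16 code units.
// Returns false on malformed input (bad lead byte, missing or invalid
// continuation byte, overlong form, surrogate, or value above U+10FFFF).
// On failure, `out` may hold a partial result.
bool utf8ToUtf16(const std::string& in, std::u16string& out)
{
    out.clear();
    std::string::size_type i = 0;
    while(i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;     // code point being assembled
        std::uint32_t minCp;  // smallest code point allowed for this length
        std::string::size_type extra; // continuation bytes expected
        if(lead < 0x80)                { cp = lead;        extra = 0; minCp = 0; }
        else if((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; minCp = 0x80; }
        else if((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; minCp = 0x800; }
        else if((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; minCp = 0x10000; }
        else { return false; }

        if(in.size() - i - 1 < extra)
        {
            return false; // sequence is truncated
        }
        for(std::string::size_type k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if((c & 0xC0) != 0x80)
            {
                return false; // not a 10xxxxxx continuation byte
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        if(cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        {
            return false;
        }

        if(cp < 0x10000)
        {
            out += static_cast<char16_t>(cp);
        }
        else
        {
            cp -= 0x10000; // encode as a surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += extra + 1;
    }
    return true;
}

// Usage: the three bytes from the thread decode to one UTF-16 unit, 0x6211.
void checkThreadExample()
{
    std::u16string w;
    const bool ok = utf8ToUtf16("\xE6\x88\x91", w);
    assert(ok && w.size() == 1 && w[0] == 0x6211);
    (void)ok;
}
```

With this sketch, the ANSI bytes 0xce 0xd2 from the table above are rejected as malformed UTF-8, which matches the failures reported when non-UTF-8 strings are handed to Ice. On Windows, the resulting UTF-16 text could then be converted to the local ANSI code page for console output, for instance with the Win32 function WideCharToMultiByte; that step is not shown here.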