"Invalid UTF8 string" when transferring Chinese characters between a C++ server and a C# client

Hello.
I have been using Ice since last year. My data sources are large and distributed across more than 10 cities in China. With Ice, my CGIs and statistics programs can easily run in one place.
One of my applications is a database query agent. The data in the databases contains Chinese. The server is written in C++, and the clients are written in C++ and Python. At first, C++<->C++ and C++<->Python worked fine. But when I wrote a C# client (with the same logic as the C++ and Python clients) and ran it, bad things happened:
Unhandled Exception: Ice.MarshalException: Invalid UTF8 string ---> System.ArgumentException: Arg_InvalidUTF8
Parameter name: bytes
in <0x00451> System.Text.UTF8Encoding:InternalGetCharCount (System.Byte[] bytes, Int32 index, Int32 count, UInt32 leftOverBits, UInt32 leftOverCount, Boolean throwOnInvalid, Boolean flush)
in <0x0001e> System.Text.UTF8Encoding:GetCharCount (System.Byte[] bytes, Int32 index, Int32 count)
in <0x0001d> System.Text.Encoding:GetChars (System.Byte[] bytes, Int32 index, Int32 count)
in <0x00019> System.Text.Encoding:GetString (System.Byte[] bytes, Int32 index, Int32 count)
in <0x00086> IceInternal.BasicStream:readString ()--- End of inner exception stack trace ---

in <0x0011e> IceInternal.BasicStream:readString ()
in <0x00065> OssBerg.RowHelper:read (IceInternal.BasicStream __is)
in <0x00099> OssBerg.DBResult:__read (IceInternal.BasicStream __is)
in <0x00136> OssBerg._DataOperDelM:Query (DBRequest rqst, OssBerg.DBResult result, Ice.Context __context)

If the returned data contains Chinese characters, the exception above appears. Chinese characters in the MySQL tables are stored as "GB2312".
The server simply connects to the MySQL server, runs the query, gets the result, and sends it to the client.

The slice:
#ifndef DATA_OPER
#define DATA_OPER

module OssBerg
{
    struct DBRequest
    {
        string dbnode;
        string database;
        string sqlstmt;
    };

    dictionary<string, string> Row;
    sequence<Row> ResultSet;

    struct DBResult
    {
        int affected;
        int insertid;
        ResultSet results;
    };

    enum ErrorType
    {
        DBNodeNotExist,
        MysqlInitFailed,
        MysqlConnectFailed,
        MysqlQueryFailed,
        MysqlStoreFailed
    };

    exception DBError
    {
        ErrorType type;
        int errnum;
        string errstr;
    };

    interface DataOper
    {
        void Query(DBRequest rqst, out DBResult result) throws DBError;
    };
};

#endif

Some server code:
void
DataOperI::Query(const DBRequest& rqst, DBResult& result, const ::Ice::Current& current)
{
    /*
    some mysql opts
    */
    MYSQL_RES* res = mysql_store_result(&mysql);
    MYSQL_ROW row;

    while ((row = mysql_fetch_row(res)) != NULL)
    {
        Row one_row;
        unsigned long* lengths = mysql_fetch_lengths(res);
        for(int i = 0; i < num_fields; i++)
        {
            if (fields[i].type == FIELD_TYPE_BLOB && lengths[i] != 0)
            {
                // transform BLOB to string
                hex2str(row[i], lengths[i], one_row[fields[i].name]);
            }
            else
            {
                // a NULL column value becomes an empty string
                one_row[fields[i].name] = row[i] ? row[i] : "";
            }
        }
        result.results.push_back(one_row);
    }

    mysql_free_result(res);
}

Some client code:
public static void printres(DBResult res)
{
    Console.WriteLine("Affected Rows : " + res.affected);
    Console.WriteLine("Last Insert ID : " + res.insertid);
    foreach ( Row row in res.results )
    {
        foreach ( string name in row.Keys )
        {
            Console.WriteLine("{0}:{1}", name, row[name]);
        }
        Console.WriteLine();
    }
}

If row[name] contains Chinese, the exception happens. But when I use Python, everything works fine.
Is there anything I can do to make my DB agent server work well for both the C++/Python clients and the C# client?

Comments

  • matthew
    matthew NL, Canada
    Welcome to the forum. However, before we can offer you assistance, please fill out your signature as detailed in this post: http://www.zeroc.com/vbulletin/showthread.php?p=7297#post7297
  • I'm sorry. I forgot it :(. Thanks for the reminder.
    I've edited my signature now. :)
  • matthew
    matthew NL, Canada
    The problem is that you are attempting to send a GB2312-encoded string as a UTF-8 string, and it's not one, which leads to the C# runtime complaining :) So you have two options:

    1/ Convert the GB2312-encoded data to a UTF-8 string
    2/ Ignore this issue and send the data as a raw sequence of bytes.

    It would be better to convert the GB2312 data to Unicode, since this is the correct approach with Ice. See System.Text.Encoding for more information on how to do this with C#.
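
To illustrate why the C# side rejects these strings, here is a minimal sketch (not from the thread) of a UTF-8 well-formedness check. `is_valid_utf8` is a hypothetical helper, not an Ice API. The GB2312 bytes for the character 中 (0xD6 0xD0) fail the check, while its UTF-8 encoding (0xE4 0xB8 0xAD) passes. The C++ and Python clients presumably never hit this because they hand the received bytes back without decoding them.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Ice): returns true if `s` is well-formed
// UTF-8. Rejects stray or missing continuation bytes, truncated sequences,
// overlong encodings, surrogates, and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = p[i];
        std::size_t len;
        unsigned long cp;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;            // continuation byte or invalid lead byte

        if (i + len > n) return false;  // sequence runs past the end

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = p[i + k];
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (len > 1 && cp < min_cp[len]) return false;        // overlong form
        if (cp > 0x10FFFF) return false;                      // out of range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;       // surrogate
        i += len;
    }
    return true;
}
```

A check like this can only flag data that is definitely not UTF-8. Some GB2312 byte pairs happen to form valid UTF-8 sequences, so passing the check does not prove the data was encoded correctly.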
  • Thank you for your advice, matthew. :)
    My problem is that the C++ server gets some Chinese characters from MySQL and sends them directly to the C# client. Before I can process the Chinese characters in the client, the Ice runtime throws an exception. Should I convert the Chinese string to Unicode on the server side? But if I do so, will there be any problems for the C++ or Python clients communicating with the server?
    To sum up:
    1. Should I convert the GB2312/Big5-encoded data to Unicode on the server side?
    2. If I convert the GB2312/Big5-encoded data to Unicode on the C++ server side, I must modify the other clients that are already running so that they can process Unicode data, mustn't I?
    Hoping for your reply :)
  • matthew
    matthew NL, Canada
    Before you transfer a string over the wire with Ice, you must first convert the string to UTF-8 (Unicode). If the string is not UTF-8 you will have problems, such as the ones you have run into. So before your program that reads the MySQL data sends a string over Ice, the string must first be converted.
  • Heihei, you are right, dear matthew: everything sent to C# must be UTF-8.
    The code:
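    #include <iconv.h>
    #include <string.h>

    /* Convert inlen bytes of inbuf from from_charset to to_charset,
       writing a NUL-terminated result into outbuf (outlen bytes).
       Returns 0 on success, -1 on failure. */
    int  code_convert(
            const char * from_charset,
            const char * to_charset,
            char * inbuf,
            size_t inlen,
            char * outbuf,
            size_t outlen
            )
    {
            iconv_t cd;
            size_t  rc;
            /* keep one byte free so the result is always NUL-terminated */
            size_t  outleft = outlen > 0 ? outlen - 1 : 0;

            /* iconv_open reports failure with (iconv_t)-1, not 0 */
            if ((cd = iconv_open(to_charset,from_charset)) == (iconv_t)-1)
                return  -1;

            memset(outbuf,0,outlen);

            /* iconv expects size_t counters; casting int* to size_t* is
               not safe on 64-bit platforms */
            rc = iconv(cd,&inbuf,&inlen,&outbuf,&outleft);

            /* close the descriptor on the failure path as well */
            iconv_close(cd);

            return rc == (size_t)-1 ? -1 : 0;
    }

    /* Convert a GB2312 string to UTF-8. */
    char *  utf8(char * s)
    {
        static  char    buffer[1024];
        code_convert("gb2312","utf-8", s, strlen(s), buffer, sizeof(buffer));
        return buffer;
    }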
    

    I use iconv to convert GB2312 to UTF-8. I hope this will be useful to other people :).
    Thanks again, matthew!
  • matthew
    matthew NL, Canada
    Glad to be of assistance. BTW, I wouldn't use a static buffer in the utf8 function: it's not thread-safe that way. You are also restricting the total buffer size to 1024 bytes, which is perhaps not ideal either.
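
One possible way to address both points (the shared static buffer and the fixed 1024-byte limit) is sketched below. This is not code from the thread: `convert_charset` and `gb2312_to_utf8` are hypothetical names, and the sketch assumes a POSIX `iconv` whose second argument is `char**` (as in glibc). It returns a `std::string` that grows as needed and keeps no state between calls.

```cpp
#include <cerrno>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Ice): converts `input` from `from_charset`
// to `to_charset`. No static state is used, so concurrent calls do not share
// a buffer, and the result grows to whatever size the data needs.
std::string convert_charset(const std::string& input,
                            const char* from_charset,
                            const char* to_charset)
{
    iconv_t cd = iconv_open(to_charset, from_charset);
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed: unsupported conversion");

    std::string output;
    char chunk[256];
    // iconv() takes a non-const char**, but it does not modify the input bytes.
    char* in = const_cast<char*>(input.data());
    size_t in_left = input.size();

    while (in_left > 0) {
        char* out = chunk;
        size_t out_left = sizeof(chunk);
        size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
        int err = errno;  // save before any other call can change it
        output.append(chunk, sizeof(chunk) - out_left);
        // E2BIG only means the chunk filled up; loop again for the rest.
        if (rc == (size_t)-1 && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed: invalid or truncated input");
        }
    }
    // Note: a stateful target encoding would also need a final flush call;
    // GB2312 and UTF-8 are stateless, so it is omitted in this sketch.
    iconv_close(cd);
    return output;
}

// Convenience wrapper for the conversion discussed in this thread.
std::string gb2312_to_utf8(const std::string& s)
{
    return convert_charset(s, "GB2312", "UTF-8");
}
```

On the server, each column value could be passed through `gb2312_to_utf8` before it is stored in the row. A client that needs GB2312 for display could call `convert_charset(value, "UTF-8", "GB2312")` on what it receives.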
  • Oh, thank you for your advice, matthew. I'll think about a better way :)
    For C#, I must either build a special server or modify all of my clients, sigh. Maybe I should tell Bill to fix his stupid C# to fit Ice, heihei.