General Administration

I18N - Installation Information for ECI Clients

 
  • ECI Clients

    • ECI clients connecting to the Java Client receive the data in ISO-8859-1 encoding, unless eci_set_enc is called.
    • ECI clients connecting to the Agile EDM Server will get UTF-8 data unless they call eci_set_enc for each connection to the server.
      UTF-8 is the standard character set of the C server, thus no character conversion is needed when transferring data from/to the server.

      If the client needs a different encoding, it has to call eci_set_enc:
      • The server will switch the data encoding to the desired one and will send all characters in this encoding.
      • The server expects that all incoming data is in this encoding.

    The encoding that any Java Client should use can be read from the configuration parameter EDB-CHR-ENC-JVM. However, the client may decide to use UTF-8 even if the server does not require it. UTF-8 is always a supported encoding.

    The Agile EDM Server side implementation of eci_set_enc supports all encodings that are supported by the underlying ICU library. The Java Client implementation of eci_set_enc only supports encodings known by the Java runtime.

  • Coding Guidelines for C/C++ modules on the Agile EDM Server

Using UTF-8 as the internal character set has the following implications for existing C/C++ code that runs inside the Agile EDM Server process.

  • Memory Allocation
    The memory reserved for string data increases by a factor of 4, because a single UTF-8 character can occupy up to 4 bytes. The following macros, defined in axalant.h, are used:

    /* Define for i18N character conversion */
    #define EP_MBCS_SIZE 4 /* UTF-8 has 4 Bytes maximum length */
    #define EP_CS(s) (EP_MBCS_SIZE * (s))
    #define EP_NO_CS(s) (s)

    Example of a declaration of a C string that should store the content of a field with 40 characters in the database:

    char caFieldData[EP_CS(40)+1] = "";

    This will allocate 40*4+1 bytes to safely store the maximum field content returned by the EPQ database layer.
    The EP_NO_CS macro leaves the size unchanged. It simply indicates that the array is not intended to store any UTF-8 data and marks such code as safe.
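As a minimal sketch of how the two macros can be used side by side, the following fragment repeats the definitions quoted above so that it compiles on its own. The field widths and the function names are hypothetical and do not come from axalant.h.

```c
#include <stddef.h>

/* Definitions as quoted above from axalant.h, repeated so the sketch is self-contained. */
#define EP_MBCS_SIZE 4              /* UTF-8 has 4 bytes maximum length */
#define EP_CS(s)     (EP_MBCS_SIZE * (s))
#define EP_NO_CS(s)  (s)

/* Hypothetical field widths in characters (assumptions for illustration only). */
#define NAME_CHARS 40
#define FLAG_CHARS 1

/* Size in bytes of a buffer that may receive UTF-8 data for a 40-character field. */
size_t name_buffer_size(void)
{
    char caName[EP_CS(NAME_CHARS) + 1] = "";
    return sizeof caName;           /* 40 * 4 + 1 bytes */
}

/* Size in bytes of a buffer that is known to hold ASCII-only data. */
size_t flag_buffer_size(void)
{
    char caFlag[EP_NO_CS(FLAG_CHARS) + 1] = "";
    return sizeof caFlag;           /* 1 + 1 bytes, no expansion */
}
```

Only the buffer that can receive database content is multiplied by EP_MBCS_SIZE. The second buffer keeps its original size, and wrapping its length in EP_NO_CS records that this was a deliberate choice.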

  • String Manipulation
    The Agile EDM Server will return all string data in its APIs as UTF-8 - metadata and user data. Also, all string data passed to its APIs has to be in UTF-8.

    The C string and file functions are ASCII compatible and work with UTF-8. In the following situations, however, the standard C functions must be replaced with the corresponding functions of the ICU library:

    • printf and scanf functions that use width patterns like "%10s".
    • Code that loops through a string and searches for a specific non-ASCII character.

    To adapt code:

      1. Convert the C UTF-8 string to an ICU UChar string.
      2. Use the corresponding ICU function, such as u_strlen, u_strcpy, or u_printf.
      3. If required, convert back to UTF-8.

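The reason width patterns need attention can be shown without ICU. The sketch below is a dependency-free illustration, not the ICU-based procedure described in the steps above and not the epshr implementation. The helpers utf8_char_count and utf8_pad_left are hypothetical names, and the code assumes its input is valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the characters (code points) in a valid UTF-8
 * string by skipping continuation bytes, which have the bit pattern 10xxxxxx. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

/* Hypothetical helper: right-justifies src in a field of `width` characters,
 * like "%10s", but counts characters instead of bytes.
 * Returns 0 on success, or -1 if dest is too small. */
int utf8_pad_left(char *dest, size_t destSize, const char *src, size_t width)
{
    size_t chars = utf8_char_count(src);
    size_t pad   = (chars < width) ? width - chars : 0;
    size_t bytes = strlen(src);

    if (pad + bytes + 1 > destSize) {
        return -1;
    }
    memset(dest, ' ', pad);              /* leading spaces */
    memcpy(dest + pad, src, bytes + 1);  /* text plus terminating NUL */
    return 0;
}
```

For the string "M\xC3\xBC" "ller" (six characters, seven bytes), strlen returns 7 while utf8_char_count returns 6. A plain printf("%10s", ...) counts bytes and would therefore emit only three leading spaces, whereas utf8_pad_left with a width of 10 emits four.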
  • epshr library utility functions
    Functions for converting characters in the epshr_ucnv library (see epshr_ucnv.h):

    Function            Description
    Ucnv_toUChars       Converts a UTF-8 C string to a Unicode UChar string.
    Ucnv_fromUChars     Converts a Unicode UChar string to a C UTF-8 string.
    Ucnv_utf8_strlen    Returns the length in characters of a UTF-8 string (without the need to convert it to UChar first).
    Ucnv_convertToUTF8  Converts text from any encoding to UTF-8.
    Ucnv_detectCharset  Detects the charset used by a specific string.
    Ucnv_isUTF8         Checks if the specified string is in UTF-8.
    Ucnv_getEnv         Returns the content of an environment variable in UTF-8.

  • Replacements for the standard C file functions
    Functions to cope with UTF-8 file names (dir_ufile.h) and with UTF-8 file content (epshr_ufile.h):

    Function         Description
    File_getFileDes  Opens a Standard I/O file with a UTF-8 file name and returns the low-level file number.
    File_open        Opens a Standard I/O file with a UTF-8 file name.
    File_reopen      Reopens a Standard I/O file with a UTF-8 file name.
    File_access      Checks access to a file with a UTF-8 file name.
    File_rename      Renames a file with a UTF-8 file name.
    File_remove      Deletes a file with a UTF-8 file name.
    File_unlink      Deletes a file with a UTF-8 file name (same as File_remove).
    File_chmod       Changes the access mode of a file.
    File_ignoreBOM   Reads the first bytes of the file and ignores the Unicode Byte Order Mark.
    u_File_open      Opens a Unicode I/O file with a UTF-8 file name.

  • String functions
    Functions to handle UTF-8 and UChar data (epshr_ustr.h):

    Function                                 Description
    u_Str_dup                                Duplicates a UChar string.
    u_Str_format                             Formats a UChar string and returns the allocated result.
    u_Str_vformat                            Formats a UChar string with a variable argument list and returns the allocated result.
    u_Str_str_chr                            Complex string parsing with quote support (see documentation for details).
    u_Str_isAscii(const char *cpUtf8String)  Checks if a UTF-8 string contains only ASCII characters.
    u_Str_isDigit                            Checks if a UChar is an ASCII digit.

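An ASCII-only check is useful as a guard: when a string contains only ASCII, its byte count equals its character count, so byte-oriented C code such as "%10s" formatting behaves as before. The following is a minimal stand-in that shows the idea. The function name is_ascii_only and the bool return type are assumptions, and this is not the epshr implementation of u_Str_isAscii.

```c
#include <stdbool.h>

/* Hypothetical stand-in for illustration: returns true if every byte of the
 * NUL-terminated string is below 0x80, that is, the string is pure ASCII. */
bool is_ascii_only(const char *cpUtf8String)
{
    const unsigned char *p = (const unsigned char *)cpUtf8String;

    for (; *p != '\0'; ++p) {
        if (*p >= 0x80) {
            return false;   /* lead or continuation byte of a multi-byte character */
        }
    }
    return true;
}
```

A caller could keep its existing byte-based code path when the check succeeds and convert to UChar only for strings that contain multi-byte characters.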
  • Java ECI Client
    When using the Java ECI, an ECI client can set the encoding in the parameter object, either EciConParam or EciClientParam. The ECI connection implementation then calls eci_set_enc automatically.
    The encoding can be changed by calling setEncoding at any time (see restrictions above).
    UTF-8 encoding and decoding is performed by the Java ECI.
    When using the JET layer (i.e. the AxalantRepository class), everything is done automatically. The encoding used by the JET layer is read from EDB-CHR-ENC-JVM. If AxalantRepository.callEci() is used for all direct calls to the server, no encoding problems will occur.
  • C/C++ ECI Client
    The C ECI does not have parameter objects for the connection, so the client has to call eci_set_enc after opening a connection with eci_connect.
    UTF-8 encoding and decoding must be done by the ECI API user. The ECI API does not do any character conversion at all.
    To support the C developer, the ICU libraries have been added to the third-party libraries shipped in $ep_root/ext/bin. The header files are located in $ep_root/ext/inc/icu.