Integrating Worlds

February 3, 2003 by Matthias Mittelstein, Michael Redford

Computers process and store each character, for example a “B”, as a byte sequence. The assignment of characters to byte sequences is defined in a code page. Because of the diversity of languages and writing systems, numerous code pages are required. What’s more, the same character is encoded differently depending on the operating system or printer model. Consequently, 390 code pages are currently defined in the SAP system to support 41 different languages. Multiple, largely redundant code pages make data exchange more difficult and hinder integration.
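To make the idea of a code page concrete, the following Python sketch encodes the same two characters with a few legacy code page codecs from Python's standard library: `cp1252` (Windows Western European), `cp850` (DOS Western European), `mac_roman`, and `cp037` (an EBCDIC variant used on IBM mainframes). These codecs are chosen only for illustration; they are not the SAP code pages mentioned above.

```python
# Python codec names for a few legacy code pages (illustrative choice, not SAP code pages).
CODE_PAGES = ["cp1252", "cp850", "mac_roman", "cp037"]

def byte_values(ch):
    """Map each code page name to the hex byte value(s) it uses for `ch`."""
    return {cp: ch.encode(cp).hex() for cp in CODE_PAGES}

for ch in ["B", "ä"]:
    print(ch, byte_values(ch))
```

Even the letter “B” is stored as 0x42 in the ASCII-based code pages but as 0xC2 in EBCDIC, and “ä” is stored as a different byte in each of the four code pages.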
Technologies based on Unicode do not have this problem. Unicode encodes the characters of all the world’s scripts in a standardized form, thereby ensuring smooth communication between different languages and systems. Some 98,000 characters have already been defined in Unicode, and the code space has room for more than one million characters (1,114,112 code points). Using Unicode offers numerous advantages:

  • Many different languages can be supported more easily than ever before
  • No data is lost through conversion between incompatible code pages, a risk that always exists without Unicode
  • Tight integration with other Unicode-based technologies such as XML, Visual Basic, Java and .NET is supported.

Since a Unicode system uses only a single code page, the restrictions that apply to systems with more than one code page (Multi-Display, Multi-Processing, MDMP) do not apply. Every user can, for example, log on in English and enter any Unicode characters he or she chooses.

Unique Identification, Multiple Encodings

Each character in the Unicode standard has a unique identification number (Unicode Scalar Value). Just as a number can be written in several ways (“ten” is normally written as “10”, whereas a computer represents it in binary as “1010”), a Unicode Scalar Value can be represented in several ways. There are three Unicode Transformation Formats (UTF-8, UTF-16 and UTF-32), which encode a Unicode Scalar Value in 8-, 16-, or 32-bit units.
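The following Python sketch prints the Unicode Scalar Value of a few sample characters together with their byte sequences in the three encoding forms. It uses the standard `utf-8`, `utf-16-be` and `utf-32-be` codecs; the big-endian variants are chosen so that no byte-order mark is emitted.

```python
def utf_forms(ch):
    """Return the scalar value of `ch` and its UTF-8, UTF-16 and UTF-32 bytes (big-endian, no BOM)."""
    return {
        "scalar": f"U+{ord(ch):04X}",
        "UTF-8": ch.encode("utf-8").hex(" "),
        "UTF-16": ch.encode("utf-16-be").hex(" "),
        "UTF-32": ch.encode("utf-32-be").hex(" "),
    }

for ch in ["B", "ä", "€", "東"]:
    print(utf_forms(ch))
```

For “B”, UTF-8 needs one byte, UTF-16 two and UTF-32 four, whereas “東” (U+6771) takes three bytes in UTF-8 but only two in UTF-16.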
The ability to encode each character in both 8-bit and 16-bit units brings major advantages. 16-bit encoding is better suited to system-internal character processing, while 8-bit encoding is ideal for exchanging data, since it keeps the data volume small for text made up mostly of Latin characters. Regardless of the encoding form, every Unicode character can be represented in every Unicode Transformation Format. Conversion between the Unicode Transformation Formats is purely arithmetic, similar to converting from base 10 to base 2, and requires no conversion tables. This makes the conversion both efficient and unambiguous.
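To illustrate that converting a Unicode Scalar Value into a transformation format is plain bit arithmetic, here is a minimal Python sketch of the UTF-8 and UTF-16 encoding rules. It is written for clarity rather than speed; in practice one would simply call `str.encode`.

```python
def check_scalar(cp: int) -> None:
    """Reject values that are not Unicode Scalar Values (out of range or surrogates)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode Scalar Value: {cp:#x}")

def utf8_encode(cp: int) -> bytes:
    """Encode one scalar value as UTF-8 using shifts and masks only."""
    check_scalar(cp)
    if cp < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])

def utf16_units(cp: int) -> list:
    """Encode one scalar value as a list of 16-bit UTF-16 code units."""
    check_scalar(cp)
    if cp < 0x10000:     # Basic Multilingual Plane: one code unit
        return [cp]
    v = cp - 0x10000     # 20 bits, split into a surrogate pair
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]

print(utf8_encode(ord("€")).hex(" "))              # e2 82 ac
print([hex(u) for u in utf16_units(0x1D11E)])      # ['0xd834', '0xdd1e']
```

Because both directions are defined by these fixed bit patterns, the result can be checked against Python's built-in codecs for any character.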

SAP Web Application Server Supports Unicode

The Unicode changeover for mySAP.com began with SAP Web Application Server (SAP Web AS) 6.10. Several mySAP solutions now support Unicode. These include SAP R/3 Enterprise, mySAP Business Intelligence, mySAP Supply Chain Management and several scenarios of mySAP Customer Relationship Management. Other solutions will support Unicode in future releases; more information can be found in SAP Note 79991.

Unicode system

A Unicode system uses Unicode on every system level, i.e. database, application server and GUI. All leading database vendors now offer Unicode databases. Since support for Unicode is already integrated in SAP GUI 6.20, SAP users need only a single SAP GUI to work with Unicode and non-Unicode systems simultaneously. In a Unicode system, all characters are displayed correctly (see screenshot).
In a Unicode system, the ICU library (International Components for Unicode) defines, among other things, character properties and the language-dependent sort order (collation). ICU replaces operating-system-dependent locales and provides this functionality consistently, regardless of the programming language (C, ABAP and Java).
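Why a language-dependent sort order is needed can be shown with a toy example. The Python sketch below does not use ICU (which implements the Unicode Collation Algorithm with per-language tailorings); it hard-codes two simplified, assumed rules: German conventionally sorts “Ä” together with “A”, while Swedish places “Å”, “Ä” and “Ö” after “Z”. The key functions and alphabet are hypothetical illustrations only.

```python
# Simplified Swedish alphabet: å, ä and ö come after z (toy rule, not ICU data).
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {c: i for i, c in enumerate(SWEDISH_ALPHABET)}

# Simplified German rule: umlauts sort like their base vowel, ß like "ss".
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})

def german_key(word):
    # Primary key: folded lowercase word; ties are broken by the original spelling.
    return (word.lower().translate(GERMAN_FOLD), word)

def swedish_key(word):
    # Characters outside the toy alphabet sort after it, ordered by code point.
    return tuple(SWEDISH_RANK.get(c, len(SWEDISH_ALPHABET) + ord(c)) for c in word.lower())

words = ["Zucker", "Öl", "Apfel", "Äpfel", "Ofen"]
print(sorted(words, key=german_key))   # ['Apfel', 'Äpfel', 'Ofen', 'Öl', 'Zucker']
print(sorted(words, key=swedish_key))  # ['Apfel', 'Ofen', 'Zucker', 'Äpfel', 'Öl']
```

The same list yields two different orders, which is why collation has to be defined per language rather than by comparing raw character codes.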

Converting Existing Systems

Each Unicode-enabled mySAP component can be installed as either a Unicode or a non-Unicode system. Since no special steps are required for a new installation, a Unicode system is recommended for all new installations.
Before an existing system can be converted, however, several preparations must be carried out. Non-Unicode systems can be converted as of SAP Web AS Release 6.20; systems on earlier releases must therefore first be upgraded, for example from SAP R/3 4.6C to SAP R/3 Enterprise. It is then necessary

  • to convert the database with SAPINST,
  • to check customer-developed ABAP programs, and
  • to check C/C++ programs that communicate via RFC (Remote Function Call).

SAPINST exports the database, creates a new Unicode database, and imports the data into it. The conversion to Unicode takes place during the export. For large databases, SAP is working on an incremental conversion to minimize downtime during the export.
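The export, convert and import sequence can be sketched in a few lines of Python, assuming a table whose rows are all stored in a single legacy code page (Latin-1 here). The function names are hypothetical and do not describe how SAPINST is implemented; the sketch only shows that the conversion happens while the data is being exported.

```python
def export_table(legacy_rows, source_codec):
    """Export: read each row in its legacy code page and write it out as UTF-8."""
    return [row.decode(source_codec).encode("utf-8") for row in legacy_rows]

def import_table(exported_rows):
    """Import: load the UTF-8 export into the new Unicode database (here a list of str)."""
    return [row.decode("utf-8") for row in exported_rows]

legacy = [b"M\xfcller", b"Stra\xdfe"]      # rows stored as Latin-1 bytes
exported = export_table(legacy, "latin-1")
unicode_db = import_table(exported)
print(exported)    # [b'M\xc3\xbcller', b'Stra\xc3\x9fe']
print(unicode_db)  # ['Müller', 'Straße']
```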
When converting systems that use more than one code page (MDMP systems), additional steps are required. To ensure that data is converted correctly, all language keys are analyzed during an MDMP conversion, since a record’s language key indicates the code page in which it was stored. The conversion must also work correctly when tables without any language information contain language-dependent data. In this case, heuristics, a system vocabulary, and business data such as accounting groups or customer numbers are used to ensure the correct conversion. These preparations can already be made in an SAP R/3 4.6C system.
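The idea behind an MDMP conversion can be sketched in Python: if a record has a language key, that key selects the code page; if not, each candidate code page is tried and the decoding that yields the most words from a known vocabulary wins. The language-key-to-codec mapping, the vocabulary and the scoring rule are all hypothetical simplifications, not SAP’s actual tables or heuristics.

```python
# Hypothetical mapping from language keys to Python codec names (not SAP's tables).
LANG_TO_CODEC = {"D": "latin-1", "E": "latin-1", "R": "iso8859_5", "J": "shift_jis"}

# Hypothetical "system vocabulary": words known to occur in this system's data.
VOCABULARY = {"Müller", "Straße", "Москва", "東京"}

def vocabulary_hits(text):
    """Count how many words of `text` appear in the vocabulary."""
    return sum(1 for word in text.split() if word in VOCABULARY)

def convert_record(raw, lang_key=None):
    """Decode a legacy record, using its language key if it has one."""
    if lang_key is not None:
        # The language key determines the code page the record was stored in.
        return raw.decode(LANG_TO_CODEC[lang_key])
    # No language key: try every candidate code page and keep the decoding
    # that produces the most known words (a simple vocabulary heuristic).
    best_text, best_hits = None, 0
    for codec in dict.fromkeys(LANG_TO_CODEC.values()):  # unique codecs, order kept
        try:
            candidate = raw.decode(codec)
        except UnicodeDecodeError:
            continue
        hits = vocabulary_hits(candidate)
        if hits > best_hits:
            best_text, best_hits = candidate, hits
    return best_text  # None: no match, the record needs manual assignment

print(convert_record(b"M\xfcller", "D"))                # Müller
print(convert_record("Москва".encode("iso8859_5")))     # Москва
```

A real conversion would combine several such signals (for example business data like customer numbers) rather than relying on a vocabulary alone.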
In C programs, Unicode and non-Unicode systems use different data types; however, the same program code can be used for both. SAP customers can use the “ccQ” tool to adapt RFC programs. Further information on ccQ can be found in the SAP RFC Software Development Kit. The steps required to Unicode-enable ABAP programs are covered in a separate SAP INFO article.

Cross-System Integration Without Language Barriers

A Unicode system can easily be integrated into an existing system landscape. What’s more, because Unicode contains all characters used in the old code pages, a Unicode system is particularly well suited for collecting data from older SAP systems or non-SAP applications. For RFC communication, the Unicode system converts data into the recipient’s code page(s). Consequently, only the RFC connections in the Unicode system need special maintenance.
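A minimal Python sketch of the sender-side conversion: the Unicode system keeps a table of its partners’ code pages and converts outgoing text accordingly, so the partner systems need no knowledge of Unicode. The destination names and codec assignments are hypothetical, and the sketch does not reflect the actual RFC wire format.

```python
# Hypothetical RFC destinations and the code page each partner system expects.
RFC_DESTINATIONS = {
    "LEGACY_DE": "latin-1",    # non-Unicode system with a Western European code page
    "LEGACY_RU": "iso8859_5",  # non-Unicode system with a Cyrillic code page
    "UNICODE_SYS": "utf-8",    # Unicode partner system
}

def prepare_rfc_payload(text, destination):
    """Convert Unicode text into the code page of the receiving system.

    Raises UnicodeEncodeError if the text contains characters that the
    partner's code page cannot represent.
    """
    return text.encode(RFC_DESTINATIONS[destination])

print(prepare_rfc_payload("Müller", "LEGACY_DE"))    # b'M\xfcller'
print(prepare_rfc_payload("Müller", "UNICODE_SYS"))  # b'M\xc3\xbcller'
```

Only this one table, on the Unicode side, has to be maintained when a partner system is added.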
In principle, it is possible to convert between code pages at any time and thus to exchange data between systems or programs. For the conversion to work, however, the sender and recipient must use code pages that define the same character set, and each must support its partner’s code page(s); otherwise, data will be lost. As a result, Unicode is increasingly being used, in particular for exchanging data both within and beyond SAP systems. For this reason, SAP Exchange Infrastructure (SAP XI), which enables process-controlled collaboration, is also based on Unicode. Since ABAP and Java both use Unicode, tight integration is also possible when data is exchanged at deeper levels of the system. Without Unicode, however, every type of integration is at risk from code page problems.
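The data loss described above can be demonstrated with a short Python sketch: text survives a round trip through a code page only if that code page contains every character in it. The function name and sample text are illustrative assumptions.

```python
def converts_losslessly(text, codec):
    """Return True if `text` survives a round trip through `codec` unchanged."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeError:  # a character is not defined in the target code page
        return False

sample = "Müller, Москва, 東京"
for codec in ["latin-1", "iso8859_5", "shift_jis", "utf-8"]:
    print(codec, converts_losslessly(sample, codec))

# With errors="replace", unmappable characters are silently replaced by "?".
print(sample.encode("latin-1", errors="replace"))
```

Only the Unicode encoding keeps the mixed-script sample intact; each legacy code page lacks at least one of its characters.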
Further technical information, guidelines and overviews on these topics can be found on SAPNet under the alias Unicode@sap in the Media Library. If you are interested in a Unicode system, please contact globalization@sap.com.

Michael Redford

Matthias Mittelstein
