ABAP Character Set – Study Notes
Overview
Application Server ABAP only supports Unicode systems in the current release. Non-Unicode systems are no longer supported.
Unicode vs Non-Unicode Systems
| Aspect | Unicode System | Non-Unicode System |
|---|---|---|
| Definition | AS ABAP based on Unicode character representation, with a Unicode code page | AS ABAP with single-byte and double-byte code pages |
| Status | ✅ Supported | ❌ Not supported (obsolete) |
| ABAP Language Version | Standard ABAP (Unicode) | Non-Unicode ABAP |
Unicode Standard
What is Unicode?
Unicode is aligned with ISO/IEC 10646, which defines the Universal Character Set (UCS)
Aims to cover the characters of all the world's writing systems
Unicode Character Formats
| Format | Bytes per Character | Description |
|---|---|---|
| UTF-8 | 1-4 bytes | Variable-length encoding |
| UCS-2 | 2 bytes | Fixed length (used by ABAP) |
| UTF-16 | 2 or 4 bytes | System code page of Unicode systems |
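The byte counts listed for the Unicode formats can be checked with a small sketch. It is written in Python rather than ABAP, purely as an illustration: Python's built-in `utf-8` and `utf-16-le` codecs stand in for the formats, and the helper name `bytes_per_character` is made up for this example.

```python
# Illustration only: Python codecs stand in for the Unicode formats; this is not ABAP.

def bytes_per_character(text: str, encoding: str) -> list[int]:
    """Return the encoded size in bytes of each character in text."""
    return [len(ch.encode(encoding)) for ch in text]

# 'A' (U+0041), 'é' (U+00E9), '€' (U+20AC), and U+1F600 (outside the BMP)
sample = "A\u00e9\u20ac\U0001F600"

utf8_sizes = bytes_per_character(sample, "utf-8")
utf16_sizes = bytes_per_character(sample, "utf-16-le")  # "-le" avoids a byte order mark

print(utf8_sizes)   # [1, 2, 3, 4]
print(utf16_sizes)  # [2, 2, 2, 4]
```

Only the last character needs four bytes in UTF-16. Every other character fits the fixed two-byte model.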
ABAP Character Representation
ABAP supports UCS-2 character representation
In ABAP, a character is assumed to have a fixed length of two bytes
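This matches the UTF-16 representation for all characters in the Basic Multilingual Plane (BMP)
⚠️ Restriction: A character from the UTF-16 surrogate area occupies four bytes and is handled as two characters in ABAP, so truncating or comparing strings character by character can give unexpected results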
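A minimal sketch of why the surrogate area is a problem for a fixed two-byte view of characters. This is Python, not ABAP, and it only imitates the behavior by working on 16-bit code units. The helper names `utf16_code_units` and `truncate_code_units` are hypothetical.

```python
# Illustration only: imitates a fixed two-byte "character" view using UTF-16 code units.

def utf16_code_units(text: str) -> int:
    """Count 16-bit code units, which is what a fixed two-byte view treats as characters."""
    return len(text.encode("utf-16-le")) // 2

def truncate_code_units(text: str, n: int) -> bytes:
    """Keep the first n 16-bit code units, as a naive fixed-width cut would."""
    return text.encode("utf-16-le")[: 2 * n]

text = "a\U0001F600b"  # 'a', U+1F600 (outside the BMP), 'b'

print(len(text))               # 3 code points
print(utf16_code_units(text))  # 4 code units: U+1F600 needs a surrogate pair

cut = truncate_code_units(text, 2)  # 'a' plus only the high surrogate
try:
    cut.decode("utf-16-le")
    cut_is_valid = True
except UnicodeDecodeError:
    cut_is_valid = False
print(cut_is_valid)            # False: the surrogate pair was split
```

Cutting after the second code unit leaves half of a surrogate pair behind. This is the kind of unexpected result the restriction warns about.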
Historical Character Sets (Pre-Unicode)
Before Unicode, SAP used various character encoding systems:
1. ASCII (American Standard Code for Information Interchange)
1 byte per character (7 bits standard, 8 bits extended)
Maximum 256 characters
Examples (8-bit code pages that extend ASCII):
ISO-8859-1: Western European
ISO-8859-5: Cyrillic
2. EBCDIC (Extended Binary Coded Decimal Interchange Code)
1 byte per character
Maximum 256 characters
Example: EBCDIC 0697/0500 for Western European on IBM System i (AS/400)
3. Double-byte Code Pages
1-2 bytes per character
Maximum 65,536 characters (10,000-15,000 typically used)
Examples:
SJIS: Japanese
BIG5: Traditional Chinese
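The "1-2 bytes per character" property of double-byte code pages can be seen in a short sketch. It uses Python's `shift_jis` and `big5` codecs as stand-ins for the SJIS and BIG5 code pages named above. The helper `encoded_lengths` is a made-up name for this illustration.

```python
# Illustration only: Python codecs stand in for the SJIS and BIG5 code pages.

def encoded_lengths(text: str, encoding: str) -> list[int]:
    """Return the number of bytes each character occupies in the given code page."""
    return [len(ch.encode(encoding)) for ch in text]

japanese = "A\u3042"  # Latin 'A' followed by hiragana U+3042
chinese = "A\u4e2d"   # Latin 'A' followed by the CJK ideograph U+4E2D

sjis_lengths = encoded_lengths(japanese, "shift_jis")
big5_lengths = encoded_lengths(chinese, "big5")

print(sjis_lengths)  # [1, 2]
print(big5_lengths)  # [1, 2]
print(len(japanese), len(japanese.encode("shift_jis")))  # 2 characters, 3 bytes
```

Because the width varies within one text, the byte length of a string in such a code page cannot be derived from its character count alone.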
Problems with Multiple Character Sets
Different languages required different code pages
Mixing texts from incompatible character sets caused issues
Data exchange between systems with different encodings was problematic
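A small sketch of the underlying problem: the same byte means different things under different code pages. Python's `iso8859_1`, `iso8859_5`, and `cp500` codecs are used here as stand-ins for the code pages mentioned earlier (`cp500` is an EBCDIC code page). This is an illustration, not SAP behavior.

```python
# Illustration only: one byte value decoded under two single-byte code pages.

raw = bytes([0xE4])

as_western = raw.decode("iso8859_1")   # Western European reading of the byte
as_cyrillic = raw.decode("iso8859_5")  # Cyrillic reading of the same byte

print(as_western)   # ä
print(as_cyrillic)  # ф

# A Cyrillic letter has no slot in the Western European code page at all.
try:
    as_cyrillic.encode("iso8859_1")
    fits_in_western = True
except UnicodeEncodeError:
    fits_in_western = False
print(fits_in_western)  # False

# ASCII-based and EBCDIC-based code pages disagree even on plain Latin letters.
ascii_a = "A".encode("iso8859_1")  # b'A' (0x41)
ebcdic_a = "A".encode("cp500")     # 0xC1
print(ascii_a != ebcdic_a)         # True
```

Without knowing which code page produced the bytes, a receiving system cannot tell which character was meant.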
Non-Unicode System Configuration
| Item | Description |
|---|---|
| System Code Pages | Defined in database table TCPDB |
| Single Code Page Systems | One system code page |
| MDMP Systems | Multiple system code pages (obsolete) |
Unicode Migration Considerations
When converting from non-Unicode to Unicode:
Areas Requiring Changes
Character string processing and byte string processing
Structure access (flat structures in Non-Unicode ABAP are treated as character-like data)
Key Points
Old assumption: One character = One byte
Unicode reality: One character = Two bytes (in ABAP)
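The effect of the old one-byte assumption on structure access can be sketched as follows. This is Python, not ABAP. The record layout (a 3-character field followed by a 5-character field) and the helper `field_bytes` are hypothetical, chosen only to show how byte offsets shift when every character takes two bytes.

```python
# Illustration only: a hypothetical flat record with a 3-character field
# followed by a 5-character field, read by byte offset.

def field_bytes(record: bytes, char_offset: int, char_length: int,
                bytes_per_char: int) -> bytes:
    """Slice a field out of a flat record, given its position in characters."""
    start = char_offset * bytes_per_char
    return record[start:start + char_length * bytes_per_char]

record_text = "DEUBERLN"                        # "DEU" + "BERLN"
single_byte = record_text.encode("iso8859_1")   # 8 bytes
two_byte = record_text.encode("utf-16-le")      # 16 bytes

# Old assumption (1 character = 1 byte) works on the single-byte record...
old_ok = field_bytes(single_byte, 3, 5, 1).decode("iso8859_1")
# ...but slices the wrong bytes out of the two-byte record.
old_wrong = field_bytes(two_byte, 3, 5, 1)
# Scaling offsets and lengths by two bytes per character fixes the access.
new_ok = field_bytes(two_byte, 3, 5, 2).decode("utf-16-le")

print(old_ok)     # BERLN
print(old_wrong)  # b'\x00U\x00B\x00'
print(new_ok)     # BERLN
```

Code that computes offsets or lengths in bytes while thinking in characters is the typical candidate for changes during a Unicode conversion.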
Tools for Migration
| Tool | Purpose |
|---|---|
| UCCHECK | Transaction that checks existing programs against the Unicode checks |
| RSUNISCAN_FINAL | Program that can be run as an alternative to UCCHECK |
| TCPDB | Database table for system code pages |
Migration Steps
Configure ABAP language version to Standard ABAP (Unicode) or higher in program attributes
Enable Unicode checks (they can also be enabled for programs in non-Unicode systems)
Use transaction UCCHECK or program RSUNISCAN_FINAL to scan existing programs
Character Type in ABAP
In Unicode ABAP:
Type C (Character): Fixed 2 bytes per character
Type STRING: Variable length, character-based
Type XSTRING: Variable length, byte-based
Key Takeaways
AS ABAP only supports Unicode systems in current release
UCS-2 is used for character representation (2 bytes per character)
Non-Unicode systems are obsolete and no longer supported
UTF-16 is the system code page for Unicode systems
When migrating from non-Unicode: check for 1-byte = 1-character assumptions
Use UCCHECK or RSUNISCAN_FINAL to identify programs needing updates
Content Source: https://help.sap.com/
This post comes from www.hot583.com. You may share or use it for free as long as you link to the original post, at your own risk.
