Oracle Character Set Conversion (3): 3. Client Environment Configuration (2)

Continuing from the previous article, we look at the configuration of the Oracle Character Set conversion client environment. Oracle Server Character Set and Client NLS_LANG recommended four configurations can be checked.

3.2. Summary of test case execution results

previous post Oracle Character Set Conversion (2): 3. Configuring the Client Environment (1)The execution results of each test case are summarized as follows.

Case #Server
Character Set
Client NLS_LANGvalueinputPrintDUMP value
1US7ASCIIKO16KSC5601'Hangul'successfractureTyp=1 Len=2 CharacterSet=US7ASCII: 3f,3f
'sharp'failure--
2US7ASCIIKO16MSWIN949'Hangul'successfractureTyp=1 Len=2 CharacterSet=US7ASCII: 3f,3f
'sharp'successfractureTyp=1 Len=1 CharacterSet=US7ASCII: 3f
3US7ASCIIUS7ASCII'Hangul'successnormalTyp=1 Len=4 CharacterSet=US7ASCII: c7,d1,b1,db
'sharp'successnormalTyp=1 Len=2 CharacterSet=US7ASCII: 98,de
4KO16MSWIN949KO16KSC5601'Hangul'successnormalTyp=1 Len=4 CharacterSet=KO16MSWIN949: c7,d1,b1,db
'sharp'failure--
5KO16MSWIN949KO16MSWIN949'Hangul'successnormalTyp=1 Len=4 CharacterSet=KO16MSWIN949: c7,d1,b1,db
'sharp'successnormalTyp=1 Len=2 CharacterSet=KO16MSWIN949: 98,de
6KO16MSWIN949US7ASCII'Hangul'successfractureTyp=1 Len=4 CharacterSet=KO16MSWIN949: 3f,3f,3f,3f
'sharp'successfractureTyp=1 Len=2 CharacterSet=KO16MSWIN949: 3f,3f
7AL32UTF8KO16KSC5601'Hangul'successnormalTyp=1 Len=6 CharacterSet=AL32UTF8: ed,95,9c,ea,b8,80
'sharp'failure--
8AL32UTF8KO16MSWIN949'Hangul'successnormalTyp=1 Len=6 CharacterSet=AL32UTF8: ed,95,9c,ea,b8,80
'sharp'successnormalTyp=1 Len=3 CharacterSet=AL32UTF8: ec,83,be
9-1AL32UTF8AL32UTF8
(cmd)
'Hangul'failure--
'sharp'failure--
9-2AL32UTF8AL32UTF8
(PowerShell)
'Hangul'successnormalTyp=1 Len=6 CharacterSet=AL32UTF8: ed,95,9c,ea,b8,80
'sharp'successnormalTyp=1 Len=3 CharacterSet=AL32UTF8: ec,83,be

3.3 Oracle Server Character Set and Client NLS_LANG Recommended Configuration

The configuration of the Oracle Server and Client environment for Hangul input/output is possible in the following four combinations.

Case #Server
Character Set
Client
NLS_LANG
Comment
3US7ASCIIUS7ASCIInever use it Use only if you cannot change the existing environment!!!
5KO16MSWIN949KO16MSWIN949Stores and inputs and outputs only Korean characters, English characters, numbers, special characters, Chinese characters, etc. supported on Korean Windows (multilingual characters cannot be stored in the server)
8AL32UTF8KO16MSWIN949Used when the server is a multilingual environment and the client inputs and outputs only Korean. Can be used when the client application cannot handle Unicode
9AL32UTF8AL32UTF8The value stored in the server is transmitted to the client as it is without conversion.
That is, the client must directly handle UTF8-encoded data.

Case #3 seems to have no problem with input/output, but looking at the dump value, you can see that the character set is US7ASCII. That is, it means that actual input/output is executed in units of 1 byte of US7ASCII, and it is stored incorrectly.

Case #3 실행결과
Case #3 Execution Result

When this data is transmitted to an external system through EAI, ETL, ESB, etc., Hangul characters are broken, and it is difficult to accurately exchange data. Therefore, Case #3 is a setting that should be absolutely avoided.

Case 9-1 is a phenomenon that occurs because Unicode input/output is not supported in cmd.exe, a Windows command prompt, although the original input/output settings do not have any problems.
If you check Case 9-2 in Windows PowerShell, you can see that the input/output is normal.

In order to store multilingual characters such as Chinese characters, Japanese characters, Thai characters, and Western European characters in addition to Korean, the following two combinations are possible.

Case #Server
Character Set
Client
NLS_LANG
Comment
8AL32UTF8Set according to each language characterClient NLS_LANG setting
– In case of Korean: KO16MSWIN949
– Chinese characters: ZHS16GBK, or ZHT16MSWIN950, or ZHT16HKSCS
– Japanese characters: JA16SJIS
– Thai characters: TH8TISASCII, etc.
See table below
9AL32UTF8AL32UTF8The value stored in the server is transmitted to the client as it is without conversion.
That is, the client must directly handle UTF8-encoded data.

The list of Client NLS_LANG values mentioned above <set according to each language character> is as follows.

Operating System LocaleNLS_LANG Value
Arabic (UAE)ARABIC_UNITED ARAB EMIRATES.AR8MSWIN1256
BulgarianBULGARIAN_BULKARIA.CL8MSWIN1251
CatalanCATALAN_CATALONIA.WE8MSWIN1252
Chinese (PRC)SIMPLIFIED CHINESE_CHINA.ZHS16GBK
Chinese (Taiwan)TRADITIONAL CHINESE_TAIWAN.ZHT16MSWIN950
Chinese (Hong Kong HKCS)TRADITIONAL CHINESE_HONG KONG.ZHT16HKSCS
Chinese (Hong Kong HKCS2001)TRADITIONAL CHINESE_HONG KONG.ZHT16HKSCS2001 (new in 10gR1)
CroatianCROATIAN_CROATIA.EE8MSWIN1250
CzechCZECH_CZECH REPUBLIC.EE8MSWIN1250
DanishDANISH_DENMARK.WE8MSWIN1252
Dutch (Netherlands)DUTCH_THE NETHERLANDS.WE8MSWIN1252
Dutch (Belgium)DUTCH_BELGIUM.WE8MSWIN1252
English (United Kingdom)ENGLISH_UNITED KINGDOM.WE8MSWIN1252
English (United States)AMERICAN_AMERICA.WE8MSWIN1252
EstonianESTONIAN_ESTONIA.BLT8MSWIN1257
FinnishFINNISH_FINLAND.WE8MSWIN1252
French (Canada)CANADIAN FRENCH_CANADA.WE8MSWIN1252
French (France)FRENCH_FRANCE.WE8MSWIN1252
German (Germany)GERMAN_GERMANY.WE8MSWIN1252
GreekGREEK_GREECE.EL8MSWIN1253
HebrewHEBREW_ISRAEL.IW8MSWIN1255
HungarianHUNGARIAN_HUNGARY.EE8MSWIN1250
IcelandicICELANDIC_ICELAND.WE8MSWIN1252
IndonesianINDONESIAN_INDONESIA.WE8MSWIN1252
Italian (Italy)ITALIAN_ITALY.WE8MSWIN1252
JapaneseJAPANESE_JAPAN.JA16SJIS
KoreanKOREAN_KOREA.KO16MSWIN949
LatvianLATVIAN_LATVIA.BLT8MSWIN1257
LithuanianLITHUANIAN_LITHUANIA.BLT8MSWIN1257
NorwegianNORWEGIAN_NORWAY.WE8MSWIN1252
PolishPOLISH_POLAND.EE8MSWIN1250
Portuguese (Brazil)BRAZILIAN PORTUGUESE_BRAZIL.WE8MSWIN1252
Portuguese (Portugal)PORTUGUESE_PORTUGAL.WE8MSWIN1252
RomanianROMANIAN_ROMANIA.EE8MSWIN1250
RussianRUSSIAN_CIS.CL8MSWIN1251
SlovakSLOVAK_SLOVAKIA.EE8MSWIN1250
Spanish (Spain)SPANISH_SPAIN.WE8MSWIN1252
SwedishSWEDISH_SWEDEN.WE8MSWIN1252
ThaiTHAI_THAILAND.TH8TISASCII
Spanish (Mexico)MEXICAN SPANISH_MEXICO.WE8MSWIN1252
Spanish (Venezuela)LATIN AMERICAN SPANISH_VENEZUELA.WE8MSWIN1252
TurkishTURKISH_TURKEY.TR8MSWIN1254
UkrainianUKRAINIAN_UKRAINE.CL8MSWIN1251
VietnameseVIETNAMESE_VIETNAM.VN8MSWIN1258

*source: NLS_LANG FAQ (oracle.com) of the document detail

A question may arise here.

The character set of the server is designated as AL32UTF8, a Unicode system, but why should the client NLS_LANG specify KO16MSWIN949, a 2 byte system?

This is because the Windows operating system is the basic method of encoding/decoding Hangul. (For reference, it is common to use KO16KSC5601 when inputting and outputting Korean characters on UNiX.) To mention it again, Server Character Set is a setting for “storing” character string data, and Client NLS_LANG “shows” character string data. , It is a setting for “Transmission”.

The server character set specifies AL32UTF8, a Unicode system, to “store” characters from various countries, and the basic encoding/decoding system supported by Windows for each language is used in the client (mainly Windows) environment of various countries that connects to this server. to specify Data entered in Non-Unicode is converted to Unicode and saved while being transmitted to the server through the Oracle client and SQL*Net.


When the server character set is AL32UTF8, if the client NLS_LANG is set to AL32UTF8, the Unicode value stored in the server is transmitted to the client without any conversion process. In other words, if the client NLS_LANG is to be set to AL32UTF8, the client must be able to encode/decode Unicode.

For reference, among the tools provided free of charge, ORACLE SQL Developer is a representative tool that supports Unicode well. DBeaver also supports Unicode well based on jdbc.


If you think that character set selection is difficult because the contents so far are complicated, you only need to remember the following contents.

  • Server Character Set set to AL32UTF8
  • Client NLS_LANG
    • If the client cannot handle Unicode or is limited to a specific language, set the value corresponding to the language
    • AL32UTF8 if the client can handle Unicode

So far, we have looked at the client environment configuration related to Oracle Character Set.

In the next article, we will look at how to convert the character set in an invalid environment where the server character set is US7ASCII and Hangul is stored.

Leave a Reply

Your email address will not be published. Required fields are marked *

en_USEnglish