Oracle Character Set Conversion (9): 6. How to Convert User Implemented Character Sets (2)

Previous article(Oracle Character Set Conversion(8): 6. How to Convert User Implemented Character Set (1)), the user-implemented Character Set conversion method is examined. You can check how to convert from KO16MSWIN949, US7ASCII to AL32UTF8 using DB Link, CTAS, and UTL_RAW packages.

This is a method of converting character set while importing data through DB Link.

Configure three DB instances on one server and run three tests.

DB 서버, DB Link 구성, 테스트 케이스
DB server, DB Link configuration, test case
  1. Get ORAUS7 data from ORAMSWIN949
  2. Get ORAMSWIN949 data from ORAUTF
  3. Get ORAUS7 data from ORAUTF

Hangul data stored in ORAUS7 can be checked as follows.

ORAUS7에 저장된 한글 데이터 확인
Check Hangul data stored in ORAUS7
C:\Users\ymlee>SET NLS_LANG=AMERICAN_AMERICA.US7ASCII

C:\Users\ymlee>sqlplus leg/leg@oraus7

SQL*Plus: Release 11.2.0.1.0 Production on Sun Jul 17 22:34:26 2022

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> set pagesize 100
SQL> set linesize 200
SQL> select use_mon, line_num, sub_sta_id, sub_sta_nm, ride_pasgr_num, alight_pasgr_num, work_dt
  2    from sub_mon_stat
  3   where rownum <= 10
  4  ;

USE_MON  LINE_NUM       SUB_ SUB_STA_NM           RIDE_PASGR_NUM ALIGHT_PASGR_NUM WORK_DT
-------- -------------- ---- -------------------- -------------- ---------------- --------
201301   2호선          0214 강변                        1565310          1549316 20130723
201301   2호선          0215 잠실나루                     536413           505376 20130723
201301   2호선          0216 잠실                        2323516          2106978 20130723
201301   2호선          0217 신천                         814104           786397 20130723
201301   2호선          0218 종합운동장                   374633           389860 20130723
201301   2호선          0219 삼성                        2015421          2050988 20130723
201301   2호선          0220 선릉                        1773445          1572999 20130723
201301   2호선          0221 역삼                        1496812          1644097 20130723
201301   2호선          0222 강남                        3453154          3558986 20130723
201301   2호선          0223 교대                        1195621          1323778 20130723

10 rows selected.

SQL>

The tests below use Oracle SQL Developer.

SQL Developer is a tool provided by Oracle and is free. You can download it from the URL below.

Oracle SQL Developer Downloads

Test #1) Get ORAUS7 data from ORAMSWIN949

You can import data by creating a DL_US7 DB Link and specifying the DB Link “dl_us7”. The script is as follows.

CREATE DATABASE LINK DL_US7
CONNECT TO leg IDENTIFIED BY leg
USING '(DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ORAUS7)
    )
  )'
;

select use_mon, line_num, sub_sta_id, sub_sta_nm, ride_pasgr_num, alight_pasgr_num, work_dt, commt
  from sub_mon_stat@dl_us7
;

Korean characters in the execution result are not displayed normally, but appear as ? (question mark) characters.

테스트 #1) ORAMSWIN949 에서 ORAUS7 데이터 가져오기 실행 결과
Test #1-1) Result of importing ORAUS7 data from ORAMSWIN949

When data is imported through DB Link, the character set of the server (ORAUS7 in this case) is converted to the character set of the client (ORAMSWIN949 in this case).

Hangul stored in US7ASCII is stored incorrectly, so it cannot be converted to MSWIN949, so it is displayed as a question mark.

To display ORAUS7 Korean data normally in this situation, apply the following method.

  • Create a view in ORAUS7 using the UTL_RAW.CAST_TO_RAW function to prevent VARCHAR2 data type columns containing Korean characters from being converted to character sets.
    • Apply the UTL_RAW.CAST_TO_RAW function to columns containing Korean characters.
    • RAW data type does not convert character set.
  • ORAMSWIN949 queries the View created above, converts the RAW data type back to VARCHAR2, and converts it to normal Hangul by specifying the source character set and destination character set with the CONVERT function.

Create a view that converts VARCHAR2 data type to RAW data type in the ORAUS7 instance with the script below.

-- ORAUS7에 View 생성
create or replace view vw_sub_mon_stat as
select use_mon
      --,line_num
      ,utl_raw.cast_to_raw(line_num) as line_num_raw
      ,sub_sta_id
      --,sub_sta_nm
      ,utl_raw.cast_to_raw(sub_sta_nm) as sub_sta_nm_raw
      ,ride_pasgr_num, alight_pasgr_num, work_dt, commt
  from sub_mon_stat
;
-- ORAMSWIN949: utl_raw.cast_to_varchar2, convert 함수 적용하여 한글 데이터 정상 표시
select use_mon
      ,convert(utl_raw.cast_to_varchar2(line_num_raw), 'KO16MSWIN949', 'KO16MSWIN949') line_num
      ,sub_sta_id
      ,convert(utl_raw.cast_to_varchar2(sub_sta_nm_raw), 'KO16MSWIN949', 'KO16MSWIN949') sub_sta_nm
      ,ride_pasgr_num, alight_pasgr_num, work_dt, commt
  from vw_sub_mon_stat@dl_us7 a
;

The execution result is normally displayed in Korean as follows.

Test #1-2) Result of importing ORAUS7 data through View in ORAMSWIN949

Test #2) Get ORAMSWIN949 data from ORAUTF

Characterset is normally converted from MSWIN949 to UTF8 on DB Link, and Korean is imported well.

-- ORAUTF에서 ORAMSWIN949 데이터 가져오기
select use_mon, line_num, sub_sta_id, sub_sta_nm, ride_pasgr_num, alight_pasgr_num, work_dt, commt
  from sub_mon_stat@dl_mswin949
;

The execution result is as follows.

테스트 #2) ORAUTF 에서 ORAMSWIN949 데이터 가져오기 실행 결과
Result of test #2) Get ORAMSWIN949 data from ORAUTF

Test #3) Get ORAUS7 data from ORAUTF

As in “Test #1) Importing ORAUS7 data from ORAMSWIN949”, Korean characters are displayed as ? (question mark).

select use_mon, line_num, sub_sta_id, sub_sta_nm, ride_pasgr_num, alight_pasgr_num, work_dt, commt
  from sub_mon_stat@dl_us7
;

The execution result is as follows.

Test #3-1) Result of importing ORAUS7 data from ORAUTF

In order to display Korean normally, the view created in the “Test #1) Importing ORAUS7 data from ORAMSWIN949” step is used as it is, and only the CONVERT function parameter of the SQL executed in ORAUTF through DB Link needs to be slightly changed.

-- ORAUTF: utl_raw.cast_to_varchar2, convert 함수 적용하여 한글 데이터 정상 표시
select use_mon
      ,convert(utl_raw.cast_to_varchar2(line_num_raw), 'AL32UTF8', 'KO16MSWIN949') line_num
      ,sub_sta_id
      ,convert(utl_raw.cast_to_varchar2(sub_sta_nm_raw), 'AL32UTF8', 'KO16MSWIN949') sub_sta_nm
      ,ride_pasgr_num, alight_pasgr_num, work_dt, commt
  from vw_sub_mon_stat@dl_us7 a
;

AL32UTF8 is specified because the second parameter of the CONVERT function means the destination character set.

The execution result is as follows.

테스트 #3-2) ORAUTF 에서 View를 통해 ORAUS7 데이터 가져오기 실행 결과
Test #3-2) Result of importing ORAUS7 data through View in ORAUTF

Reference

Explore RAW data types

Some excerpts from the oracle documentation about RAW data types are given below.

source: https://docs.oracle.com/cd/E11882_01/server.112/e41084/sql_elements001.htm#SQLRF50993

RAW and LONG RAW Data Types

The RAW and LONG RAW data types store data that is not to be explicitly converted by Oracle Database when moving data between different systems. These data types are intended for binary data or byte strings. For example, you can use LONG RAW to store graphics, sound, documents, or arrays of binary data, for which the interpretation is dependent on the use.

RAW is a variable-length data type like VARCHAR2, except that Oracle Net (which connects client software to a database or one database to another) and the Oracle import and export utilities do not perform character conversion when transmitting RAW or LONG RAW data In contrast, Oracle Net and the Oracle import and export utilities automatically convert CHAR, VARCHAR2, and LONG data between different database character sets, if data is transported between databases, or between the database character set and the client character set, if data is transported between a database and a client. The client character set is determined by the type of the client interface, such as OCI or JDBC, and the client configuration (for example, the NLS_LANG environment variable).

The meaning of the sentence with inverted color among the above contents is as follows.

  • These data types are intended for binary data or byte strings
    –> RAW data type is used to store binary data.
  • do not perform character conversion when transmitting RAW or LONG RAW data
    –> When transmitting RAW data type through DB Link (Oracle Net), character set is not converted.

Character set conversion function CONVERT

Some excerpts from the oracle documentation about the CONVERT function are given below.

source: https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions034.htm#SQLRF00620

Character set 변환 함수 CONVERT Syntax
Character set conversion function CONVERT Syntax

CONVERT converts a character string from one character set to another.

  – The char argument is the value to be converted. It can be any of the data types CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB.

  – The dest_char_set argument is the name of the character set to which char is converted.

  – The source_char_set argument is the name of the character set in which char is stored in the database. The default value is the database character set.

The return value for CHAR and VARCHAR2 is VARCHAR2. For NCHAR and NVARCHAR2, it is NVARCHAR2. For CLOB, it is CLOB, and for NCLOB, it is NCLOB.

Precautions

RAW data type can be used up to 2000 bytes. It can be extended after Oracle 12c R2, but generally up to 2000 bytes. In US7ASCII, Hangul is stored in 2 bytes, so this method can be applied only up to 1000 Hangul characters.

(If English letters and numbers are included, the number of convertible characters may increase.)

To convert data exceeding 2000 bytes, it seems possible to divide the view into two or more columns and combine them after converting in the instance (eg ORAUTF) that reads the view, but this has not been tested.

Pay attention to the maximum number of bytes that can be converted.

Leave a Reply

Your email address will not be published. Required fields are marked *

en_USEnglish