Oracle Character Set Conversion (9): 6. How to Convert User Implemented Character Sets (2)
Previous article(Oracle Character Set Conversion(8): 6. How to Convert User Implemented Character Set (1)), the user-implemented Character Set conversion method is examined. You can check how to convert from KO16MSWIN949, US7ASCII to AL32UTF8 using DB Link, CTAS, and UTL_RAW packages.
6.2.3. Method 3) DB Link & CTAS & UTL_RAW
This is a method of converting character set while importing data through DB Link.
Configure three DB instances on one server and run three tests.
- Get ORAUS7 data from ORAMSWIN949
- Get ORAMSWIN949 data from ORAUTF
- Get ORAUS7 data from ORAUTF
Hangul data stored in ORAUS7 can be checked as follows.
C:\Users\ymlee>SET NLS_LANG=AMERICAN_AMERICA.US7ASCII C:\Users\ymlee>sqlplus leg/leg@oraus7 SQL*Plus: Release 11.2.0.1.0 Production on Sun Jul 17 22:34:26 2022 Copyright (c) 1982, 2010, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production With the Partitioning, OLAP, Data Mining and Real Application Testing options SQL> set pagesize 100 SQL> set linesize 200 SQL> select use_mon, line_num, sub_sta_id, sub_sta_nm, ride_pasgr_num, alight_pasgr_num, work_dt 2 from sub_mon_stat 3 where rownum <= 10 4 ; USE_MON LINE_NUM SUB_ SUB_STA_NM RIDE_PASGR_NUM ALIGHT_PASGR_NUM WORK_DT -------- -------------- ---- -------------------- -------------- ---------------- -------- 201301 2호선 0214 강변 1565310 1549316 20130723 201301 2호선 0215 잠실나루 536413 505376 20130723 201301 2호선 0216 잠실 2323516 2106978 20130723 201301 2호선 0217 신천 814104 786397 20130723 201301 2호선 0218 종합운동장 374633 389860 20130723 201301 2호선 0219 삼성 2015421 2050988 20130723 201301 2호선 0220 선릉 1773445 1572999 20130723 201301 2호선 0221 역삼 1496812 1644097 20130723 201301 2호선 0222 강남 3453154 3558986 20130723 201301 2호선 0223 교대 1195621 1323778 20130723 10 rows selected. SQL>
The tests below use Oracle SQL Developer.
SQL Developer is a tool provided by Oracle and is free. You can download it from the URL below.
Oracle SQL Developer Downloads
Test #1) Get ORAUS7 data from ORAMSWIN949
You can import data by creating a DL_US7 DB Link and specifying the DB Link “dl_us7”. The script is as follows.
CREATE DATABASE LINK DL_US7 CONNECT TO leg IDENTIFIED BY leg USING '(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = ORAUS7) ) )' ; select use_mon, line_num, sub_sta_id, sub_sta_nm, ride_pasgr_num, alight_pasgr_num, work_dt, commt from sub_mon_stat@dl_us7 ;
Korean characters in the execution result are not displayed normally, but appear as ? (question mark) characters.
When data is imported through DB Link, the character set of the server (ORAUS7 in this case) is converted to the character set of the client (ORAMSWIN949 in this case).
Hangul stored in US7ASCII is stored incorrectly, so it cannot be converted to MSWIN949, so it is displayed as a question mark.
To display ORAUS7 Korean data normally in this situation, apply the following method.
- Create a view in ORAUS7 using the UTL_RAW.CAST_TO_RAW function to prevent VARCHAR2 data type columns containing Korean characters from being converted to character sets.
- Apply the UTL_RAW.CAST_TO_RAW function to columns containing Korean characters.
- RAW data type does not convert character set.
- ORAMSWIN949 queries the View created above, converts the RAW data type back to VARCHAR2, and converts it to normal Hangul by specifying the source character set and destination character set with the CONVERT function.
Create a view that converts VARCHAR2 data type to RAW data type in the ORAUS7 instance with the script below.
-- ORAUS7에 View 생성 create or replace view vw_sub_mon_stat as select use_mon --,line_num ,utl_raw.cast_to_raw(line_num) as line_num_raw ,sub_sta_id --,sub_sta_nm ,utl_raw.cast_to_raw(sub_sta_nm) as sub_sta_nm_raw ,ride_pasgr_num, alight_pasgr_num, work_dt, commt from sub_mon_stat ;
-- ORAMSWIN949: utl_raw.cast_to_varchar2, convert 함수 적용하여 한글 데이터 정상 표시 select use_mon ,convert(utl_raw.cast_to_varchar2(line_num_raw), 'KO16MSWIN949', 'KO16MSWIN949') line_num ,sub_sta_id ,convert(utl_raw.cast_to_varchar2(sub_sta_nm_raw), 'KO16MSWIN949', 'KO16MSWIN949') sub_sta_nm ,ride_pasgr_num, alight_pasgr_num, work_dt, commt from vw_sub_mon_stat@dl_us7 a ;
The execution result is normally displayed in Korean as follows.
Test #2) Get ORAMSWIN949 data from ORAUTF
Characterset is normally converted from MSWIN949 to UTF8 on DB Link, and Korean is imported well.
-- ORAUTF에서 ORAMSWIN949 데이터 가져오기 select use_mon, line_num, sub_sta_id, sub_sta_nm, ride_pasgr_num, alight_pasgr_num, work_dt, commt from sub_mon_stat@dl_mswin949 ;
The execution result is as follows.
Test #3) Get ORAUS7 data from ORAUTF
As in “Test #1) Importing ORAUS7 data from ORAMSWIN949”, Korean characters are displayed as ? (question mark).
select use_mon, line_num, sub_sta_id, sub_sta_nm, ride_pasgr_num, alight_pasgr_num, work_dt, commt from sub_mon_stat@dl_us7 ;
The execution result is as follows.
In order to display Korean normally, the view created in the “Test #1) Importing ORAUS7 data from ORAMSWIN949” step is used as it is, and only the CONVERT function parameter of the SQL executed in ORAUTF through DB Link needs to be slightly changed.
-- ORAUTF: utl_raw.cast_to_varchar2, convert 함수 적용하여 한글 데이터 정상 표시 select use_mon ,convert(utl_raw.cast_to_varchar2(line_num_raw), 'AL32UTF8', 'KO16MSWIN949') line_num ,sub_sta_id ,convert(utl_raw.cast_to_varchar2(sub_sta_nm_raw), 'AL32UTF8', 'KO16MSWIN949') sub_sta_nm ,ride_pasgr_num, alight_pasgr_num, work_dt, commt from vw_sub_mon_stat@dl_us7 a ;
AL32UTF8 is specified because the second parameter of the CONVERT function means the destination character set.
The execution result is as follows.
Reference
Explore RAW data types
Some excerpts from the oracle documentation about RAW data types are given below.
source: https://docs.oracle.com/cd/E11882_01/server.112/e41084/sql_elements001.htm#SQLRF50993
RAW and LONG RAW Data Types
The RAW and LONG RAW data types store data that is not to be explicitly converted by Oracle Database when moving data between different systems. These data types are intended for binary data or byte strings. For example, you can use LONG RAW to store graphics, sound, documents, or arrays of binary data, for which the interpretation is dependent on the use.
…
RAW is a variable-length data type like VARCHAR2, except that Oracle Net (which connects client software to a database or one database to another) and the Oracle import and export utilities do not perform character conversion when transmitting RAW or LONG RAW data In contrast, Oracle Net and the Oracle import and export utilities automatically convert CHAR, VARCHAR2, and LONG data between different database character sets, if data is transported between databases, or between the database character set and the client character set, if data is transported between a database and a client. The client character set is determined by the type of the client interface, such as OCI or JDBC, and the client configuration (for example, the NLS_LANG environment variable).
The meaning of the sentence with inverted color among the above contents is as follows.
- These data types are intended for binary data or byte strings
–> RAW data type is used to store binary data. - do not perform character conversion when transmitting RAW or LONG RAW data
–> When transmitting RAW data type through DB Link (Oracle Net), character set is not converted.
Character set conversion function CONVERT
Some excerpts from the oracle documentation about the CONVERT function are given below.
source: https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions034.htm#SQLRF00620
CONVERT converts a character string from one character set to another.
– The char argument is the value to be converted. It can be any of the data types CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB.
– The dest_char_set argument is the name of the character set to which char is converted.
– The source_char_set argument is the name of the character set in which char is stored in the database. The default value is the database character set.
The return value for CHAR and VARCHAR2 is VARCHAR2. For NCHAR and NVARCHAR2, it is NVARCHAR2. For CLOB, it is CLOB, and for NCLOB, it is NCLOB.
Precautions
RAW data type can be used up to 2000 bytes. It can be extended after Oracle 12c R2, but generally up to 2000 bytes. In US7ASCII, Hangul is stored in 2 bytes, so this method can be applied only up to 1000 Hangul characters.
(If English letters and numbers are included, the number of convertible characters may increase.)
To convert data exceeding 2000 bytes, it seems possible to divide the view into two or more columns and combine them after converting in the instance (eg ORAUTF) that reads the view, but this has not been tested.
Pay attention to the maximum number of bytes that can be converted.