Oracle Character Set Conversion (10): 6.3. How to convert CLOB type to Korean
Let's look at how to convert CLOB type into Korean. You can check how to convert CLOB type containing Korean into AL32UTF8 in US7ASCII character set DB and the result of conversion.
This is a continuation of the previous article.
Oracle Character Set Conversion (9): 6. How to Convert User Implemented Character Sets (2)
6.3. How to convert CLOB type to Korean
Oracle Character Set Conversion(8): 6. How to Convert User Implemented Character Set (1)
In the above article, I mentioned the following:
For reference, in the case of US7ASCII to AL32UTF8 conversion, CLOB type columns are not converted normally, so separate processing is required.
It will be explained in the next article.
This article describes how to convert a CLOB type containing Korean into AL32UTF8 using DB Link from US7ASCII to AL32UTF8. Oracle default package DBMS_LOB is used.
First, let's check the CLOB data in the three instances ORAUS7, ORAMSWIN949, and ORAUTF.
6.3.1. Check CLOB data
Execute the following INSERT statement in the three instances ORAUS7, ORAMSWIN949, and ORAUTF to fill the CLOB column (REF_DES) with data.
INSERT INTO SUB_MON_STAT (USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT, COMMT, REF_DES, REF_IMG) VALUES ('201301', '1호선', '0150', '서울역', 2199181, 1855268, '20130723', RPAD('서울역', 3000, '서울역'), RPAD('서울역', 3000, '서울역'), UTL_RAW.CAST_TO_RAW('12345')) ;
The data type of each column is as follows.
Now, let's check the ref_des column (CLOB type) data in each instance with the following SQL.
select dbms_lob.getlength(ref_des) chr_len ,lengthb(dbms_lob.substr(ref_des, dbms_lob.getlength(ref_des), 1)) byte_len ,dump(dbms_lob.substr(ref_des, dbms_lob.getlength(ref_des), 1), 1016) dp ,ref_des from sub_mon_stat where ref_des is not null and rownum <= 1 ;
▼ ORAUS7
The character set of CLOB data is US7ASCII, and Korean is not displayed normally.
The number of characters (chr_len) is 3,000, and the number of bytes (byte_len) is also 3,000. That is, it can be seen that one character is stored as one byte.
I couldn't find a way to set NLS_LANG in SQL Developer, so I checked that Hangul was displayed by running it in sqlplus.
If NLS_LANG is set to US7ASCII as shown below, Korean is displayed.
C:\Users\ymlee>set NLS_LANG=AMERICAN_AMERICA.US7ASCII C:\Users\ymlee>sqlplus leg/leg@oraus7 SQL*Plus: Release 11.2.0.1.0 Production on Sun Jul 24 21:42:43 2022 Copyright (c) 1982, 2010, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production With the Partitioning, OLAP, Data Mining and Real Application Testing options SQL> set pagesize 100 SQL> set linesize 200 SQL> select dbms_lob.getlength(ref_des) chr_len 2 ,lengthb(dbms_lob.substr(ref_des, dbms_lob.getlength(ref_des), 1)) byte_len 3 ,substr(dump(dbms_lob.substr(ref_des, dbms_lob.getlength(ref_des), 1), 1016), 1, 200) dp 4 ,ref_des 5 from sub_mon_stat 6 where ref_des is not null 7 and rownum <= 1 8 SQL> / CHR_LEN BYTE_LEN ---------- ---------- DP -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- REF_DES -------------------------------------------------------------------------------- 3000 3000 Typ=1 Len=3000 CharacterSet=US7ASCII: bc,ad,bf,ef,bf,aa,bc,ad,bf,ef,bf,aa,bc,ad,bf,ef,bf,aa,bc,ad,bf,ef,bf,aa,bc,ad,bf,ef,bf,aa,bc,ad,bf,ef,bf,aa,bc,ad,bf,ef,bf,aa,bc,ad,bf,ef,bf,aa,bc,ad,bf,ef,bf,aa, 서울역서울역서울역서울역서울역서울역서울역서울역서울역서울역서울역서울역서울역서 SQL>
▼ ORAMSWIN949
Character set of CLOB data is KO16MSWIN949, and Korean is displayed normally.
The number of characters (chr_len) is 1,500, and the number of bytes (byte_len) is 3,000. That is, it can be seen that one character is stored as 2 bytes.
▼ ORAUTF
Character set of CLOB data is AL32UTF8, and Korean is displayed normally.
The number of characters (chr_len) is 1,334, and the number of bytes (byte_len) is 4,000. That is, it can be seen that one character is stored as 3 bytes.
(The exact number of bytes is 4,002, but there is a slight difference because the maximum number of bytes returned by dbms_lob.substr is 4,000)
6.3.2. CLOB data conversion
CLOB data cannot execute select through DB Link. I am getting "ORA-22992: cannot use selected LOB location from remote table" error.
select use_mon, line_num, sub_sta_id, sub_sta_nm, ride_pasgr_num, alight_pasgr_num, work_dt, commt ,ref_des from sub_mon_stat@dl_us7 a ; ORA-22992: 원격 테이블로 부터 선택된 LOB 위치를 사용할 수 없습니다 22992. 00000 - "cannot use LOB locators selected from remote tables" *Cause: A remote LOB column cannot be referenced. *Action: Remove references to LOBs in remote tables.
You can copy data using the Create Table As Select (CTAS) syntax as follows:
create table sub_mon_stat_from_oraus7 as select use_mon, line_num, sub_sta_id, sub_sta_nm, ride_pasgr_num, alight_pasgr_num, work_dt ,ref_des from sub_mon_stat@dl_us7 a ; select dbms_lob.getlength(ref_des) chr_len ,lengthb(dbms_lob.substr(ref_des, 1333, 1)) byte_len ,dump(dbms_lob.substr(ref_des, 1333, 1), 1016) dp ,ref_des from sub_mon_stat_from_oraus7 where ref_des is not null and rownum <= 1 ;
As you can see from the above query execution result, the character set of the CLOB column was converted to AL32UTF8, but the Korean character string was not converted normally.
The process of converting Hangul of a CLOB column is as follows.
- Create a view that converts the data type of the CLOB column to BLOB in the source DB (Instance #1, ORAUS7)
- Create a table in the target DB (Instance #3, ORAUTF) with CTAS (Create Table As Select) statement
- Create table by converting BLOB data type of table created above to CLOB type
Let's take a look one by one.
1. Create a view that converts the data type of CLOB column to BLOB in the source DB (Instance #1, ORAUS7)
If the data type of the CLOB column is converted to BLOB, the character set is not converted when querying with DB Link.
Create a function that converts CLOB to BLOB using the DBMS_LOB.CONVERTTOBLOB procedure as follows.
CREATE OR REPLACE FUNCTION FN_CLOB_TO_BLOB (P_CLOB CLOB) RETURN BLOB AS L_BLOB BLOB; L_DEST_OFFSET INTEGER := 1; L_SOURCE_OFFSET INTEGER := 1; L_LANG_CONTEXT INTEGER := DBMS_LOB.DEFAULT_LANG_CTX; L_WARNING INTEGER := DBMS_LOB.WARN_INCONVERTIBLE_CHAR; BEGIN IF DBMS_LOB.GETLENGTH(P_CLOB) != 0 THEN DBMS_LOB.CREATETEMPORARY(L_BLOB, TRUE); DBMS_LOB.CONVERTTOBLOB ( DEST_LOB =>L_BLOB, SRC_CLOB =>P_CLOB, AMOUNT =>DBMS_LOB.LOBMAXSIZE, DEST_OFFSET =>L_DEST_OFFSET, SRC_OFFSET =>L_SOURCE_OFFSET, BLOB_CSID =>DBMS_LOB.DEFAULT_CSID, LANG_CONTEXT=>L_LANG_CONTEXT, WARNING =>L_WARNING ); END IF; RETURN L_BLOB; END;
Then, create a View using the above function.
For testing purposes, only rows with values in some columns and CLOB column REF_DES were selected.
CREATE OR REPLACE VIEW VW_SUB_MON_STAT_BLOB AS SELECT USE_MON, SUB_STA_ID ,FN_CLOB_TO_BLOB(REF_DES) REF_DES_BLOB FROM SUB_MON_STAT WHERE REF_DES IS NOT NULL;
2. Create a table in the target DB (Instance #3, ORAUTF) using CTAS (Create Table As Select) syntax
Create a table by importing data from DB Link with the following syntax.
CREATE TABLE SUB_MON_STAT_BLOB AS SELECT A.* FROM VW_SUB_MON_STAT_BLOB@DL_US7 A;
3. Create table by converting BLOB data type of table created above to CLOB type
Create a function to convert BLOB to CLOB using DBMS_LOB.CONVERTTOCLOB procedure as follows.
CREATE OR REPLACE FUNCTION FN_BLOB_TO_CLOB (P_BLOB BLOB) RETURN CLOB AS L_CLOB CLOB; L_DEST_OFFSET INTEGER := 1; L_SOURCE_OFFSET INTEGER := 1; L_LANG_CONTEXT INTEGER := DBMS_LOB.DEFAULT_LANG_CTX; L_WARNING INTEGER := DBMS_LOB.WARN_INCONVERTIBLE_CHAR; BEGIN DBMS_LOB.CREATETEMPORARY(L_CLOB, TRUE); DBMS_LOB.CONVERTTOCLOB ( DEST_LOB =>L_CLOB, SRC_BLOB =>P_BLOB, AMOUNT =>DBMS_LOB.LOBMAXSIZE, DEST_OFFSET =>L_DEST_OFFSET, SRC_OFFSET =>L_SOURCE_OFFSET, --BLOB_CSID =>DBMS_LOB.DEFAULT_CSID, BLOB_CSID =>846, -- NLS_CHARSET_ID('KO16MSWIN949') LANG_CONTEXT=>L_LANG_CONTEXT, WARNING =>L_WARNING ); RETURN L_CLOB; END;
Among the parameters of the DBMS_LOB.CONVERTTOCLOB procedure, BLOB_CSID means “The character set ID of the source data”.
Please refer to the document below for more details.
DBMS_LOB – CONVERTTOCLOB Procedure (oracle.com)
BLOB_CSID must be specified as an integer type, not a string, and the value can be known with the NLS_CHARSET_ID function.
Since the character set of the original data is KO16MSWIN949, 846 was confirmed by executing the following.
SELECT NLS_CHARSET_ID('KO16MSWIN949') FROM DUAL;
And, you can create or query a table using the above function.
CREATE TABLE SUB_MON_STAT_BLOB_TO_CLOB AS SELECT USE_MON, SUB_STA_ID ,FN_BLOB_TO_CLOB(REF_DES_BLOB) REF_DES FROM SUB_MON_STAT_BLOB A; select * from sub_mon_stat_blob_to_clob ; -- Table 생성하지 않을 경우는 다음과 같이 query 가능 select use_mon, sub_sta_id ,fn_blob_to_clob(ref_des_blob) ref_des from sub_mon_stat_blob ;
You can check that Hangul is converted normally as follows.
So far, we have looked at how to convert CLOB type containing Korean into AL32UTF8 in US7ASCII character set DB.
It looks a little complicated because it goes through several processes, but it is not difficult if you look at it step by step.
This article concludes the series on Oracle character set conversion.
I hope to be of some help to those in need.