Oracle Character Set Conversion (10): 6.3. How to convert CLOB type to Korean

Let's look at how to convert CLOB type into Korean. You can check how to convert CLOB type containing Korean into AL32UTF8 in US7ASCII character set DB and the result of conversion.

This is a continuation of the previous article.

Oracle Character Set Conversion (9): 6. How to Convert User Implemented Character Sets (2)

6.3. How to convert CLOB type to Korean

Oracle Character Set Conversion(8): 6. How to Convert User Implemented Character Set (1)

In the above article, I mentioned the following:

For reference, in the case of US7ASCII to AL32UTF8 conversion, CLOB type columns are not converted normally, so separate processing is required.
It will be explained in the next article.

This article describes how to convert a CLOB type containing Korean into AL32UTF8 using DB Link from US7ASCII to AL32UTF8. Oracle default package DBMS_LOB is used.

First, let's check the CLOB data in the three instances ORAUS7, ORAMSWIN949, and ORAUTF.

6.3.1. Check CLOB data

Execute the following INSERT statement in the three instances ORAUS7, ORAMSWIN949, and ORAUTF to fill the CLOB column (REF_DES) with data.

INSERT INTO SUB_MON_STAT
(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM,
 ALIGHT_PASGR_NUM, WORK_DT, COMMT,
 REF_DES, REF_IMG)
VALUES
('201301', '1호선', '0150', '서울역', 2199181,
 1855268, '20130723', RPAD('서울역', 3000, '서울역'),
 RPAD('서울역', 3000, '서울역'), UTL_RAW.CAST_TO_RAW('12345'))
;

The data type of each column is as follows.

Oracle Character Set 변환 테스트 테이블 구성
Oracle Character Set Conversion Test Table Configuration

Now, let's check the ref_des column (CLOB type) data in each instance with the following SQL.

select dbms_lob.getlength(ref_des) chr_len
      ,lengthb(dbms_lob.substr(ref_des, dbms_lob.getlength(ref_des), 1)) byte_len
      ,dump(dbms_lob.substr(ref_des, dbms_lob.getlength(ref_des), 1), 1016) dp
      ,ref_des
  from sub_mon_stat
 where ref_des is not null
   and rownum <= 1
;

▼ ORAUS7

The character set of CLOB data is US7ASCII, and Korean is not displayed normally.

ORAUS7 CLOB 데이터 확인
ORAUS7 CLOB data check

The number of characters (chr_len) is 3,000, and the number of bytes (byte_len) is also 3,000. That is, it can be seen that one character is stored as one byte.

I couldn't find a way to set NLS_LANG in SQL Developer, so I checked that Hangul was displayed by running it in sqlplus.

If NLS_LANG is set to US7ASCII as shown below, Korean is displayed.

ORAUS7 CLOB 한글 데이터 확인 (sqlplus, NLS_LANG=US7ASCII)
ORAUS7 CLOB Korean data check (sqlplus, NLS_LANG=US7ASCII)
C:\Users\ymlee>set NLS_LANG=AMERICAN_AMERICA.US7ASCII

C:\Users\ymlee>sqlplus leg/leg@oraus7

SQL*Plus: Release 11.2.0.1.0 Production on Sun Jul 24 21:42:43 2022

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> set pagesize 100
SQL> set linesize 200
SQL> select dbms_lob.getlength(ref_des) chr_len
  2        ,lengthb(dbms_lob.substr(ref_des, dbms_lob.getlength(ref_des), 1)) byte_len
  3        ,substr(dump(dbms_lob.substr(ref_des, dbms_lob.getlength(ref_des), 1), 1016), 1, 200) dp
  4        ,ref_des
  5    from sub_mon_stat
  6   where ref_des is not null
  7     and rownum <= 1
  8
SQL> /

   CHR_LEN   BYTE_LEN
---------- ----------
DP
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
REF_DES
--------------------------------------------------------------------------------
      3000       3000
Typ=1 Len=3000 CharacterSet=US7ASCII: bc,ad,bf,ef,bf,aa,bc,ad,bf,ef,bf,aa,bc,ad,bf,ef,bf,aa,bc,ad,bf,ef,bf,aa,bc,ad,bf,ef,bf,aa,bc,ad,bf,ef,bf,aa,bc,ad,bf,ef,bf,aa,bc,ad,bf,ef,bf,aa,bc,ad,bf,ef,bf,aa,
서울역서울역서울역서울역서울역서울역서울역서울역서울역서울역서울역서울역서울역서


SQL>

▼ ORAMSWIN949

Character set of CLOB data is KO16MSWIN949, and Korean is displayed normally.

KO16MSWIN949 CLOB 데이터 확인
KO16MSWIN949 CLOB data check

The number of characters (chr_len) is 1,500, and the number of bytes (byte_len) is 3,000. That is, it can be seen that one character is stored as 2 bytes.

▼ ORAUTF

Character set of CLOB data is AL32UTF8, and Korean is displayed normally.

ORAUTF CLOB data check

The number of characters (chr_len) is 1,334, and the number of bytes (byte_len) is 4,000. That is, it can be seen that one character is stored as 3 bytes.

(The exact number of bytes is 4,002, but there is a slight difference because the maximum number of bytes returned by dbms_lob.substr is 4,000)

6.3.2. CLOB data conversion

CLOB data cannot execute select through DB Link. I am getting "ORA-22992: cannot use selected LOB location from remote table" error.

CLOB 데이터를 DB Link로 query할 때 ORA-22992 오류
ORA-22992 error when querying CLOB data with DB Link
select use_mon, line_num, sub_sta_id, sub_sta_nm, ride_pasgr_num, alight_pasgr_num, work_dt, commt
      ,ref_des
  from sub_mon_stat@dl_us7 a
;

ORA-22992: 원격 테이블로 부터 선택된 LOB 위치를 사용할 수 없습니다
22992. 00000 -  "cannot use LOB locators selected from remote tables"
*Cause:    A remote LOB column cannot be referenced.
*Action:   Remove references to LOBs in remote tables.

You can copy data using the Create Table As Select (CTAS) syntax as follows:

create table sub_mon_stat_from_oraus7
as
select use_mon, line_num, sub_sta_id, sub_sta_nm, ride_pasgr_num, alight_pasgr_num, work_dt
      ,ref_des
  from sub_mon_stat@dl_us7 a
;

select dbms_lob.getlength(ref_des) chr_len
      ,lengthb(dbms_lob.substr(ref_des, 1333, 1)) byte_len
      ,dump(dbms_lob.substr(ref_des, 1333, 1), 1016) dp
      ,ref_des
  from sub_mon_stat_from_oraus7
 where ref_des is not null
   and rownum <= 1
;

As you can see from the above query execution result, the character set of the CLOB column was converted to AL32UTF8, but the Korean character string was not converted normally.

CTAS로 생성한 Table 데이터 확인
Check table data created with CTAS

The process of converting Hangul of a CLOB column is as follows.

CLOB 컬럼의 한글을 변환하는 과정 요약
Summary of the process of converting Hangul of CLOB column
  1. Create a view that converts the data type of the CLOB column to BLOB in the source DB (Instance #1, ORAUS7)
  2. Create a table in the target DB (Instance #3, ORAUTF) with CTAS (Create Table As Select) statement
  3. Create table by converting BLOB data type of table created above to CLOB type

Let's take a look one by one.

1. Create a view that converts the data type of CLOB column to BLOB in the source DB (Instance #1, ORAUS7)

If the data type of the CLOB column is converted to BLOB, the character set is not converted when querying with DB Link.

Create a function that converts CLOB to BLOB using the DBMS_LOB.CONVERTTOBLOB procedure as follows.

CREATE OR REPLACE FUNCTION FN_CLOB_TO_BLOB (P_CLOB CLOB) RETURN BLOB
AS
 L_BLOB          BLOB;
 L_DEST_OFFSET   INTEGER := 1;
 L_SOURCE_OFFSET INTEGER := 1;
 L_LANG_CONTEXT  INTEGER := DBMS_LOB.DEFAULT_LANG_CTX;
 L_WARNING       INTEGER := DBMS_LOB.WARN_INCONVERTIBLE_CHAR;
BEGIN
  IF DBMS_LOB.GETLENGTH(P_CLOB) != 0 THEN
      DBMS_LOB.CREATETEMPORARY(L_BLOB, TRUE);
      DBMS_LOB.CONVERTTOBLOB
      (
       DEST_LOB    =>L_BLOB,
       SRC_CLOB    =>P_CLOB,
       AMOUNT      =>DBMS_LOB.LOBMAXSIZE,
       DEST_OFFSET =>L_DEST_OFFSET,
       SRC_OFFSET  =>L_SOURCE_OFFSET,
       BLOB_CSID   =>DBMS_LOB.DEFAULT_CSID,
       LANG_CONTEXT=>L_LANG_CONTEXT,
       WARNING     =>L_WARNING
      );
  END IF;
  RETURN L_BLOB;
END;

Then, create a View using the above function.

For testing purposes, only rows with values in some columns and CLOB column REF_DES were selected.

CREATE OR REPLACE VIEW VW_SUB_MON_STAT_BLOB
AS
SELECT  USE_MON, SUB_STA_ID
       ,FN_CLOB_TO_BLOB(REF_DES) REF_DES_BLOB
  FROM  SUB_MON_STAT
 WHERE  REF_DES IS NOT NULL;

2. Create a table in the target DB (Instance #3, ORAUTF) using CTAS (Create Table As Select) syntax

Create a table by importing data from DB Link with the following syntax.

CREATE  TABLE SUB_MON_STAT_BLOB
AS
SELECT  A.*
  FROM  VW_SUB_MON_STAT_BLOB@DL_US7 A;

3. Create table by converting BLOB data type of table created above to CLOB type

Create a function to convert BLOB to CLOB using DBMS_LOB.CONVERTTOCLOB procedure as follows.

CREATE OR REPLACE FUNCTION FN_BLOB_TO_CLOB (P_BLOB BLOB) RETURN CLOB
AS
 L_CLOB          CLOB;
 L_DEST_OFFSET   INTEGER := 1;
 L_SOURCE_OFFSET INTEGER := 1;
 L_LANG_CONTEXT  INTEGER := DBMS_LOB.DEFAULT_LANG_CTX;
 L_WARNING       INTEGER := DBMS_LOB.WARN_INCONVERTIBLE_CHAR;
BEGIN
  DBMS_LOB.CREATETEMPORARY(L_CLOB, TRUE);
  DBMS_LOB.CONVERTTOCLOB
  (
   DEST_LOB    =>L_CLOB,
   SRC_BLOB    =>P_BLOB,
   AMOUNT      =>DBMS_LOB.LOBMAXSIZE,
   DEST_OFFSET =>L_DEST_OFFSET,
   SRC_OFFSET  =>L_SOURCE_OFFSET,
   --BLOB_CSID   =>DBMS_LOB.DEFAULT_CSID,
   BLOB_CSID   =>846, -- NLS_CHARSET_ID('KO16MSWIN949')
   LANG_CONTEXT=>L_LANG_CONTEXT,
   WARNING     =>L_WARNING
  );
  RETURN L_CLOB;
END;

Among the parameters of the DBMS_LOB.CONVERTTOCLOB procedure, BLOB_CSID means “The character set ID of the source data”.

Please refer to the document below for more details.

DBMS_LOB – CONVERTTOCLOB Procedure (oracle.com)

BLOB_CSID must be specified as an integer type, not a string, and the value can be known with the NLS_CHARSET_ID function.

Since the character set of the original data is KO16MSWIN949, 846 was confirmed by executing the following.

NLS_CHARSET_ID 함수로 BLOB_CSID 확인
Check BLOB_CSID with NLS_CHARSET_ID function
SELECT NLS_CHARSET_ID('KO16MSWIN949')
  FROM DUAL;

And, you can create or query a table using the above function.

CREATE  TABLE SUB_MON_STAT_BLOB_TO_CLOB
AS
SELECT  USE_MON, SUB_STA_ID
       ,FN_BLOB_TO_CLOB(REF_DES_BLOB) REF_DES
  FROM  SUB_MON_STAT_BLOB A;

select *
  from sub_mon_stat_blob_to_clob
;

-- Table 생성하지 않을 경우는 다음과 같이 query 가능
select use_mon, sub_sta_id
       ,fn_blob_to_clob(ref_des_blob) ref_des
  from sub_mon_stat_blob
;

You can check that Hangul is converted normally as follows.

CLOB 한글 변환결과 확인
Check the result of CLOB Korean conversion

So far, we have looked at how to convert CLOB type containing Korean into AL32UTF8 in US7ASCII character set DB.

It looks a little complicated because it goes through several processes, but it is not difficult if you look at it step by step.

This article concludes the series on Oracle character set conversion.

I hope to be of some help to those in need.

Leave a Reply

Your email address will not be published. Required fields are marked *

en_USEnglish