Oracle Character Set Conversion(8): 6. How to Convert User Implemented Character Set (1)
Let's look at the user-implemented character set conversion method. You can check how to convert from KO16MSWIN949, US7ASCII to AL32UTF8 using Oracle Utility, Data Pump Export, Import.
Methods that require a separate implementation do not guarantee conversion in Oracle, but the user who implements them must guarantee the conversion.
This is a continuation of the previous article.
Oracle Character Set Conversion (7): 5.4. KO16MSWIN949 Environment CSSCAN Execution Result
6. How to Convert User Implemented Character Set
6.1. How to apply third party tools – ETL, etc.
A way to implement character set transformations using well-known ETL tools on the market (eg Informatica PowerCenter, IBM DataStage, etc.).
I do not have direct experience with this method, and I only know that it is possible, so I will not describe the detailed method.
However, please review only the considerations below.
- Each product incurs a purchase cost, or additional development labor costs are required even if it has already been purchased.
- Open Source TOS (Talend Open Studio)-DI, Pentaho Kettle, etc. can also be considered.
- In the case of writing data from the source DB to a file and reading and inserting the file into the target DB, the column separator and row separator are included in the data value, so errors are handled, such as being pushed one column at a time or one row being incorrectly stored as multiple rows is absolutely necessary.
- When selecting an initial tool, it is necessary to select an appropriate tool by conducting a pilot test on a large table. In addition, it is important to secure an experienced ETL development manpower.
6.2. How to use Oracle Utility
Explore three ways to implement character set conversion by leveraging Oracle features and utilities.
1. Utilize Data Pump Export/Import
- Force Character Set Change: ALTER DATABASE CHARACTER SET INTERNAL
- Executing Export/Import: Apply NLS_LANG same as Target Character Set
2. Utilize SQL*Loader
- Data unload from source DB: Formats that can be used in SQL*Loader, PL/SQL, Pro*C, Java, etc. need to be developed
- Load data into target DB with SQL*Loader
3. Using DB Link & CTAS & UTL_RAW
(See separate article: Oracle Character Set Conversion (9): 6. How to Convert User Implemented Character Sets (2))
- Create DB Link: Create a DB Link that points from Target to Source
- View creation: View created by converting CHAR, VARCHAR2 columns from Source to UTL_RAW.CAST_TO_RAW
- Load data into CTAS: Apply UTL_RAW.CAST_TO_VARCHAR2 and CONVERT function while executing CTAS (Create Table As Select) from Target to DB Link
The advantages and disadvantages of each method are summarized below.
Way | Advantages | disadvantage |
1. Data Pump Export/Import | – Parallel processing possible – Exception processing such as newline character, special character, null character is unnecessary | – Data unload time required – In case the source DB is US7ASCII . Character set change required (original preservation required) . Additional conversion of CLOB type data is required |
2. SQL*Loader | – Parallel processing possible | – Need to develop function to create (unload) data file – Exception/error handling required during file processing of unload/load (newline character, special character, null character, etc.) – Data unload time required |
3. DB Link & CTAS & UTL_RAW | – No data unload time required | – Parallel processing difficult with Parallel hint (need to process by directly dividing the data range) – Not applicable to data exceeding 2000 bytes (maximum length of RAW type is 2000 bytes) |
As you review these methods, consider the following considerations as well.
- When converting from US7ASCII to AL32UTF8 using the Data Pump Export/Import method, The character set of the original DB must be changed with DDL
- It is not recommended to directly change the source DB. If it goes wrong, it can be very difficult to undo.
- If the failover DB is already configured, it is desirable to use it or configure a separate replication DB to change it.
- Alternatively, it is possible to restore the backup data of the original DB to a separate server and use it.
- In addition, it is desirable to review various methods (disk replication, etc.) to configure the clone DB, and to select a method in consideration of cost and time.
- If it is difficult to configure a replication DB, it can be performed after full backup, but if recovery is required, service down-time may be long depending on the amount of data.
- Unless the service is stopped, no matter what method is applied, the extraction time from the original DB must be matched to ensure data consistency (using FLASHBACK_SCN, FLASHBACK_TIME)
- In addition, the method of reflecting the changed data after the point of extraction must be tested and prepared in advance (use of CDC, etc.)
6.2.1. Method 1) Data Pump Export/Import
Preparation: Use the following script to create directory objects and grant permissions required when executing Data Pump Export/Import.
Execute for each Oracle instance under test, ORAUS7, ORAMSWIN949, and ORAUTF.
▼ Grant Permissions to the ORAUS7 Instance
-- ORAUS7 instance에 권한 부여 title ORAUS7 set NLS_LANG=AMERICAN_AMERICA.US7ASCII sqlplus sys/________@oraus7 as sysdba CREATE OR REPLACE DIRECTORY PUMP_DIR AS 'D:\Temp\datapump'; GRANT READ, WRITE ON DIRECTORY PUMP_DIR TO LEG; EXIT
▼ Grant Privileges to ORAMSWIN949 Instance
-- ORAMSWIN949 instance에 권한 부여 title ORAMSWIN949 set NLS_LANG=AMERICAN_AMERICA.KO16MSWIN949 sqlplus sys/________@oramswin949 as sysdba CREATE OR REPLACE DIRECTORY PUMP_DIR AS 'D:\Temp\datapump'; GRANT READ, WRITE ON DIRECTORY PUMP_DIR TO LEG; EXIT
▼ Grant Privileges to ORAUTF Instance
-- ORAUTF instance에 권한 부여 title ORAUTF set NLS_LANG=AMERICAN_AMERICA.KO16MSWIN949 sqlplus sys/________@orautf as sysdba CREATE OR REPLACE DIRECTORY PUMP_DIR AS 'D:\Temp\datapump'; GRANT READ, WRITE ON DIRECTORY PUMP_DIR TO LEG; EXIT
Each of the following three cases will be considered.
- KO16MSWIN949 to AL32UTF8
- US7ASCII to AL32UTF8 (2 paths)
- US7ASCII to AL32UTF8 (1 path)
For reference, in the case of US7ASCII to AL32UTF8 conversion, the CLOB type column is not converted normally, so a separate process is required.
It will be explained in the next article.
1. KO16MSWIN949 to AL32UTF8
In this case, CHAR, VARCHAR2, and CLOB types are all converted normally.
The execution script for each step is as follows.
-------------------------------------------------------------------- -- 단계1 -- 실행 Instance: ORAMSWIN949 -- NLS_LANG: KO16MSWIN949 -- 작업명: Datapump Export -- Tool: expdp set NLS_LANG=AMERICAN_AMERICA.KO16MSWIN949 expdp leg/leg@oramswin949 directory=PUMP_DIR dumpfile=oramswin949_leg.dmp logfile=oramswin949_leg_exp.log schemas=leg -------------------------------------------------------------------- -- 단계2 -- 실행 Instance: ORAUTF -- NLS_LANG: KO16MSWIN949 -- 작업명: Table 생성 -- Tool: SQL*Plus -- Target table을 미리 생성 -- LINE_NUM, SUB_STA_NM 컬럼 길이 증가 -- COMMT 컬럼 데이터 타입 변경(VARCHAR2 -> CLOB) CREATE TABLE SUB_MON_STAT ( USE_MON VARCHAR2(8), LINE_NUM VARCHAR2(50), -- 14 -> 50 SUB_STA_ID VARCHAR2(4), SUB_STA_NM VARCHAR2(100), -- 20 -> 100 RIDE_PASGR_NUM NUMBER(10), ALIGHT_PASGR_NUM NUMBER(10), WORK_DT VARCHAR2(8), COMMT CLOB, -- VARCHAR2 -> CLOB REF_DES CLOB, REF_IMG BLOB ); -------------------------------------------------------------------- -- 단계3 -- 실행 Instance: ORAUTF -- NLS_LANG: KO16MSWIN949 -- 작업명: Datapump Import -- Tool: impdp set NLS_LANG=AMERICAN_AMERICA.KO16MSWIN949 impdp leg/leg@orautf directory=PUMP_DIR dumpfile= oramswin949_leg.dmp logfile=orautf_leg_imp.log content=data_only
2. US7ASCII to AL32UTF8 (2 paths)
Please refer to the following.
- Temporary replication DB server is a DB that replicates the As-Is operation DB in the same form as the original.
- In the test environment, ORAMSWIN949 instance was used as a temporary replication DB, and data is replicated with Data Pump. In this case, the character set must be changed to US7ASCII before replication.
- In case of actual application, it is desirable to separately configure and use the temporary replication DB server with US7ASCII Character Set.
This method is not recommended by ORACLE and requires separate conversion processing because CLOB type data is not normally converted.
Each step-by-step execution script is as follows.
-------------------------------------------------------------------- -- 단계1-1 -- 실행 Instance: ORAUS7 -- NLS_LANG: US7ASCII -- 작업명: Datapump Export -- Tool: expdp set NLS_LANG=US7ASCII expdp leg/leg@oraus7 directory=MIG_DIR dumpfile=oraus7_leg_exp.dmp logfile=oraus7_leg_exp.log schemas=leg -------------------------------------------------------------------- -- 단계1-2 -- 실행 Instance: ORAMSWIN949 -- NLS_LANG: US7ASCII -- 작업명: Datapump Import -- Tool: impdp set NLS_LANG=US7ASCII impdp leg/leg@oramswin949 directory= MIG_DIR dumpfile=oramswin949_leg_exp.dmp logfile=oramswin949_leg_imp.log -------------------------------------------------------------------- -- 단계2 -- 실행 Instance: ORAMSWIN949 -- NLS_LANG: US7ASCII -- 작업명: Character Set 변경 -- Tool: SQL*Plus SHUTDOWN IMMEDIATE; STARTUP MOUNT; ALTER SYSTEM ENABLE RESTRICTED SESSION; ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0; ALTER SYSTEM SET AQ_TM_PROCESSES=0; ALTER DATABASE OPEN; ALTER DATABASE CHARACTER SET INTERNAL_USE KO16MSWIN949; SHUTDOWN IMMEDIATE; STARTUP; -------------------------------------------------------------------- -- 단계3 -- 실행 Instance: ORAMSWIN949 -- NLS_LANG: KO16MSWIN949 -- 작업명: Datapump Export -- Tool: expdp -- sql*plus에서 한글 표시 되는지 확인 후 다음 실행 set NLS_LANG=KO16MSWIN949 expdp leg/leg@oramswin949 directory=MIG_DIR dumpfile=oramswin949_leg_exp.dmp logfile=oramswin949_leg_exp.log schemas=leg -------------------------------------------------------------------- -- 단계4 -- 실행 Instance: ORAUTF -- NLS_LANG: AL32UTF8 -- 작업명: Table 생성 -- Tool: SQL*Plus -- Target table을 미리 생성 -- LINE_NUM, SUB_STA_NM 컬럼 길이 증가 -- COMMT 컬럼 데이터 타입 변경(VARCHAR2 -> CLOB) CREATE TABLE SUB_MON_STAT ( USE_MON VARCHAR2(8), LINE_NUM VARCHAR2(50), -- 14 -> 50 SUB_STA_ID VARCHAR2(4), SUB_STA_NM VARCHAR2(100), -- 20 -> 100 RIDE_PASGR_NUM NUMBER(10), ALIGHT_PASGR_NUM NUMBER(10), WORK_DT VARCHAR2(8), COMMT CLOB, -- VARCHAR2 -> CLOB REF_DES CLOB, REF_IMG BLOB ); -------------------------------------------------------------------- -- 단계5 -- 실행 Instance: ORAUTF -- NLS_LANG: AL32UTF8 -- 작업명: Datapump Import -- Tool: impdp impdp leg/leg@orautf directory=MIG_DIR dumpfile=oramswin949_leg_exp.dmp logfile=oramswin949_leg_imp.log content=data_only
3. US7ASCII to AL32UTF8 (1 path)
This method is not recommended by ORACLE and requires separate conversion processing because CLOB type data is not normally converted.
In addition, since the character set of the operating DB is changed, the original data cannot be preserved and it is difficult to roll-back in the event of a mistake, so it should not be applied to the operating environment as much as possible.
If this method is unavoidably used, please verify it sufficiently in the test environment before applying it to the operating environment.
The execution script for each step is as follows.
-------------------------------------------------------------------- -- 단계1 -- 실행 Instance: ORAUS7 -- NLS_LANG: US7ASCII -- 작업명: Character Set 변경 -- Tool: SQL*Plus SHUTDOWN IMMEDIATE; STARTUP MOUNT; ALTER SYSTEM ENABLE RESTRICTED SESSION; ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0; ALTER SYSTEM SET AQ_TM_PROCESSES=0; ALTER DATABASE OPEN; ALTER DATABASE CHARACTER SET INTERNAL_USE KO16MSWIN949; SHUTDOWN IMMEDIATE; STARTUP; -------------------------------------------------------------------- -- 단계2 -- 실행 Instance: ORAMSWIN949(=ORAUS7) -- NLS_LANG: KO16MSWIN949 -- 작업명: Datapump Export -- Tool: expdp -- sql*plus에서 한글 표시 되는지 확인 후 다음 실행 set NLS_LANG=KO16MSWIN949 expdp leg/leg@oramswin949 directory=MIG_DIR dumpfile=oramswin949_leg_exp.dmp logfile=oramswin949_leg_exp.log schemas=leg -------------------------------------------------------------------- -- 단계3 -- 실행 Instance: ORAUTF -- NLS_LANG: AL32UTF8 -- 작업명: Table 생성 -- Tool: SQL*Plus -- Target table을 미리 생성 -- LINE_NUM, SUB_STA_NM 컬럼 길이 증가 -- COMMT 컬럼 데이터 타입 변경(VARCHAR2 -> CLOB) CREATE TABLE SUB_MON_STAT ( USE_MON VARCHAR2(8), LINE_NUM VARCHAR2(50), -- 14 -> 50 SUB_STA_ID VARCHAR2(4), SUB_STA_NM VARCHAR2(100), -- 20 -> 100 RIDE_PASGR_NUM NUMBER(10), ALIGHT_PASGR_NUM NUMBER(10), WORK_DT VARCHAR2(8), COMMT CLOB, -- VARCHAR2 -> CLOB REF_DES CLOB, REF_IMG BLOB ); -------------------------------------------------------------------- -- 단계4 -- 실행 Instance: ORAUTF -- NLS_LANG: AL32UTF8 -- 작업명: Datapump Import -- Tool: impdp impdp leg/leg@orautf directory=MIG_DIR dumpfile=oramswin949_leg_exp.dmp logfile=oramswin949_leg_imp.log content=data_only
6.2.2. Method 2) SQL*Loader
SQL*Loader needs a control file and a data file. Among them, the data file contains actual data.
To create this data file, you can create it yourself using various tools such as PL/SQL, Pro*C, and Java, or refer to the source code that has already implemented the same function and write it as it is or extend it slightly.
Here, we do not introduce how to make it ourselves, only the URL of the implemented source code.
Tom Kyte, author of Effective Oracle blog: https://asktom.oracle.com/Misc/httpasktomoraclecomtkyteflat.html
In addition, if you search on Google, you will find the source code that you can refer to.
After creating data files in units of tables or partitions, execute sqlldr in multiple groups at the same time and load them in parallel to minimize the execution time with an appropriate level of load.