Oracle Character Set Conversion (7): 5.4. KO16MSWIN949 Environment CSSCAN Execution Result
KO16MSWIN949 Environment Let's look at the execution result of CSSCAN. You can check schema creation, directory creation, csscan execution result files scan.txt, scan.out, scan.err, and csscan usage.
5.4. KO16MSWIN949 Environment CSSCAN Execution Result
5.4.1. create schema
Execute the script below in KO16MSWIN949 instance to create schema for csscan execution.
-- MSWIN949 sqlplus sys/________@oramswin949 as sysdba @%ORACLE_HOME%\rdbms\admin\csminst.sql sqlplus sys/________@oramswin949 as sysdba ALTER USER CSMIG DEFAULT TABLESPACE USERS; EXIT
- Line 3: Execute csminst.sql to create objects such as csmig user (schema) and tables required for csscan execution. At this time, the default tablespace for the csmig user is set to system.
- Line 5: Change the default tablespace of csmig user to users. You can change to another tablespace other than users with sufficient storage space.
5.4.2. create directory
Create a directory to save the csscan execution result and move to the directory.
mkdir D:\Temp\csscan\MSWIN949 cd D:\Temp\csscan\MSWIN949
5.4.3. run csscan
Execute csscan with the following command at the Windows command prompt.
csscan \"sys/________@oramswin949 as sysdba\" full=y tochar=AL32UTF8
Two parameters are specified.
- full: Specifies Y to scan the entire database.
- For reference, you can specify the scan target user as a user parameter or specify the scan target in detail with table and column parameters.
- tochar: Specify AL32UTF8, the target character set to be converted (CHAR, VARCHAR2, CLOB, LONG type scan)
- For reference, to scan NCHAR, NVARCHAR2, and NCLOB types, you need to specify the tonchar parameter.
5.4.4. Check csscan execution result-scan.txt (Result summary report)
The entire contents of scan.txt in the KO16MSWIN949 test environment are as follows.
Database Scan Summary Report Time Started : 2014-05-07 20:21:51 Time Completed: 2014-05-07 20:22:46 Process ID Time Started Time Completed ---------- -------------------- -------------------- 1 2014-05-07 20:22:04 2014-05-07 20:22:45 ---------- -------------------- -------------------- [Database Size] Tablespace Used Free Total Expansion ------------------------- --------------- --------------- --------------- --------------- SYSTEM 563.25M 136.75M 700.00M .00K SYSAUX 360.19M 239.81M 600.00M 2.00K UNDOTBS1 19.25M 465.75M 485.00M .00K TEMP .00K .00K .00K .00K USERS 1.81M 3.19M 5.00M 39.00K ------------------------- --------------- --------------- --------------- --------------- Total 944.50M 845.50M 1,790.00M 42.00K [Database Scan Parameters] Parameter Value ------------------------------ ------------------------------------------------ CSSCAN Version v2.1 Instance Name oramswin949 Database Version 11.2.0.1.0 Scan type Full database Scan CHAR data? YES Database character set KO16MSWIN949 FROMCHAR KO16MSWIN949 TOCHAR AL32UTF8 Scan NCHAR data? NO Array fetch buffer size 1024000 Number of processes 1 Capture convertible data? NO ------------------------------ ------------------------------------------------ [Scan Summary] All character type data in the data dictionary are convertible to the new character set Some character type application data are not convertible to the new character set [Data Dictionary Conversion Summary] Data Dictionary Tables: Datatype Changeless Convertible Truncation Lossy --------------------- ---------------- ---------------- ---------------- ---------------- VARCHAR2 2,566,115 36 0 0 CHAR 354 0 0 0 LONG 182,178 0 0 0 VARRAY 39,205 0 0 0 --------------------- ---------------- ---------------- ---------------- ---------------- Total 2,787,852 36 0 0 Total in percentage 99.999% 0.001% 0.000% 0.000% The data dictionary can not be safely migrated using the CSALTER script XML CSX Dictionary Tables: Datatype Changeless Convertible Truncation Lossy --------------------- ---------------- ---------------- ---------------- ---------------- VARCHAR2 493 0 0 0 CHAR 0 0 0 0 LONG 0 0 0 0 VARRAY 0 0 0 0 --------------------- ---------------- ---------------- ---------------- ---------------- Total 493 0 0 0 Total in percentage 100.000% 0.000% 0.000% 0.000% [Application Data Conversion Summary] Datatype Changeless Convertible Truncation Lossy --------------------- ---------------- ---------------- ---------------- ---------------- VARCHAR2 2,579,942 14,635 309 0 CHAR 0 0 0 0 LONG 0 0 0 0 VARRAY 0 0 0 0 --------------------- ---------------- ---------------- ---------------- ---------------- Total 2,579,942 14,635 309 0 Total in percentage 99.424% 0.564% 0.012% 0.000% [Distribution of Convertible, Truncated and Lossy Data by Table] Data Dictionary Tables: USER.TABLE Convertible Truncation Lossy -------------------------------------------------- ---------------- ---------------- ---------------- SYS.COM$ 11 0 0 SYS.ERROR$ 10 0 0 SYS.JOB$ 2 0 0 SYS.SCHEDULER$_JOB 8 0 0 SYS.SCHEDULER$_PROGRAM 5 0 0 -------------------------------------------------- ---------------- ---------------- ---------------- XML CSX Dictionary Tables: USER.TABLE Convertible Truncation Lossy -------------------------------------------------- ---------------- ---------------- ---------------- -------------------------------------------------- ---------------- ---------------- ---------------- Application Data: USER.TABLE Convertible Truncation Lossy -------------------------------------------------- ---------------- ---------------- ---------------- APEX_030200.WWV_FLOW_UPGRADE_PROGRESS 89 0 0 LEG.SUB_MON_STAT 14,544 309 0 LEG.Z_TEST 2 0 0 -------------------------------------------------- ---------------- ---------------- ---------------- [Distribution of Convertible, Truncated and Lossy Data by Column] Data Dictionary Tables: USER.TABLE|COLUMN Convertible Truncation Lossy -------------------------------------------------- ---------------- ---------------- ---------------- SYS.COM$|COMMENT$ 11 0 0 SYS.ERROR$|TEXT 10 0 0 SYS.JOB$|NLSENV 2 0 0 SYS.SCHEDULER$_JOB|NLS_ENV 8 0 0 SYS.SCHEDULER$_PROGRAM|NLS_ENV 5 0 0 -------------------------------------------------- ---------------- ---------------- ---------------- XML CSX Dictionary Tables: USER.TABLE|COLUMN Convertible Truncation Lossy -------------------------------------------------- ---------------- ---------------- ---------------- -------------------------------------------------- ---------------- ---------------- ---------------- Application Data: USER.TABLE|COLUMN Convertible Truncation Lossy -------------------------------------------------- ---------------- ---------------- ---------------- APEX_030200.WWV_FLOW_UPGRADE_PROGRESS|UPGRADE_ERR 89 0 0 LEG.SUB_MON_STAT|COMMT 0 1 0 LEG.SUB_MON_STAT|LINE_NUM 7,286 140 0 LEG.SUB_MON_STAT|SUB_STA_NM 7,258 168 0 LEG.Z_TEST|VAL 2 0 0 -------------------------------------------------- ---------------- ---------------- ---------------- [Indexes to be Rebuilt] USER.INDEX on USER.TABLE(COLUMN) ----------------------------------------------------------------------------------------- SYS.I_SCHEDULER_JOB4 on SYS.SCHEDULER$_JOB(SYS_NC00056$) SYS.I_SCHEDULER_JOB1 on SYS.SCHEDULER$_JOB(SYS_NC00055$) -----------------------------------------------------------------------------------------
Among them, “Application Data Conversion Summary” is not Oracle System Data, but scanned data by type in user-generated data.
[Application Data Conversion Summary] Datatype Changeless Convertible Truncation Lossy --------------------- ---------------- ---------------- ---------------- ---------------- VARCHAR2 2,579,942 14,635 309 0 CHAR 0 0 0 0 LONG 0 0 0 0 VARRAY 0 0 0 0 --------------------- ---------------- ---------------- ---------------- ---------------- Total 2,579,942 14,635 309 0 Total in percentage 99.424% 0.564% 0.012% 0.000%
The interpretation of the above is as follows.
- When changing to AL32UTF8 for VARCHAR2 type, 2,579,942 values can be converted as they are without change (Changeless), 14,635 values are convertable, 309 values are truncated, and lost/broken (Lossy) There is no value in which this occurs
- There is no conversion target for other CHAR, LONG, VARRAY type data.
Changeless, Convertible, Truncation, and Lossy in the above content were written in the previous article, and will be noted once again below.
Oracle Character Set Conversion (5) – 5.2. How to use CSSCAN
Status | Description |
changeless | Data remains the same in the new character set – No data change during character set conversion |
Convertible | Data can be successfully converted to the new character set – Data can be changed during character set conversion |
Truncation | Data will be truncated if conversion takes place – Data is truncated when converting character set |
Lossy | Character data will be lost if conversion takes place – Data loss (broken) when converting character set |
“Distribution of Convertible, Truncated and Lossy Data by Table” and “Distribution of Convertible, Truncated and Lossy Data by Column” are the scan results for each table and column.
Below are some excerpts from the file.
[Distribution of Convertible, Truncated and Lossy Data by Table] ... Application Data: USER.TABLE Convertible Truncation Lossy -------------------------------------------------- ---------------- ---------------- ---------------- APEX_030200.WWV_FLOW_UPGRADE_PROGRESS 89 0 0 LEG.SUB_MON_STAT 14,544 309 0 LEG.Z_TEST 2 0 0 -------------------------------------------------- ---------------- ---------------- ---------------- [Distribution of Convertible, Truncated and Lossy Data by Column] ... Application Data: USER.TABLE|COLUMN Convertible Truncation Lossy -------------------------------------------------- ---------------- ---------------- ---------------- APEX_030200.WWV_FLOW_UPGRADE_PROGRESS|UPGRADE_ERR 89 0 0 LEG.SUB_MON_STAT|COMMT 0 1 0 LEG.SUB_MON_STAT|LINE_NUM 7,286 140 0 LEG.SUB_MON_STAT|SUB_STA_NM 7,258 168 0 LEG.Z_TEST|VAL 2 0 0 -------------------------------------------------- ---------------- ---------------- ----------------
The interpretation of the above is as follows.
- When the LEG.SUB_MON_STAT table is changed to AL32UTF8, 14,544 values are normally converted (Convertible), 309 values are truncated, and there are no values that cause loss/loss. Each column is as follows
- COMMT: When changing to AL32UTF8, there is no value that is normally converted, one value is truncated, and there is no value that causes loss/loss.
- LINE_NUM: When changing to AL32UTF8, 7,286 values are normally converted (Convertible), 140 values are truncated, and there is no value that causes loss/loss.
- SUB_STA_NM: When changing to AL32UTF8, 7,258 values are normally converted (Convertible), 168 values are truncated, and there is no value that causes loss/loss.
- When the LEG.Z_TEST table is changed to AL32UTF8, two values are converted normally, and there is no value that is truncated or lost. The VAL column corresponds to this.
5.4.5. Check csscan execution result-scan.out (scan target table information)
In the scan.out file, you can check the list of scan target tables.
Character Set Scanner v2.2 : Release 11.2.0.1.0 - Production on Wed May 7 20:21:48 2014 Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production With the Partitioning, OLAP, Data Mining and Real Application Testing options Enter array fetch buffer size: 1024000 > Enter number of scan processes to utilize(1..64): 1 > Enumerating tables to scan... . process 1 scanning SYS.SOURCE$[AAAADfAABAAAAXYAAA] . process 1 scanning SYS.PARAMETER$[AAAAH3AABAAAAzgAAA] . process 1 scanning SYS.TYPE$[AAAAH3AABAAAAzgAAA] . process 1 scanning SYS.ATTRIBUTE$[AAAAH3AABAAAAzgAAA] ... . process 1 scanning MDSYS.SDO_COORD_REF_SYS[AAAPKtAACAAAEtYAAA] . process 1 scanning XDB.XDB$H_LINK[AAANh/AACAAACUwAAA] . process 1 scanning SYS.SYS$SERVICE_METRICS_TAB[AAADI9AACAAACEAAAA] ... . process 1 scanning LEG.SUB_MON_STAT[AAAQSRAAEAAAACIAAA] ... . process 1 scanning XDB.PRIN_PT[AAAN2VAACAAAEDoAAA] ORA-30967: operation directly on the Path Table is disallowed CSS-00144: failed to scan table XDB.PRIN_PT Creating Database Scan Summary Report... Creating Individual Exception Report... Scanner terminated successfully.
Here you can check the following:
- Line 23: Scan the SUB_MON_STAT table of the LEG user created in the test environment.
5.4.6. Check csscan execution result - scan.err (error details)
In the scan.err file, you can check the details of character set conversion errors for each table and column.
Database Scan Individual Exception Report [Database Scan Parameters] Parameter Value ------------------------------ ------------------------------------------------ CSSCAN Version v2.1 Instance Name oramswin949 Database Version 11.2.0.1.0 Scan type Full database Scan CHAR data? YES Database character set KO16MSWIN949 FROMCHAR KO16MSWIN949 TOCHAR AL32UTF8 Scan NCHAR data? NO Array fetch buffer size 1024000 Number of processes 1 Capture convertible data? NO ------------------------------ ------------------------------------------------ [Data Dictionary individual exceptions] User : SYS Table : COM$ Column: COMMENT$ Type : VARCHAR2(4000) Number of Exceptions : 0 Max Post Conversion Data Size: 381 ROWID Exception Type Size Cell Data(first 30 bytes) ------------------ ------------------ ----- ------------------------------ AAAACIAABAAARCPABN convertible 사용월 AAAACIAABAAARCPABO convertible 호선명 AAAACIAABAAARCPABP convertible 지하철역코드 AAAACIAABAAARCPABQ convertible 지하철역명 AAAACIAABAAARCPABR convertible 승차인원수 AAAACIAABAAARCPABS convertible 하차인원수 AAAACIAABAAARCPABT convertible 작업일자 AAAACIAABAAARCPABU convertible 비고 AAAACIAABAAARCPABV convertible 참조설명 AAAACIAABAAARCPABW convertible 참조이미지 AAAACIAABAAARCPABX convertible 지하철역별승하차인원 ------------------ ------------------ ----- ------------------------------ [Application data individual exceptions] User : LEG Table : SUB_MON_STAT Column: LINE_NUM Type : VARCHAR2(14) Number of Exceptions : 140 Max Post Conversion Data Size: 20 ROWID Exception Type Size Cell Data(first 30 bytes) ------------------ ------------------ ----- ------------------------------ AAAQSRAAEAAAAC/ABp exceed column size 20 공항철도 1호선 AAAQSRAAEAAAAC/ABq exceed column size 20 공항철도 1호선 AAAQSRAAEAAAAC/ABr exceed column size 20 공항철도 1호선 ... ------------------ ------------------ ----- ------------------------------ User : LEG Table : SUB_MON_STAT Column: COMMT Type : VARCHAR2(4000) Number of Exceptions : 1 Max Post Conversion Data Size: 4500 ROWID Exception Type Size Cell Data(first 30 bytes) ------------------ ------------------ ----- ------------------------------ AAAQSRAAEAAAACNAAA exceed column size 4500 서울역서울역서울역서울역서울역 ------------------ ------------------ ----- ------------------------------ User : LEG Table : SUB_MON_STAT Column: SUB_STA_NM Type : VARCHAR2(20) Number of Exceptions : 168 Max Post Conversion Data Size: 27 ROWID Exception Type Size Cell Data(first 30 bytes) ------------------ ------------------ ----- ------------------------------ AAAQSRAAEAAAAC+AA8 exceed column size 24 디지털미디어시티 AAAQSRAAEAAAAC+AAC exceed column size 21 가산디지털단지 AAAQSRAAEAAAAC/AAR exceed column size 27 동대문역사문화공원 ... ------------------ ------------------ ----- ------------------------------
You can check the CSSCAN execution result file in the KO16MSWIN949 environment in the github repository below.
- scan.txt: Oracle-CharacterSet-Conversion/scan.txt at main DAToolset/Oracle-CharacterSet-Conversion (github.com)
- scan.out: Oracle-CharacterSet-Conversion/scan.out at main DAToolset/Oracle-CharacterSet-Conversion (github.com)
- scan.err: Oracle-CharacterSet-Conversion/scan.err at main DAToolset/Oracle-CharacterSet-Conversion (github.com)
5.5. Note: CSSCAN use
CSSCAN can be used for two purposes.
- Check RowID and ColumnID of truncated or broken data during conversion -> Used for data purification
- In cases where most of the data is truncated or broken (eg US7ASCII to AL32UTF8), it is not meaningful because almost all data has to be cleaned manually, which is time-consuming and expensive.
- Check the list of columns to increase the length of the As-Is column -> Use it to design the To-Be data model
- It is very meaningful because it can be used to determine the number of digits of the standard domain when designing To-Be.
In addition, it is good to refer to this length (MAXSIZ) to create a replica table in which the character set is converted for As-Is data in advance. If the column length of the As-Is table is the same, the data whose length increased after character set conversion exceeds the column length of the As-Is table will fail to load with the following error.
– ORA-12899: value too large for column “OWNER”.”TABLE”.”COLUMN” (actual: Inserted length, maximum: Declared length)
(Example: ORA-12899: value too large for column “LEG”.”SUB_MON_STAT”.”LINE_NUM” (actual: 16, maximum: 14) )
The following is a Query statement to check the target that an error occurs during conversion among the table columns of a specific user after CSSCAN is executed, and whether it should be converted to the current length, conversion length, and CLOB.
SELECT U.USERNAME, TC.TABLE_NAME, TC.COLUMN_NAME, TC.DATA_TYPE, TC.DATA_LENGTH, CC.MAXSIZ ,CASE WHEN TC.DATA_TYPE <> 'CLOB' AND CC.MAXSIZ > 4000 THEN 'Y' END AS CLOB_CONV_YN FROM CSMIG.CSM$COLUMNS CC INNER JOIN DBA_USERS U ON (CC.USR# = U.USER_ID) INNER JOIN DBA_OBJECTS O ON (CC.OBJ# = O.OBJECT_ID) INNER JOIN DBA_TAB_COLUMNS TC ON (CC.COL# = TC.COLUMN_ID AND O.OBJECT_NAME = TC.TABLE_NAME) WHERE U.USERNAME = 'LEG' AND O.OBJECT_TYPE = 'TABLE' AND CC.ERRCNT <> 0;
A sample of the extracted data is as follows.
Row# | USERNAME | TABLE_NAME | COLUMN_NAME | DATA_TYPE | DATA_LENGTH (current length) | MAXSIZ (conversion length) | CLOB_CONV_YN (CLOB conversion) |
1 | LEG | SUB_MON_STAT | LINE_NUM | VARCHAR2 | 14 | 20 | |
2 | LEG | SUB_MON_STAT | SUB_STA_NM | VARCHAR2 | 20 | 27 | |
3 | LEG | SUB_MON_STAT | COMMT | VARCHAR2 | 4000 | 4500 | Y |
The meaning of each column is as follows.
column name | meaning |
USERNAME | owner of the table |
TABLE_NAME | target table name |
COLUMN_NAME | Column name of target table |
DATA_TYPE | Current data type of target column |
DATA_LENGTH | Current data length of target column |
MAXSIZ | Maximum expected length after character set conversion |
CLOB_CONV_YN | Whether the current VARCHAR2 type is subject to change to CLOB (Y: subject to review for change to CLOB) |
Note that, since CSSCAN was checked based on the data at the time when CSSCAN was executed, even if the length is increased based on this criterion, an “ORA-12899: value too large for column” error may occur depending on the increase in column length or data increase. The possibility is still there.
So far, we have looked at the results of executing CSSCAN in the KO16MSWIN949 test environment.