Oracle Character Set Conversion (4): 4.Configuring the Test Environment
Continuing from the previous article, we look at the configuration of the Oracle Character Set conversion test environment. You can check Oracle server environment configuration (three instances, DB Link creation), test table creation, and test data creation scripts. This test environment is used in the character set conversion process later.
This is a continuation of the previous article.
Oracle Character Set Conversion (3): 3. Client Environment Configuration (2)
4. Configure Oracle Character Set Conversion Test Environment
4.1. Oracle DB Server Environment Configuration
It is assumed that the test environment for Oracle Character Set conversion is configured as follows on the Local PC (Windows).
Three Oracle DB instances were installed on one PC, and the configuration of each is as follows.
Environment | SID | Server Character Set | Oracle Version | Test User (Schema) |
US7ASCII Current DB | ORAUS7 | US7ASCII | 11.2.0.1.0 | LEG |
KO16MSWIN949 Current DB | ORAMSWIN949 | KO16MSWIN949 | 11.2.0.1.0 | LEG |
AL32UTF8 Target DB | ORAUTF | AL32UTF8 | 11.2.0.1.0 | LEG |
DB Link is configured as follows.
Generated SID | Target SID | Connection User (Schema) | purpose |
ORAMSWIN949 | ORAUS7 | LEG | US7ASCII access and conversion test on KO16MSWIN949 |
ORAUTF | ORAUS7 | LEG | AL32UTF8 to US7ASCII access and conversion test |
The DB Link creation script is as follows.
CREATE DATABASE LINK DL_US7 CONNECT TO leg IDENTIFIED BY leg USING '(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = ORAUS7) ) )' ;
4.2. test table configuration
The data model for testing is as follows.
(For testing purposes, all additional details such as normalization and identifier configuration are omitted. The same length is applied to US7ASCII and MSWIN949, and the length is increased to AL32UTF8.)
The data model above is a structure in which two items (reference description, reference image) for testing are added to the data set of <Information on the number of people getting on and off by subway line and station in Seoul> disclosed by the Seoul Metropolitan Government.
source: https://data.seoul.go.kr/dataList/OA-12252/S/1/datasetView.do
The DDL that creates the schema of the test data model above is as follows.
-- 한글 2 byte 체계 스키마 생성(US7ASCII, MSWIN949) CREATE TABLE SUB_MON_STAT ( USE_MON VARCHAR2(8), LINE_NUM VARCHAR2(14), -- AL32UTF8: 50 SUB_STA_ID VARCHAR2(4), SUB_STA_NM VARCHAR2(20), -- AL32UTF8: 100 RIDE_PASGR_NUM NUMBER(10), ALIGHT_PASGR_NUM NUMBER(10), WORK_DT VARCHAR2(8), COMMT VARCHAR2(4000), REF_DES CLOB, REF_IMG BLOB ); -- 한글 3 byte 체계 스키마 생성(AL32UTF8) CREATE TABLE SUB_MON_STAT ( USE_MON VARCHAR2(8), LINE_NUM VARCHAR2(50), SUB_STA_ID VARCHAR2(4), SUB_STA_NM VARCHAR2(100), RIDE_PASGR_NUM NUMBER(10), ALIGHT_PASGR_NUM NUMBER(10), WORK_DT VARCHAR2(8), COMMT VARCHAR2(4000), REF_DES CLOB, REF_IMG BLOB );
In AL32UTF8, 3 bytes are required for 1 Korean character, so the length of the two columns is increased sufficiently.
Data files in Excel format can be downloaded from the github path below.
* Excel data file: https://github.com/DAToolset/Oracle-CharacterSet-Conversion/raw/main/Number of passengers by subway station_Seoul Open Data_2013.xlsx
The DML for entering data into the table can be downloaded from the github path below.
Some excerpts from the above sql file are pasted below.
INSERT INTO SUB_MON_STAT(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT, COMMT, REF_DES, REF_IMG) VALUES('201301', '1호선', '0150', '서울역', 2199181, 1855268, '20130723', RPAD('서울역', 3000, '서울역'), RPAD('서울역', 3000, '서울역'), UTL_RAW.CAST_TO_RAW('12345')); INSERT INTO SUB_MON_STAT(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT) VALUES('201301', '1호선', '0151', '시청', 737005, 726892, '20130723'); INSERT INTO SUB_MON_STAT(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT) VALUES('201301', '1호선', '0152', '종각', 1542710, 1446464, '20130723'); INSERT INTO SUB_MON_STAT(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT) VALUES('201301', '1호선', '0153', '종로3가', 1159253, 1126995, '20130723'); INSERT INTO SUB_MON_STAT(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT) VALUES('201301', '1호선', '0154', '종로5가', 816307, 820769, '20130723'); INSERT INTO SUB_MON_STAT(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT) VALUES('201301', '1호선', '0155', '동대문', 446527, 512163, '20130723'); INSERT INTO SUB_MON_STAT(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT) VALUES('201301', '1호선', '0156', '신설동', 496150, 484583, '20130723'); INSERT INTO SUB_MON_STAT(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT) VALUES('201301', '1호선', '0157', '제기동', 587740, 601913, '20130723'); INSERT INTO SUB_MON_STAT(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT) VALUES('201301', '1호선', '0158', '청량리(지하)', 966341, 985741, '20130723'); INSERT INTO SUB_MON_STAT(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT) VALUES('201301', '1호선', '0159', '동묘앞', 240864, 252523, '20130723'); INSERT INTO SUB_MON_STAT(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT) VALUES('201301', '2호선', '0202', '을지로입구', 1610744, 1607133, '20130723'); INSERT INTO SUB_MON_STAT(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT) VALUES('201301', '2호선', '0203', '을지로3가', 539622, 559866, '20130723'); INSERT INTO SUB_MON_STAT(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT) VALUES('201301', '2호선', '0204', '을지로4가', 364217, 365994, '20130723'); INSERT INTO SUB_MON_STAT(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT) VALUES('201301', '2호선', '0205', '동대문역사문화공원', 435358, 508405, '20130723'); INSERT INTO SUB_MON_STAT(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT) VALUES('201301', '2호선', '0206', '신당', 407110, 438543, '20130723'); INSERT INTO SUB_MON_STAT(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT) VALUES('201301', '2호선', '0207', '상왕십리', 288629, 274519, '20130723'); INSERT INTO SUB_MON_STAT(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT) VALUES('201301', '2호선', '0208', '왕십리', 583444, 498091, '20130723'); INSERT INTO SUB_MON_STAT(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT) VALUES('201301', '2호선', '0209', '한양대', 262492, 286399, '20130723'); INSERT INTO SUB_MON_STAT(USE_MON, LINE_NUM, SUB_STA_ID, SUB_STA_NM, RIDE_PASGR_NUM, ALIGHT_PASGR_NUM, WORK_DT) VALUES('201301', '2호선', '0210', '뚝섬', 402631, 420166, '20130723'); ...
So far, we have looked at test environment configuration and configuration scripts for Oracle Character Set conversion. Next, we'll look at different ways to transform the Character Set in this test environment. We will look into the method recommended by Oracle and the method that requires a separate implementation.