Data standard check tool_4.Attachment
This is the appendix of the data standard inspection tool. You can check the processing of the number suffix of the property name, standard word dictionary simulation method, reference information of properties subject to standard inspection, license, and VBA code. This is a continuation of the previous article.
Data Standard Check Tool_3. Standard Check Case Results
4. Attachment of data standard inspection tool
4.1. Processing of numeric suffix in attribute name
If there is a number suffix (postfix number) in the property name, the number is excluded and the standard term dictionary or standard word dictionary is searched. A numeric suffix is added after the result of the found standard term or standard word combination.
In the figure above, the physical name of the attribute “personal targeting 1” is created as “PERSN_TGT_YN1”. The attribute name terminator was normally identified as “whether or not”.
The processing logic for the numeric suffix of the property name is as follows.
- Find and store only the consecutive numbers (a) from the back of the property name to be checked, delete the numbers, and inspect
- As a result of inspection, the number a stored in “standard word logical name combination” and “standard word physical name combination” is appended to the right
4.2. Standard word dictionary simulation method
In order to check whether the word combination for the input property name succeeds when the word physical name is not determined, that is, whether it can be registered as a standard term, a logical name is registered in the “Standard Word Dictionary” sheet, and the physical name is “(” + logical It can be simulated by composing name + “)”. In this case, the “standard word physical name combination” item in the standard check result is displayed in green.
If you click the “Add to Word Dictionary” button in the standard check result, the physical name of each word is created as “(” + logical name + “)” for all “additional candidate words” and added to the word dictionary. Words that have already been registered are not added.
Incidentally, if you want to automatically generate an abbreviation for the physical name of a word, you can use the user function “Generate Abbreviation” included in this VBA project. The source code of the function is as follows.
Public Function 약어생성(aFullName As String, Optional aMaxSize = 4) As String Dim sFullName As String, sAbbName As String Dim i As Integer '영문 Full Name 중에 포함된 '-' 는 공백으로 변경하여 처리한다. sFullName = Trim(Replace(aFullName, "-", " ")) '/* 영문 Full Name이 숫자로 시작하는 경우 단어 순서를 변경한다. (첫번째, 두번째 단어 순서만 변경) */ Select Case Mid(sFullName, 1, 1) Case "0" To "9" sFullName = Trim(단어순서변경(sFullName)) End Select sFullName = StrConv(sFullName, vbProperCase) sFullName = Replace(sFullName, "A ", "") sFullName = Replace(sFullName, "An ", "") sFullName = Replace(sFullName, "The ", "") If InStr(1, sFullName, " ") = 0 Then '/* 영문FullName이 하나의 단어일 경우*/ If Len(sFullName) <= 4 Then '/* 영문FullName의 길이가 4 이하 일 경우 */ sAbbName = sFullName Else '/* 영문FullName의 4자 이후 단어들이 모두 모음인 경우 - 앞 4자리 단어를 축약어로 사용 */ If 모음제거(Mid(sFullName, 5)) = "" Then sAbbName = Mid(sFullName, 1, 4) Else sAbbName = Mid(sFullName, 1, 1) + 모음제거(Mid(sFullName, 2)) '-- 모음시작 단어 첫모음 살림 '/* 축약 결과 길이제한 : 최대 aMaxSize 자리까지 */ If Len(sAbbName) > aMaxSize Then sAbbName = Mid(sAbbName, 1, aMaxSize) End If End If End If Else '/* 영문FullName이 여러 단어로 구성되어 있을 경우 */ Dim iBlankCnt As Integer iBlankCnt = Len(sFullName) - Len(Replace(sFullName, " ", "")) '각 단어 첫글자 sAbbName = sFullName Dim iPos As Integer, sChar As String, sAbbTmp As String If iBlankCnt = 1 Then '/* 단어수가 2개인 경우 - 각 단어의 시작 2글자 조합으로 생성 */ sAbbName = 모음제거(sAbbName) i = InStr(1, sAbbName, " ") + 1 sAbbName = UCase(Left(sAbbName, 2) + Mid(sAbbName, i, 2)) ElseIf iBlankCnt <= 4 Then '/* 단어수가 4개 이하인 경우 - 첫글자 조합으로 생성 */ For iPos = 1 To Len(sAbbName) sChar = Mid(sAbbName, iPos, 1) If (sChar >= "A" And sChar <= "Z") Or (sChar >= "0" And sChar <= "9") Then sAbbTmp = sAbbTmp + sChar End If If Len(sAbbTmp) >= aMaxSize Then GoTo Exit_For1 Next iPos Exit_For1: sAbbName = sAbbTmp Else '/* 단어의 길이가 최대 허용자리수 이하인 경우 - 그대로 사용 */ If Len(Replace(sFullName, " ", "")) <= aMaxSize Then sAbbName = UCase(Replace(sFullName, " ", "")) Else Dim sInText As String sInText = 등위접속사정리(UCase(sFullName)) '/* 등위접속사가 존재하면 첫글자들로 구성 */ If sInText <> sFullName Then sAbbName = 모음제거(sInText) For iPos = 1 To Len(sAbbName) sChar = Mid(sAbbName, iPos, 1) If (sChar >= "A" And sChar <= "Z") Or (sChar >= "0" And sChar <= "9") Then sAbbTmp = sAbbTmp + sChar End If If Len(sAbbTmp) >= aMaxSize Then GoTo Exit_For2 Next iPos Exit_For2: sAbbName = sAbbTmp Else '/* 두단어로 구성된 경우, 각각의 단어에서 앞 2자리 문자를 가지고 약어 생성 */ Dim iUnitLen As Integer, aWordArray() As String, sWord As String iUnitLen = 2 aWordArray = Split(sFullName) For iPos = 0 To UBound(aWordArray) sWord = aWordArray(iPos) If Len(sWord) = iUnitLen Then sAbbTmp = sAbbTmp + sWord Else If 모음제거(Mid(sWord, iUnitLen + 1)) = "" Then sAbbTmp = sAbbTmp + Mid(sWord, 1, iUnitLen) Else sAbbTmp = sAbbTmp + Mid(Mid(sWord, 1, 1) + 모음제거(이중자음정리(Mid(sWord, 2))), 1, iUnitLen) End If End If Next sAbbName = sAbbTmp End If End If End If End If 약어생성 = UCase(sAbbName) End Function
4.3. Description of reference information of standard inspection target properties
Property names are changed or data types/lengths are changed during standardization. Items to manage reference information such as property name before change, column name, data type/length, or notes during standardization work are needed.
On the right side of the “Property name standardization” sheet standard inspection results, items are created to manage additional information on standard inspection results and additional reference information required for each property name. For reference information, the user can add necessary items.
▼ Standards Maintenance Additional Information
- Physical Name Length: The length of the physical name created as a result of standard inspection. Displays the background color of cells exceeding the maximum length (e.g. 26) in red in conditional formatting according to the physical name maximum length option setting.
- Number of words: Number of combined words of physical names created as a result of standard inspection
- _Number: The number of '_' characters in the physical name created as a result of standard inspection
- Attribute Name CNT: The number of entered attribute names (if there is a duplicated attribute name, that is, if the attribute name CNT is 2 or more, the background color of the cell is displayed in red)
▼ Additional reference information
- Property name before change: The property name initially entered before standard inspection of property names.
- It is not automatically set, but after creating a list of property names, the user directly sets by copying and pasting.
- It is used to check what the property name was originally entered when changing the property name while executing the standard check several times, and it is recommended to use it whenever possible.
- Property name after change: Current property name (automatically set as a formula)
- As-Is property name: As-Is property name of the entered property name (required for As-Is tracking)
- As-Is column name: As-Is column name of the entered property name (required for As-Is tracking)
- As-Is DataType: As-Is Data Type of the entered property name (required for As-Is tracking)
- Whether or not the property name has been changed: Compare the property name before and after the change, and mark “O” if it has changed
- Items to be confirmed: Freely record and utilize items that need to be checked during standard inspection work
- Others (add as needed): Add a column to the right of “Necessary to check” to manage required items
4.4. license
There are no restrictions on users who need and want to use this standard inspection tool. The content of the license was written briefly by applying the BSD-2 license.
- Author: Youngmin Lee
- email: leeyoungmin@gmail.com
- blog: https://prodskill.com/
- github: https://github.com/DAToolset
- License Content: Basically follows the BSD-2 license specified in the github repository. If not specified, follow the instructions below
- 1. Representation of the copyright holder: The name of the copyright holder, email address, blog address, and github repository address must be used without change. In case of changing the source code and using it, you can add content related to the secondary author.
- 2. Warranty Disclaimer: The copyright holder cannot be held liable for any problems caused by using this source code and tools.
- If the above conditions are satisfied, it is permitted for all purposes, such as personal use such as learning/utilization, use of this tool in business at work, or use of the tool to be included in a commercial product by changing the source code.
- This license content cannot be removed.
- tell me
- You do not have to disclose the source code of the derivative work. However, if it is not a problem, I would like you to let me know which secondary work you used it for. You can send it by email or leave it as a blog comment.
- Please also notify us of errors or improvement requests in the tool's functionality, errors in the guide documentation, or requests for improvement by e-mail or blog comments.
For your reference, the license of the open tool is announced in the blog post below.
https://prodtool.tistory.com/notice/19
#ToDo: Change Link
4.5. Organize your VBA code
WorkSheet, UserForm, Module, Class list. The VBA code of this tool is composed of Module and Class to improve reusability and maintainability. According to the MVC design pattern, M (Model) is Class, V (View) is Excel UI, and C (Control) is Module was applied.
▼ Worksheet Object
- Program Sheet: shtRun (property name standardization): Sheet that manages the function and property list for property name standard inspection
- Data Sheet: Sheet referenced by the program: standard word dictionary, standard term dictionary, standard domain dictionary
▼ Form Object
- frmAbout: “About… ” Form that is displayed when clicking the button. Contains instructional text
▼ Module Object
- modControlStd: Functions related to standard inspection, such as creating standard inspection results and adding candidate words
- modTest: Test code
- modUtil: Common functions (logging, formatting, suffix removal, etc.)
- Mod abbreviation generation: Create an abbreviation by inputting the English full name of the word and the maximum number of digits of the abbreviation
▼ Class Module Object
- CDBConnectionUnit: Class that provides functions such as DB connection, SQL execution and result creation to update standard words/terms/domains from standard Repository DB
- CStdDomain: Class of one standard domain unit
- CStdDomainDic
- Class for searching the standard domain list as a Dictionary structure (Key-Value)
- Key: domain classification name
- Value: Domain Collection of the corresponding domain taxonomy
- CStdTerm: class of one standard term
- CStdTermDic
- Class for searching standard term list as Dictionary structure (Key-Value)
- Key: term logic name
- Value: an instance of the CStdTerm class
- Read the standard terminology dictionary and load it into memory as a dictionary structure
- CStdWord: Class of one standard word
- CStdWordCol: Class for managing a list of homophones for a standard word
- CStdWordDic
- Class for searching standard word list as Dictionary structure (Key-Value)
- Key: word logical name
- Value: instance collection of CStdWord class (CStdWordCol object)
- Internally manages additional dictionary with physical word name as key
- In the process of reading the standard word dictionary and loading it into memory as a dictionary, duplicate logical names and duplicate physical names are checked.
4.6. Main VBA Class Description and Overall Logic Flow Summary
The main classes are “CStdWordDic” that manages the standard word dictionary, “CStdTermDic” that manages the standard term dictionary, and “CStdDomainDic” that manages the standard domain dictionary. The entire logic is handled in the “standard check” procedure of the “modControlStd” module.
In summary, it is as follows.
4.6.1. CStdWordDic Class
CStdWordDic class manages the list of each standard word by using the member variable of Dictionary type. Key is word name (word logic name), Value has one word object (CStdWord type) in general case. When duplicates are allowed for logical names of words, that is, when homonyms are allowed, Value is a Collection object that holds multiple word objects (CStdWord type). Internally, it additionally manages a dictionary with the physical name of the word as the key and the word object as the value.
In order to determine whether there is a homophone, whether to classify by the type of the Value object, the Value object to be the same as a Collection, and to distinguish whether the item count of the Collection is 1 or 2 or more, It was classified by type.
So, in the “standard check” procedure of the “modControlStd” module, the following type of code appears in several places.
If TypeOf oStdWordObj Is CStdWord Then '논리명이 유일한 경우 Set oStdWord = oStdWordObj sToken = oStdWord.m_s단어논리명 ... ElseIf TypeOf oStdWordObj Is CStdWordCol Then '동음이의어가 있어 논리명이 2개 이상인 경우 b동음이의어Matched = True ... End If
Please refer to the following about the Dictionary type.
- The Dictionary type of Excel VBA has a HashMap structure composed of Key and Value.
- It is not a basic type, and it is used after referring to the “Microsoft Scripting Runtime” library, or declared as an object type and created with CreateObject (“Scripting.Dictionary”) to create an instance.
4.6.2. CStdTermDic Class
CStdTermDic class manages the list of each standard term using Dictionary type member variable. Key is the logical name of the term, and Value is the term entity (CStdTerm type). It is used for the logic that determines whether term units match during property standard inspection.
4.6.3. CStdDomainDic Class
CStdDomainDic class manages the list of each standard domain using Dictionary type member variable. Key is domain classification name, Value is domain collection. The items of the domain collection are CStdDomain type domain objects.
4.6.4. Overall logic flow summary
The entire logic is implemented in the "Standard Check" procedure of the module modControl. A brief summary of the process is as follows.
- Read data from each standard word/term/domain dictionary sheet and create variables of each class type. It is implemented in the Load() procedure of each class type.
- Reads the list of properties to be checked and stores them in Variant array.
- For each property to be checked, it is searched using the standard dictionary key-value structure, and the inspection result is stored in a Variant type array. When processing is finished, the inspection results stored in Variant type are output on the sheet.
This concludes the description of the data standard checking tool. If you have any questions, please leave them in the comments.
The data standards check tool can be found in the github repository below.
https://github.com/DAToolset/ToolsForDataStandard
<< List of related articles >>
- Data standard inspection tool_1.Overview
- Data standard inspection tool_2.1. Screen composition, 2.2. Standard inspection function
- Data standard inspection tool_2.3. Data standard dictionary composition
- Data Standard Check Tool_3. Standard Check Case Results
- Data standard check tool_4.Attachment
- How to Fix Data Standard Check Tool_v1.33 Error “Not enough memory”
- Data Standard Check Tool Description Contents , Download