Word Extraction Tool Description: Full Table of Contents and Download
I wrote an explanation of the word extraction tool in six parts and posted them all on my blog. This article collects the full table of contents of that explanation in one place.
- 3. Run the word extraction tool
- 3.1. Download the word extraction tool
- 3.2. How to run the word extraction tool
- 3.2.1. Unzip the downloaded file and activate the Python virtual environment
- 3.2.2. Check Help
- 3.2.3. Method 1: Extract words only from document files
- 3.2.4. Method 2: Extract words only from DB table and column comments
- 3.2.5. Method 3: Extract words from both document files and DB table and column comments
- 3.2.6. How to check execution results
- 3.2.7. Precautions/Notes on Execution
- 4. Word extraction tool source code
- 4.1. Summary
- 4.2. main function
- 4.2.1. Argument parsing
- 4.2.2. Extract list of files to process
- 4.2.3. Execute get_file_text with multiprocessing
- 4.2.4. Execute get_word_list with multiprocessing
- 4.2.5. Get word frequency and run make_word_cloud
- 4.2.6. Save the extracted word list and word frequency as an Excel file, print the execution time, and exit
- 4.3. get_file_text function
- 4.4. get_word_list function
- 4.5. make_word_cloud function
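
For readers who want a feel for the flow outlined in section 4.2 before reading the source, here is a minimal sketch of the "extract words in parallel, then count frequency" idea. It is not the tool's actual code. Only the name get_word_list comes from the outline above; its body here is a plain regex tokenizer that I am assuming as a stand-in, and count_words and extract_word_frequency are hypothetical helpers. Reading files, drawing the word cloud, and saving to Excel are left out.

```python
import re
from collections import Counter
from multiprocessing import Pool


def get_word_list(text):
    # Assumed stand-in: split on anything that is not an ASCII letter.
    # The real tool may use a very different word extraction method.
    return re.findall(r"[A-Za-z]+", text.lower())


def count_words(word_lists):
    # Hypothetical helper: merge the per-text word lists into one frequency table.
    counter = Counter()
    for words in word_lists:
        counter.update(words)
    return counter


def extract_word_frequency(texts, processes=2):
    # Hypothetical helper: run get_word_list over the texts in worker processes,
    # then count how often each word appears overall.
    with Pool(processes=processes) as pool:
        word_lists = pool.map(get_word_list, texts)
    return count_words(word_lists)


if __name__ == "__main__":
    # Made-up sample input, standing in for text read from files or DB comments.
    sample_texts = ["Customer order table", "Order detail table"]
    print(extract_word_frequency(sample_texts).most_common(2))
```

The worker function is defined at module level so that it can be sent to the worker processes, and the pool is started under the `if __name__ == "__main__":` guard, which multiprocessing needs on platforms that spawn new processes.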
The word extraction tool can be downloaded from the GitHub repository below.
https://github.com/DAToolset/ToolsForDataStandard/tree/main/WordExtractor
The source code, fonts, example table/column list files, and example output files needed to run the tool are bundled into a single compressed file, which you can download from the link below.
https://github.com/DAToolset/ToolsForDataStandard/raw/main/WordExtractor/word_extractor.7z
I hope this will be of some help in data standardization.