Word Extraction Tool (2): Configure the Word Extraction Tool Execution Environment

The word extraction tool is written in Python, so before running it you need to set up an environment by installing Python and the required packages. This article walks through configuring the execution environment for the word extraction tool.

This is a continuation of the previous article.

Word Extraction Tool (1): Overview of Word Extraction Tool

2. Configuration of the word extraction tool execution environment

2.1. Environment configuration overview

2.1.1. Recommendations

Miniconda is recommended over Anaconda. Anaconda installs many packages into the default environment, which makes the installation large, whereas Miniconda starts out small and lightweight.

If you do not install Miniconda, using virtualenv is recommended. Installing packages in a separate environment isolated from the base environment avoids problems such as package version conflicts.

If you judge that conflicts are not a concern, or if the machine is used only for the word extraction tool, using the default environment is fine. This article explains the setup with Miniconda on 64-bit Windows 10.

2.1.2. Morphological analyzer selection: Mecab

Mecab was chosen because it ran the fastest among the openly available morphological analyzers and was the best fit for word extraction. To use a morphological analyzer other than Mecab, rewrite the get_word_list() function.
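As a rough illustration of what swapping the analyzer could look like, the sketch below passes the analyzer into the function as a parameter. The signature of get_word_list() shown here, the NounExtractor protocol, and the RegexNounExtractor stand-in are all assumptions made for this example and are not the tool's actual code. The only requirement is an object with a nouns(text) method, which KoNLPy-style analyzers such as eunjeon's Mecab are expected to provide.

```python
import re
from typing import List, Protocol


class NounExtractor(Protocol):
    """Anything with a nouns(text) method, e.g. a KoNLPy-style analyzer."""

    def nouns(self, text: str) -> List[str]: ...


class RegexNounExtractor:
    """Stand-in analyzer for this sketch: splits text on non-word characters.

    It performs no real morphological analysis; it only lets the example
    run without Mecab installed.
    """

    def nouns(self, text: str) -> List[str]:
        return re.findall(r"\w+", text)


def get_word_list(text: str, analyzer: NounExtractor) -> List[str]:
    """Hypothetical get_word_list(): keep words of two or more characters."""
    return [word for word in analyzer.nouns(text) if len(word) >= 2]


if __name__ == "__main__":
    # With eunjeon installed, `from eunjeon import Mecab` and pass Mecab() instead.
    print(get_word_list("a word extraction tool", RegexNounExtractor()))
```

Keeping the analyzer behind a single method means only the object passed in has to change when a different analyzer is tried.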

2.1.3. Overall order of environment configuration

  1. Install Miniconda
  2. Create and activate a virtual environment
  3. Install Python in the virtual environment
  4. Install the required packages in the virtual environment (or in the base environment if no virtual environment is used)

2.2. Install Miniconda

Select and download the installer for the desired Python version from https://conda.io/en/latest/miniconda.html#windows-installers (Windows installers). The word extraction tool was developed with Python 3.8 and also works well with 3.9. Here we download and install the 3.9 version.

Miniconda Windows Installers version

Execute the downloaded file (Miniconda3-py39_4.10.3-Windows-x86_64.exe) to proceed with the installation. Click the Next button a few times to complete the installation.

Miniconda installation screen

Subsequent tasks are executed from the Miniconda Prompt. You can run it from the following path.

Start Menu > Anaconda3 (64bit) > Anaconda Prompt (miniconda3)

Run Miniconda Prompt

2.3. Creating and activating a virtual environment

When Miniconda Prompt starts, the base environment (base) is active (see the image above).

Create a separate virtual environment for the word extraction tool.

(base) C:\Users\ymlee>conda create -n wordextr

Activate the new virtual environment with the following command. If the environment name (wordextr) appears at the start of the prompt after the command runs, activation succeeded.

(base) C:\Users\ymlee>conda activate wordextr
(wordextr) C:\Users\ymlee>

2.4. Install Python in virtual environment

Run the following command.

(wordextr) C:\Users\ymlee>conda install python

Output similar to the following appears:

(wordextr) C:\Users\ymlee>conda install python
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: C:\Users\ymlee\miniconda3\envs\wordextr

  added / updated specs:
    - python


The following NEW packages will be INSTALLED:

  ca-certificates    pkgs/main/win-64::ca-certificates-2021.7.5-haa95532_1
  certifi            pkgs/main/win-64::certifi-2021.5.30-py39haa95532_0
  openssl            pkgs/main/win-64::openssl-1.1.1l-h2bbff1b_0
  pip                pkgs/main/win-64::pip-21.2.4-py38haa95532_0
  python             pkgs/main/win-64::python-3.9.7-h6244533_1
  setuptools         pkgs/main/win-64::setuptools-58.0.4-py39haa95532_0
  sqlite             pkgs/main/win-64::sqlite-3.36.0-h2bbff1b_0
  tzdata             pkgs/main/noarch::tzdata-2021a-h5d7bf9c_0
  vc                 pkgs/main/win-64::vc-14.2-h21ff451_1
  vs2015_runtime     pkgs/main/win-64::vs2015_runtime-14.27.29016-h5e58377_2
  wheel              pkgs/main/noarch::wheel-0.37.0-pyhd3eb1b0_1
  wincertstore       pkgs/main/win-64::wincertstore-0.2-py39h2bbff1b_0


Proceed ([y]/n)?

Press Enter, or type y and press Enter, to start the installation. To cancel instead, type n and press Enter.
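After the installation finishes, a short script can confirm that the interpreter runs inside the intended environment and is one of the Python versions mentioned above. This is a minimal sketch: the function name describe_environment and the TESTED_VERSIONS set are assumptions made for this example, and it relies on conda setting the CONDA_DEFAULT_ENV environment variable when an environment is activated.

```python
import os
import sys

# Versions the article mentions: developed on 3.8, works well on 3.9.
TESTED_VERSIONS = {(3, 8), (3, 9)}


def describe_environment(expected_env="wordextr"):
    """Return (env_name, env_ok, version, version_ok) for the running interpreter."""
    # conda sets CONDA_DEFAULT_ENV to the active environment's name on activation.
    env_name = os.environ.get("CONDA_DEFAULT_ENV", "")
    version = sys.version_info[:2]
    return env_name, env_name == expected_env, version, version in TESTED_VERSIONS


if __name__ == "__main__":
    env_name, env_ok, version, version_ok = describe_environment()
    print(f"active conda environment: {env_name or '(none)'} (expected one: {env_ok})")
    print(f"Python {version[0]}.{version[1]} (version mentioned in the article: {version_ok})")
```

Because conda install python without a version picks whatever release is current, the reported version may be newer than 3.9. In that case conda install python=3.9 pins the version used here.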

2.5. Install required packages

Install the required packages with the following commands. Because wordcloud and eunjeon are not provided by conda, they must be installed with pip.

conda install pywin32
conda install pandas
conda install Jinja2
conda install xlsxwriter
pip install wordcloud
pip install eunjeon

The purpose of each package is as follows.

  • pywin32: used to open and read MS Word, PowerPoint, and Excel files through OLE automation
  • pandas: used to hold the word extraction results in memory and save them to an Excel file at the end
  • Jinja2, xlsxwriter: used by ExcelWriter in pandas
  • wordcloud: used to visualize the word extraction results
  • eunjeon: provides the Korean morphological analyzer Mecab
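To see which of these packages are actually installed, and in which versions, something like the following can be run inside the environment. It is a minimal sketch using the standard library's importlib.metadata; the function name installed_versions and the REQUIRED_PACKAGES list are assumptions made for this example. The pandas line deserves attention: a reader comment on this article reports an AttributeError on DataFrame.append, a method that pandas removed in version 2.0, while the author's tested environment used pandas 1.3.1.

```python
from importlib import metadata

# Package names as they appear in the install commands above.
REQUIRED_PACKAGES = ["pywin32", "pandas", "Jinja2", "xlsxwriter", "wordcloud", "eunjeon"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED_PACKAGES).items():
        print(f"{name:12} {version or 'NOT INSTALLED'}")
```

If pandas shows 2.0 or later, installing an older release (for example pip install "pandas<2") is one way to get closer to the environment the tool was tested with.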

If installing eunjeon fails with the error “Microsoft Visual C++ 14.0 or greater is required.”, download and install 'Microsoft Build Tools 2015 Update 3' from the 'Redistributable Packages and Build Tools' section at the URL below, then try again.

https://visualstudio.microsoft.com/ko/vs/older-downloads/#microsoft-build-tools-2015-update-3

During installation, select “Desktop development using C++”. (The screenshot below was captured after installation, so it differs slightly from what you see during installation.)

Install Microsoft Build Tools 2015 Update 3

After installing “Microsoft Build Tools 2015 Update 3”, install eunjeon with the following command.

pip install eunjeon

Once eunjeon is installed, you can remove “Microsoft Build Tools 2015 Update 3”.

Run 'Visual Studio Installer' from the Start menu, deselect “Desktop development using C++”, and click the “Modify” button at the bottom right to remove it.

Run Visual Studio Installer
Uninstall Microsoft Build Tools 2015 Update 3

At this point, the configuration of the environment is complete. Next, we will look at how to run the word extraction tool and check the results.



7 Responses

  1. 김철민 says:

    (wordextr) E:\WordExtractor>python word_extractor.py --in_path .\in --out_path .\out
    I am a beginner using Python for the first time. I ran the command above and got the following result. Something seems to be wrong with the path specification, but as a novice I can't solve it. I would appreciate your help (the in and out folders have been created correctly).

    E:\WordExtractor\word_extractor.py:382: SyntaxWarning: invalid escape sequence '\o'
    usage_description = """--- Description ---
    E:\WordExtractor\word_extractor.py:406: SyntaxWarning: invalid escape sequence '\i'
    parser.add_argument('--in_path', required=False, help='Input file (ppt, doc, txt) path name (e.g. .\in) ')
    E:\WordExtractor\word_extractor.py:407: SyntaxWarning: invalid escape sequence '\o'
    parser.add_argument('--out_path', required=True, help='Output file (xlsx, png) path name (e.g. .\out)')
    ------------------------------------------------------------
    Word Extractor v0.41 start --- 2023-11-20 03:13:07.584787
    ##### arguments #####
    multi_process_count: 32
    db_comment_file: None
    in_path: .\in
    out_path: .\out
    ------------------------------------------------------------
    [2023-11-20 03:13:07.586789] Start Get File List...
    [2023-11-20 03:13:07.586789] Finish Get File List.
    --- File List ---
    E:\WordExtractor\in\test.txt
    [2023-11-20 03:13:07.588790] Start Get File Text...
    E:\WordExtractor\word_extractor.py:382: SyntaxWarning: invalid escape sequence '\o'
    usage_description = """--- Description ---
    E:\WordExtractor\word_extractor.py:406: SyntaxWarning: invalid escape sequence '\i'
    parser.add_argument('--in_path', required=False, help='Input file (ppt, doc, txt) path name (e.g. .\in) ')
    E:\WordExtractor\word_extractor.py:407: SyntaxWarning: invalid escape sequence '\o'
    parser.add_argument('--out_path', required=True, help='Output file (xlsx, png) path name (e.g. .\out)')
    [The same three SyntaxWarning messages appear 32 times in total at this point in the pasted output; the repeats are omitted here.]

    get_txt_text: E:\WordExtractor\in\test.txt
    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
    File "C:\ProgramData\miniconda3\envs\wordextr\Lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
    ^^^^^^^^^^^^^^^^^^^
    File "C:\ProgramData\miniconda3\envs\wordextr\Lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
    ^^^^^^^^^^^^^^^^
    File "E:\WordExtractor\word_extractor.py", line 367, in get_file_text
    df_text = get_txt_text(file_name)
    ^^^^^^^^^^^^^^^^^^^^^^^
    File "E:\WordExtractor\word_extractor.py", line 238, in get_txt_text
    df_text = df_text.append(sr_text, ignore_index=True)
    ^^^^^^^^^^^^^^
    File "C:\ProgramData\miniconda3\envs\wordextr\Lib\site-packages\pandas\core\generic.py", line 6204, in __getattr__
    return object.__getattribute__(self, name)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    AttributeError: 'DataFrame' object has no attribute 'append'. Did you mean: '_append'?
    """

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
    File "E:\WordExtractor\word_extractor.py", line 559, in <module>
    main()
    File "E:\WordExtractor\word_extractor.py", line 460, in main
    mp_text_result = pool.map(get_file_text, file_list)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "C:\ProgramData\miniconda3\envs\wordextr\Lib\multiprocessing\pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "C:\ProgramData\miniconda3\envs\wordextr\Lib\multiprocessing\pool.py", line 774, in get
    raise self._value
    AttributeError: 'DataFrame' object has no attribute 'append'

    (wordextr) E:\WordExtractor>

    • Zerom says:

      Hello, nice to meet you.
      I have not run into this error myself, so I cannot tell you right away how to solve it.
      Could you check and let me know your Python, numpy, and pandas versions?
      I suspect the versions differ from mine.

      For reference, the environment I developed and tested in uses the following versions.
      - Python: 3.9.6 (how to check: python --version)
      - numpy: 1.20.3 (how to check: pip list, which also shows the pandas version below)
      - pandas: 1.3.1

    • 서희경 says:

      I had the same error. After matching the package versions you shared, it ran successfully.

  2. 서희경 says:

    Hello. I have a question about installing Anaconda. I would like to use the word extraction tool at my company, but since Anaconda is paid, the company recommends using miniforge. Will the tool behave any differently if I use it after installing miniforge?

    • Zerom says:

      I haven't used miniforge, so I don't know whether there will be any functional difference.
      The purpose of installing Miniconda was to create and manage a virtual environment easily, rather than to make package installation easier.

      Try this:
      - Use venv or virtualenv instead of Miniconda (see: https://richwind.co.kr/193)
      - In section “2.5. Install required packages”, change “conda install” to “pip install”.

      I hope it goes well.

      • 서희경 says:

        First, I installed miniforge and performed the above process at the Miniforge Prompt, but nothing happened.
        Also, the 'Microsoft Build Tools 2015 Update 3' you mentioned did not install properly, so I installed Microsoft Build Tools 2022 instead and then installed eunjeon.

        Now I will try the extraction tool and give you feedback 🙂
