Word Extraction Tool v0.42 release: Bug fix

Published Feb 24, 2023 · Updated Feb 24, 2023

Time For Change
source: https://pixabay.com/images/id-3842467/

Word Extraction Tool v0.41, released last time, had a bug. This post distributes Word Extraction Tool v0.42, which fixes the bug that caused the error KeyError: "Column(s) ['DBSchema'] do not exist".

Kim Ki-young reported the bug with the following comment.

Word Extraction Tool v0.41 bug report
KeyError: "Column(s) ['DBSchema'] do not exist"

Hello!

When I use the first of the three execution methods, the one that extracts words from files without a DB comment file
(python word_extractor.py --in_path .\in --out_path .\out)

txt, Word, and PPT files all exit with the following error:

miniconda3\envs\wordextr\lib\site-packages\pandas\core\apply.py", line 601, in normalize_dictlike_arg
raise KeyError(f"Column(s) {cols_sorted} do not exist")
KeyError: "Column(s) ['DBSchema'] do not exist"

Execution methods 2 and 3, which take a DB comment file as input, work without errors.

I added 'DBSchema': [db_schema] on line 97, but then a different error is displayed:

In get_grouper raise KeyError(gpr) KeyError: 'Word'

Thank you.

The changed code is as follows.

    # DB comment file was supplied: the DB-related columns exist,
    # so DBSchema can be aggregated together with Word and Source.
    if 'DB' in df_result.columns:
        df_group = df_result.groupby('Word').agg({
            'Word': 'count',
            'Source': lambda x: '\n'.join(list(x)[:10]),  # keep at most 10 sources per word
            'DBSchema': 'nunique'
        }).rename(columns={
            'Word': 'Freq',
            'Source': 'Source',
            'DBSchema': 'DBSchema_Freq'
        })
    else:
        # No DB comment file: add empty DB-related columns
        # and leave DBSchema out of the aggregation.
        df_result['DB'] = ''
        df_result['Schema'] = ''
        df_result['Table'] = ''
        df_result['Column'] = ''
        df_result['DBSchema'] = ''

        df_group = df_result.groupby('Word').agg({
            'Word': 'count',
            'Source': lambda x: '\n'.join(list(x)[:10])
        }).rename(columns={
            'Word': 'Freq',
            'Source': 'Source'
        })

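Processing is now split into two cases, depending on whether the 'DB' column exists in df_result. When it does not exist, the DB-related columns are added as empty strings and 'DBSchema' is left out of the aggregation.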
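To make the failure and the guard easier to see in isolation, here is a minimal sketch, not the tool's actual code. The sample data and the helper name summarize are hypothetical, and it uses a slightly different form from the fix above: it builds the aggregation dict conditionally and checks for 'DBSchema' directly instead of branching on 'DB'.

```python
import pandas as pd

# Hypothetical extraction result for a run without a DB comment file:
# only the 'Word' and 'Source' columns are present.
df_result = pd.DataFrame({
    "Word": ["alpha", "beta", "alpha"],
    "Source": ["a.txt", "a.txt", "b.pptx"],
})


def summarize(df):
    """Group by word; aggregate DBSchema only if that column exists."""
    agg_spec = {
        "Word": "count",
        # Keep at most 10 sources per word, one per line.
        "Source": lambda x: "\n".join(list(x)[:10]),
    }
    names = {"Word": "Freq"}
    if "DBSchema" in df.columns:
        agg_spec["DBSchema"] = "nunique"
        names["DBSchema"] = "DBSchema_Freq"
    return df.groupby("Word").agg(agg_spec).rename(columns=names)


# Asking agg() for a column that is missing raises KeyError,
# which is the kind of failure reported for v0.41.
try:
    df_result.groupby("Word").agg({"Word": "count", "DBSchema": "nunique"})
    raised = False
except KeyError:
    raised = True

# The guarded version runs without the DB-related columns.
summary = summarize(df_result)
print(summary)
```

Either form avoids passing a missing column name to agg(). The two-branch version in v0.42 additionally fills in the empty DB-related columns on df_result.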

The entire source code of Word Extraction Tool v0.42 can be found at the following URL.

https://github.com/DAToolset/ToolsForDataStandard/blob/main/WordExtractor/word_extractor.py

KSM says:

July 4, 2025 at 3:20 pm

As of the installation date 2025.07.05, I confirmed that word extraction works with the versions below.
– Anaconda3-2025.06-0-Windows-x86_64
– Microsoft Build Tools 2022 pre-installed
– Python: 3.9.6
– numpy: 1.20.3 -> 1.23 (version upgrade required)
– pandas: 1.3.1

- Zerom says:
  
  July 10, 2025 at 8:00 pm
  
  I'm glad it works.
  Thank you for leaving a comment.
