Word Extraction Tool v0.42 release: Bug fix

Word Extraction Tool v0.41, released last time, contained a bug. This post distributes Word Extraction Tool v0.42, which fixes the bug that raised the error KeyError: "Column(s) ['DBSchema'] do not exist".

Related article: Word Extraction Tool v0.41 release: Added a DBSchema occurrence frequency item for words

Kim Ki-young reported the bug with the following comment.

Word Extraction Tool v0.41 bug report: KeyError: "Column(s) ['DBSchema'] do not exist"

Hello!

When I use the execution method that extracts words from files without a DB comment file (one of the three execution methods):

    python word_extractor.py --in_path .\in --out_path .\out

the tool exits with the following error for txt, Word, and PPT files alike:

    miniconda3\envs\wordextr\lib\site-packages\pandas\core\apply.py", line 601, in normalize_dictlike_arg
        raise KeyError(f"Column(s) {cols_sorted} do not exist")
    KeyError: "Column(s) ['DBSchema'] do not exist"

Execution methods 2 and 3, which take a DB comment file as input, work without errors.

I tried adding 'DBSchema': [db_schema] at line 97, but then a different error is displayed:

    in get_grouper
        raise KeyError(gpr)
    KeyError: 'Word'

Thank you.
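The first error in the report can be reproduced outside the tool. The sketch below is not code from word_extractor.py; it assumes, for illustration only, a df_result that contains just 'Word' and 'Source' columns, roughly the situation when no DB comment file is supplied. Passing a dictionary to DataFrameGroupBy.agg() that names a column the frame does not have raises KeyError before any aggregation runs. The exact message wording differs between pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for the extracted words when no DB comment file
# is given: only 'Word' and 'Source' exist, there is no 'DBSchema' column
df_result = pd.DataFrame({
    'Word': ['customer', 'customer', 'order'],
    'Source': ['a.txt', 'b.docx', 'a.txt'],
})

message = None
try:
    # agg() checks the dict keys against the columns first,
    # so the missing 'DBSchema' raises KeyError before anything is computed
    df_result.groupby('Word').agg({
        'Word': 'count',
        'Source': lambda x: '\n'.join(list(x)[:10]),
        'DBSchema': 'nunique',
    })
except KeyError as e:
    message = str(e)

# Names 'DBSchema' as missing; exact wording depends on the pandas version
print(message)
```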

The changed code is as follows.

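    # 'DB' present means a DB comment file was used, so 'DBSchema' is available too
    if 'DB' in df_result.columns:
        # Per word: occurrence count, up to 10 sources, number of distinct DB schemas
        df_group = df_result.groupby('Word').agg({
            'Word': 'count',
            'Source': lambda x: '\n'.join(list(x)[:10]),
            'DBSchema': 'nunique'
        }).rename(columns={
            'Word': 'Freq',
            'Source': 'Source',
            'DBSchema': 'DBSchema_Freq'
        })
    else:
        # No DB comment file: add the DB-related columns as empty strings
        df_result['DB'] = ''
        df_result['Schema'] = ''
        df_result['Table'] = ''
        df_result['Column'] = ''
        df_result['DBSchema'] = ''

        # Aggregate only 'Word' and 'Source'; 'DBSchema' is not aggregated here
        df_group = df_result.groupby('Word').agg({
            'Word': 'count',
            'Source': lambda x: '\n'.join(list(x)[:10])
        }).rename(columns={
            'Word': 'Freq',
            'Source': 'Source'
        })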

The code now processes two cases separately: one where the 'DB' column exists in df_result.columns and one where it does not.
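The branching can be tried with a small, self-contained sketch. group_words() is a hypothetical helper, not part of word_extractor.py, and the sample DataFrames are made up. It follows the v0.42 logic but restructures it slightly: the aggregation dictionary is built once, and the 'DBSchema' entry is added only in the 'DB' branch.

```python
import pandas as pd


def group_words(df_result: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-word grouping following the v0.42 branching."""
    # Aggregations used in both execution modes
    agg_spec = {
        'Word': 'count',                              # occurrences of each word
        'Source': lambda x: '\n'.join(list(x)[:10]),  # first 10 sources
    }
    if 'DB' in df_result.columns:
        # DB comment file was used, so 'DBSchema' is assumed to exist
        agg_spec['DBSchema'] = 'nunique'
    else:
        # No DB comment file: add empty DB-related columns as v0.42 does,
        # but leave 'DBSchema' out of the aggregation
        for col in ['DB', 'Schema', 'Table', 'Column', 'DBSchema']:
            df_result[col] = ''
    # rename() ignores keys that are absent, so one mapping serves both cases
    return df_result.groupby('Word').agg(agg_spec).rename(
        columns={'Word': 'Freq', 'DBSchema': 'DBSchema_Freq'}
    )


# Made-up sample data for the two execution modes
no_db = pd.DataFrame({
    'Word': ['customer', 'customer', 'order'],
    'Source': ['a.txt', 'b.docx', 'a.txt'],
})
with_db = pd.DataFrame({
    'Word': ['customer', 'customer', 'order'],
    'Source': ['T1.C1', 'T2.C1', 'T1.C2'],
    'DB': ['db1', 'db1', 'db1'],
    'DBSchema': ['db1.s1', 'db1.s2', 'db1.s1'],
})

print(group_words(no_db))    # columns: Freq, Source
print(group_words(with_db))  # columns: Freq, Source, DBSchema_Freq
```

Like the original code, the else branch modifies df_result in place, so calling group_words() a second time on the same no_db frame would take the 'DB' branch.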

The full source code of Word Extraction Tool v0.42 is available at the following URL:

https://github.com/DAToolset/ToolsForDataStandard/blob/main/WordExtractor/word_extractor.py
