Word Extraction Tool v0.42 release: Bug fix
Word Extraction Tool v0.41, released last time, contained a bug. This post distributes Word Extraction Tool v0.42, which fixes the error KeyError: "Column(s) ['DBSchema'] do not exist".
Related articles: Release Word Extraction Tool v0.41: Add DBSchema occurrence frequency of words item
Kim Ki-young reported the bug with the following comment.
Hello!
When I use the first of the three execution methods, which extracts words from files without a DB comment file
(python word_extractor.py --in_path .\in --out_path .\out), txt, Word, and ppt files all fail with:
miniconda3\envs\wordextr\lib\site-packages\pandas\core\apply.py", line 601, in normalize_dictlike_arg
    raise KeyError(f"Column(s) {cols_sorted} do not exist")
KeyError: "Column(s) ['DBSchema'] do not exist"
The program exits with this error.
Execution methods 2 and 3, which take a DB comment file as input, work without errors.
I tried adding 'DBSchema': [db_schema] on line 97, but then a different error appeared:
In get_grouper raise KeyError(gpr) KeyError: 'Word'
Thank you.
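The first error in the report can be reproduced in isolation. The following is a minimal sketch, not the tool's actual code: the sample data and the helper name try_aggregate are hypothetical, and it assumes a recent pandas version in which a dict passed to agg() is validated against the existing columns.

```python
import pandas as pd

# Hypothetical sample: words extracted from plain files, so no DB columns exist.
df_result = pd.DataFrame({
    "Word": ["data", "model", "data"],
    "Source": ["a.txt", "b.txt", "c.txt"],
})


def try_aggregate(df):
    """Return the KeyError raised by the aggregation, or None if it succeeds."""
    try:
        # 'DBSchema' is requested here but is not a column of df.
        df.groupby("Word").agg({"Source": "count", "DBSchema": "nunique"})
    except KeyError as exc:
        return exc
    return None


error = try_aggregate(df_result)
print(type(error).__name__, error)
```

Asking agg() for a column that the DataFrame does not have is what triggers the KeyError, which is why only the execution method without a DB comment file was affected.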
The changed code is as follows.
if 'DB' in df_result.columns:
    df_group = df_result.groupby('Word').agg({
        'Word': 'count',
        'Source': lambda x: '\n'.join(list(x)[:10]),
        'DBSchema': 'nunique'
    }).rename(columns={
        'Word': 'Freq',
        'Source': 'Source',
        'DBSchema': 'DBSchema_Freq'
    })
else:
    df_result['DB'] = ''
    df_result['Schema'] = ''
    df_result['Table'] = ''
    df_result['Column'] = ''
    df_result['DBSchema'] = ''
    df_group = df_result.groupby('Word').agg({
        'Word': 'count',
        'Source': lambda x: '\n'.join(list(x)[:10])
    }).rename(columns={
        'Word': 'Freq',
        'Source': 'Source'
    })
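Processing is now split into two cases, depending on whether 'DB' is in the column list. When it is present, the number of distinct DBSchema values per word is also aggregated. When it is missing, empty DB, Schema, Table, Column, and DBSchema columns are added and the DBSchema aggregation is skipped.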
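The same branching idea can be tried on a small DataFrame. This is a sketch under stated assumptions rather than the tool's code: the function name summarize_words and the sample data are hypothetical, it tests for the 'DBSchema' column directly instead of 'DB', it uses pandas named aggregation instead of agg() followed by rename(), and it does not add the empty DB columns that the released code adds.

```python
import pandas as pd


def summarize_words(df_result: pd.DataFrame) -> pd.DataFrame:
    """Group by word, adding the DBSchema count only when that column exists."""
    # Aggregations that apply to every execution method.
    agg_spec = {
        "Freq": ("Source", "count"),
        # Keep at most the first 10 sources per word, one per line.
        "Source": ("Source", lambda x: "\n".join(list(x)[:10])),
    }
    # Assumption: the presence of 'DBSchema' marks input that came with DB comments.
    if "DBSchema" in df_result.columns:
        agg_spec["DBSchema_Freq"] = ("DBSchema", "nunique")
    return df_result.groupby("Word").agg(**agg_spec)


# Hypothetical input without DB columns (execution method 1).
files_only = pd.DataFrame({
    "Word": ["data", "model", "data"],
    "Source": ["a.txt", "b.txt", "c.txt"],
})

# Hypothetical input with a DBSchema column (execution methods 2 and 3).
with_db = files_only.assign(DBSchema=["db1.s1", "db1.s1", "db2.s1"])

out_files = summarize_words(files_only)
out_db = summarize_words(with_db)
print(out_files)
print(out_db)
```

Building the aggregation spec conditionally keeps a single groupby call for both cases. The released code uses two explicit branches instead, which also lets it fill in the empty DB columns.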
The entire source code of Word Extraction Tool v0.42 can be found at the following URL.
https://github.com/DAToolset/ToolsForDataStandard/blob/main/WordExtractor/word_extractor.py