Word Extraction Tool

Word Extraction Tool v0.42 Released: Bug Fix

Published February 24, 2023 · Updated February 24, 2023

Time For Change
Source: https://pixabay.com/images/id-3842467/

The previously released Word Extraction Tool v0.41 had a bug. This release, v0.42, fixes the bug that raised KeyError: "Column(s) ['DBSchema'] do not exist".

Kim Ki-young (김기영) reported the bug in the following comment.

Word Extraction Tool v0.41 bug details: KeyError: "Column(s) ['DBSchema'] do not exist"

Hello!

Of the three ways to run the tool, I used the first one, which extracts words from files without a DB comment file
(python word_extractor.py --in_path .\in --out_path .\out)

For txt, Word, and PPT files alike, the run terminates with the following error:

miniconda3\envs\wordextr\lib\site-packages\pandas\core\apply.py", line 601, in normalize_dictlike_arg raise KeyError(f"Column(s) {cols_sorted} do not exist")

KeyError: "Column(s) ['DBSchema'] do not exist"

The second and third ways, which take a DB comment file, run without errors.

I tried adding 'DBSchema': [db_schema] at line 97, but this time I got

in get_grouper raise KeyError(gpr) KeyError: 'Word'

Thank you.
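
The reported error can be reproduced in isolation. The following is a minimal sketch, assuming v0.41 applied the DBSchema aggregation unconditionally; the sample rows and the helper name `try_group` are hypothetical and are not taken from word_extractor.py.

```python
import pandas as pd

# Hypothetical sample: words extracted from files only, so there are
# no DB-related columns such as 'DB' or 'DBSchema'.
df_result = pd.DataFrame({
    'Word': ['customer', 'order', 'customer'],
    'Source': ['a.txt', 'a.txt', 'b.pptx'],
})


def try_group(df):
    """Run an aggregation that references 'DBSchema'; return the KeyError text, or None."""
    try:
        df.groupby('Word').agg({
            'Word': 'count',
            'Source': lambda x: '\n'.join(list(x)[:10]),
            'DBSchema': 'nunique',  # this column is missing in file-only runs
        })
    except KeyError as e:
        return str(e)
    return None


error_message = try_group(df_result)
print(error_message)
```

pandas checks the keys of the aggregation dictionary against the DataFrame's columns before computing anything, which is why the run stops even though 'Word' and 'Source' are present. The exact wording of the message may vary between pandas versions.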

The changed code is as follows.

    if 'DB' in df_result.columns:
        # DB comment data is present: also count distinct DBSchema values per word.
        df_group = df_result.groupby('Word').agg({
            'Word': 'count',
            'Source': lambda x: '\n'.join(list(x)[:10]),  # at most 10 sources per word
            'DBSchema': 'nunique'
        }).rename(columns={
            'Word': 'Freq',
            'Source': 'Source',
            'DBSchema': 'DBSchema_Freq'
        })
    else:
        # File-only extraction: add the DB-related columns as empty strings.
        df_result['DB'] = ''
        df_result['Schema'] = ''
        df_result['Table'] = ''
        df_result['Column'] = ''
        df_result['DBSchema'] = ''

        # Aggregate without 'DBSchema'.
        df_group = df_result.groupby('Word').agg({
            'Word': 'count',
            'Source': lambda x: '\n'.join(list(x)[:10])
        }).rename(columns={
            'Word': 'Freq',
            'Source': 'Source'
        })

The grouping is now handled separately depending on whether 'DB' is among the DataFrame's columns.

The full source code of Word Extraction Tool v0.42 is available at the following URL.
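
The same branching can be sketched as a small self-contained function. This is a compact variation for illustration, not the code from word_extractor.py: the function name `group_words` and the sample rows are hypothetical, and the sketch omits the step that fills in the empty DB-related columns.

```python
import pandas as pd


def group_words(df_result: pd.DataFrame) -> pd.DataFrame:
    """Group extracted words; count DBSchema values only when DB columns exist."""
    agg_spec = {
        'Word': 'count',
        # Join at most 10 sources per word, one per line.
        'Source': lambda x: '\n'.join(list(x)[:10]),
    }
    rename_map = {'Word': 'Freq'}

    # Only reference 'DBSchema' when DB comment data was loaded.
    if 'DB' in df_result.columns:
        agg_spec['DBSchema'] = 'nunique'
        rename_map['DBSchema'] = 'DBSchema_Freq'

    return df_result.groupby('Word').agg(agg_spec).rename(columns=rename_map)


# Hypothetical rows from file-only extraction (no DB columns).
files_only = pd.DataFrame({
    'Word': ['customer', 'order', 'customer'],
    'Source': ['a.txt', 'a.txt', 'b.pptx'],
})

# Hypothetical rows that include DB comment data.
with_db = pd.DataFrame({
    'Word': ['customer', 'customer', 'order'],
    'Source': ['t1.c1', 't2.c1', 't3.c1'],
    'DB': ['db1', 'db1', 'db1'],
    'DBSchema': ['db1.s1', 'db1.s2', 'db1.s1'],
})

print(group_words(files_only))
print(group_words(with_db))
```

Building the aggregation dictionary conditionally keeps a single groupby call, at the cost of being less explicit than the two separate branches used in the released code.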

https://github.com/DAToolset/ToolsForDataStandard/blob/main/WordExtractor/word_extractor.py

Tags: python, word extraction, word-extractor

KSM commented:

July 4, 2025, 3:20 PM

As of an installation on 2025.07.05, I confirmed that word extraction works with the versions below:
- Anaconda3-2025.06-0-Windows-x86_64
- Microsoft Build Tools 2022 (installed beforehand)
- Python: 3.9.6
- numpy: 1.20.3 -> 1.23 (upgrade required)
- pandas: 1.3.1

- Zerom commented:
  
  July 10, 2025, 8:00 PM
  
  Glad to hear it works well.
  Thank you for leaving a comment.
  
