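Word Extraction Tool

Word Extraction Tool (3): How to Run the Word Extraction Tool and Check the Results

Published September 25, 2022 · Updated October 10, 2022

This article explains how to run the word extraction tool and how to check its results.

It continues from the previous article:

Word Extraction Tool (2): Setting Up the Word Extraction Tool's Runtime Environment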

3. Running the Word Extraction Tool

3.1. Downloading the Word Extraction Tool

The word extraction tool is available on GitHub.

https://github.com/DAToolset/ToolsForDataStandard/tree/main/WordExtractor

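The source code, fonts, sample table/column list file, and sample output files needed to run the tool are bundled in a single distribution archive, so downloading this one file is enough.

https://github.com/DAToolset/ToolsForDataStandard/raw/main/WordExtractor/word_extractor.7z

The distribution archive contains the following files.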

[font]
  - NanumBarunGothic.ttf
  - NanumSquareR.ttf
[out]
  - extract_result_20210829111836.xlsx
  - wordcloud_20210829111836.png
- table,column comments.xlsx
- word_extractor.py

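Each folder and file is described below.

[font]
- Folder containing the fonts needed to generate the word cloud
- Other fonts can be used by adding them here and changing the source code
- Function to change in the source code: make_word_cloud
[out]
- Folder containing sample word extraction result files
- The contents of these files are described in the following article section:
  1.3.3. Word extraction tool output data
table,column comments.xlsx
- Sample input file for DB table and column comments
- The contents of this file are described in the following article section:
  1.3.1. Word extraction tool input data
word_extractor.py: source code of the word extraction tool (Python)
- Note: this source file may change, so for the latest version check the file on GitHub rather than the copy in the distribution archive.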

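3.2. How to Run the Word Extraction Tool

3.2.1. Extract the Downloaded File and Activate the Python Virtual Environment

Extract the distribution archive downloaded above to a suitable path (e.g. "d:\Project\WordExtractor").

Open the Miniconda Prompt, move to the extracted path, and activate the Python virtual environment.

For how to activate the Python virtual environment, see the following article section:

2.3. Creating and activating the virtual environment

The steps below assume the Miniconda Prompt is in the following state.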

(wordextr) d:\Project\WordExtractor>

3.2.2. Checking the Help

Run the tool with the "--help" argument to display the help.

(wordextr) d:\Project\WordExtractor>python word_extractor.py --help

The output looks like this:

(wordextr) d:\Project\WordExtractor>python word_extractor.py --help
usage: word_extractor.py [-h] [--multi_process_count MULTI_PROCESS_COUNT] [--db_comment_file DB_COMMENT_FILE] [--in_path IN_PATH] --out_path OUT_PATH

--- Description ---
  * At least one of db_comment_file and in_path must be specified

  * Examples
    1. Extract text and words from files: specify in_path and out_path
       python word_extractor.py --multi_process_count 4 --in_path .\test_files --out_path .\out

    2. Extract text and words from DB comments: specify db_comment_file and out_path
       python word_extractor.py --db_comment_file "table,column comments.xlsx" --out_path .\out

    3. Extract text and words from files and DB comments: specify db_comment_file, in_path, and out_path
       python word_extractor.py --db_comment_file "table,column comments.xlsx" --in_path .\test_files --out_path .\out

  * DB table and column comment file format
    - First sheet (table comments): DBName, SchemaName, Tablename, TableComment
    - Second sheet (column comments): DBName, SchemaName, Tablename, ColumnName, ColumnComment

optional arguments:
  -h, --help            show this help message and exit
  --multi_process_count MULTI_PROCESS_COUNT
                        Number of processes used in parallel for text extraction and word extraction (set to the number of logical CPUs if omitted)
  --db_comment_file DB_COMMENT_FILE
                        File name of the DB table and column comment data (e.g. comment.xlsx)
  --in_path IN_PATH     Path of the input files (ppt, doc, txt) (e.g. .\in)
  --out_path OUT_PATH   Path of the output files (xlsx, png) (e.g. .\out)

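There are three ways to run the tool (see "Examples" in the help output above).

1. Extract words from document files only
   - Specify the folder containing the MS Word, PowerPoint, and text files as "--in_path" and the folder for the results as "--out_path"
2. Extract words from DB table and column comments only
   - Specify the Excel file saved in the comment file format as "--db_comment_file" and the folder for the results as "--out_path"
3. Extract words from both document files and DB table and column comments (methods 1 and 2 in a single run)
   - Specify all of "--in_path", "--db_comment_file", and "--out_path"

The "--multi_process_count" argument is the number of processes that run in parallel when extracting text from the files and extracting words from that text. Choosing a value suited to the runtime environment may improve performance.

This article runs the tool without "--multi_process_count". In that case the code sets it to the number of logical CPUs in the runtime environment (e.g. 8 for an i5-8250U CPU).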
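The argument rules described above can be sketched with argparse. This is only a minimal illustration based on the help output, not the tool's actual source: the function names build_parser and parse_args are hypothetical, and the check for "at least one input" is an assumption about how that rule could be enforced.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the options listed in the help output."""
    parser = argparse.ArgumentParser(prog="word_extractor.py")
    parser.add_argument("--multi_process_count", type=int, default=None)
    parser.add_argument("--db_comment_file", default=None)
    parser.add_argument("--in_path", default=None)
    parser.add_argument("--out_path", required=True)
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv and apply the two rules described in the article."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Rule from the help text: at least one input source must be given.
    if args.db_comment_file is None and args.in_path is None:
        parser.error("specify --db_comment_file, --in_path, or both")
    # Without --multi_process_count, fall back to the logical CPU count.
    if args.multi_process_count is None:
        args.multi_process_count = os.cpu_count() or 1
    return args


# Method 1 from the list above: document files only.
args = parse_args(["--in_path", r".\in", "--out_path", r".\out"])
print(args.multi_process_count, args.in_path, args.out_path)
```

On a machine with 8 logical CPUs this prints 8 as the process count, which matches the "multi_process_count: 8" line in the logs later in this article.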

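3.2.3. Method 1: Extract Words from Document Files Only

First, create a folder for the document files under the path that contains the Python source code.

For example, create an "in" folder under "d:\Project\WordExtractor", giving the path "d:\Project\WordExtractor\in".

Then copy the MS Word, PowerPoint, and text files into the "in" folder. Subfolders under "in" are searched at every level, so it is convenient to organize the files into subfolders, for example by business unit.

Note that as of this writing (2021-10-24), HWP and PDF files are not yet supported.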
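The recursive search of the "in" folder can be pictured with a short sketch like the following. It is an illustration only: list_input_files and SUPPORTED_SUFFIXES are hypothetical names, and the extension list is assumed from the file types named in this article (ppt/pptx, doc/docx, txt).

```python
from pathlib import Path

# File types the article says are handled; HWP and PDF are not supported.
SUPPORTED_SUFFIXES = {".ppt", ".pptx", ".doc", ".docx", ".txt"}


def list_input_files(in_path):
    """Return every supported file under in_path, including all subfolders."""
    root = Path(in_path)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

Calling list_input_files(r".\in") would return the files from every subfolder level, while unsupported files such as .pdf or .hwp are simply skipped.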

Run the tool with the following command (specifying --in_path and --out_path).

(wordextr) d:\Project\WordExtractor>python word_extractor.py --in_path .\in --out_path .\out

An example of the output:

(wordextr) d:\Project\WordExtractor>python word_extractor.py --in_path .\in --out_path .\out
------------------------------------------------------------
Word Extractor v0.40 start --- 2021-10-24 12:15:11.985581
##### arguments #####
multi_process_count: 8
db_comment_file: None
in_path: .\in
out_path: .\out
------------------------------------------------------------
[2021-10-24 12:15:11.985581] Start Get File List...
[2021-10-24 12:15:11.985581] Finish Get File List.
--- File List ---
d:\Project\WordExtractor\in\OOOOOO_데이터현황.txt
d:\Project\WordExtractor\in\OOOOOO_업무매뉴얼.pptx
d:\Project\WordExtractor\in\OOOOOO_주간업무보고서(7주차).docx
[2021-10-24 12:15:11.985581] Start Get File Text...

get_txt_text: d:\Project\WordExtractor\in\OOOOOO_데이터현황.txt

get_ppt_text: d:\Project\WordExtractor\in\OOOOOO_업무매뉴얼.pptx

get_doc_text: d:\Project\WordExtractor\in\OOOOOO_주간업무보고서(7주차).docx
text count: 25
line count: 34
[pid:17976] get_txt_text elapsed time: 0:00:00.135933
text count: 124
page count: 5
[pid:5412] get_ppt_text elapsed time: 0:00:03.370637
text count: 59
page count: 3
[pid:22052] get_doc_text elapsed time: 0:00:04.100849
[2021-10-24 12:15:18.094089] Finish Get File Text.
[2021-10-24 12:15:18.094089] Start Get Word from File Text...
[pid:25016] input text count:26, extracted word count: 31
[pid:25016] get_word_list finished. total: 26, elapsed time: 0:00:00.109351
[pid:17704] input text count:26, extracted word count: 54
[pid:17704] get_word_list finished. total: 26, elapsed time: 0:00:00.156214
[pid:18468] input text count:26, extracted word count: 52
[pid:18468] get_word_list finished. total: 26, elapsed time: 0:00:00.140596
[pid:3456] input text count:26, extracted word count: 38
[pid:3456] get_word_list finished. total: 26, elapsed time: 0:00:00.109350
[pid:15400] input text count:26, extracted word count: 50
[pid:15400] get_word_list finished. total: 26, elapsed time: 0:00:00.140594
[pid:25892] input text count:26, extracted word count: 65
[pid:25892] get_word_list finished. total: 26, elapsed time: 0:00:00.171835
[pid:3592] input text count:26, extracted word count: 147
[pid:3592] get_word_list finished. total: 26, elapsed time: 0:00:00.312458
[pid:9512] input text count:26, extracted word count: 180
[pid:9512] get_word_list finished. total: 26, elapsed time: 0:00:00.374976
[2021-10-24 12:15:20.320614] Finish Get Word from File Text.
[2021-10-24 12:15:20.320614] Start Get Word Frequency...
[2021-10-24 12:15:20.336234] Finish Get Word Frequency.
[2021-10-24 12:15:20.336234] Start Make Word Cloud...

start make_word_cloud...
make_word_cloud elapsed time: 0:00:06.681665
[2021-10-24 12:15:27.017899] Finish Make Word Cloud.
[2021-10-24 12:15:27.017899] Start Save the Extract result to Excel File...
start writing excel file...
[2021-10-24 12:15:27.643679] Finish Save the Extract result to Excel File...
------------------------------------------------------------
[2021-10-24 12:15:27.643679] Finished.
overall elapsed time: 0:00:15.658098
------------------------------------------------------------

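3.2.4. Method 2: Extract Words from DB Table and Column Comments Only

First, open the "table,column comments.xlsx" file included in the archive in Excel, fill it in according to the format, and save it.

For the format and sample contents, see the following article section:

1.3.1. Word extraction tool input data

Run the tool with the following command (specifying --db_comment_file and --out_path).

(wordextr) d:\Project\WordExtractor>python word_extractor.py --db_comment_file "table,column comments.xlsx" --out_path .\out

The input file name contains a space, so it is enclosed in double quotes (").

If the "table,column comments.xlsx" file is in a different path from the Python source file, include that path in the argument. Here it is assumed to be in the same path.

An example of the output: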

(wordextr) d:\Project\WordExtractor>python word_extractor.py --db_comment_file "table,column comments.xlsx" --out_path .\out
------------------------------------------------------------
Word Extractor v0.40 start --- 2021-10-24 12:34:23.369210
##### arguments #####
multi_process_count: 8
db_comment_file: table,column comments.xlsx
in_path: None
out_path: .\out
------------------------------------------------------------
[2021-10-24 12:34:23.370209] Start Get File Text...

get_db_comment_text: table,column comments.xlsx
table_comment_range : A2:D1001 (1000 rows)
column_comment_range : A2:E1001 (1000 rows)
[pid:17088] get_db_comment_text elapsed time: 0:00:01.216618
text count: 1680
[2021-10-24 12:34:26.577237] Finish Get File Text.
[2021-10-24 12:34:26.577237] Start Get Word from File Text...
[pid:25240] current: 100, total: 210, progress: 47.62%
[pid:21792] current: 100, total: 210, progress: 47.62%
[pid:14788] current: 100, total: 210, progress: 47.62%
[pid:10660] current: 100, total: 210, progress: 47.62%
[pid:17208] current: 100, total: 210, progress: 47.62%
[pid:13300] current: 100, total: 210, progress: 47.62%
[pid:23764] current: 100, total: 210, progress: 47.62%
[pid:25068] current: 100, total: 210, progress: 47.62%
[pid:13300] current: 200, total: 210, progress: 95.24%
[pid:14788] current: 200, total: 210, progress: 95.24%
[pid:13300] input text count:210, extracted word count: 804
[pid:13300] get_word_list finished. total: 210, elapsed time: 0:00:02.900049
[pid:10660] current: 200, total: 210, progress: 95.24%
[pid:14788] input text count:210, extracted word count: 850
[pid:14788] get_word_list finished. total: 210, elapsed time: 0:00:03.005057
[pid:10660] input text count:210, extracted word count: 819
[pid:10660] get_word_list finished. total: 210, elapsed time: 0:00:03.040949
[pid:17208] current: 200, total: 210, progress: 95.24%
[pid:25240] current: 200, total: 210, progress: 95.24%
[pid:17208] input text count:210, extracted word count: 929
[pid:17208] get_word_list finished. total: 210, elapsed time: 0:00:03.182333
[pid:25240] input text count:210, extracted word count: 871
[pid:25240] get_word_list finished. total: 210, elapsed time: 0:00:03.320128
[pid:23764] current: 200, total: 210, progress: 95.24%
[pid:21792] current: 200, total: 210, progress: 95.24%
[pid:23764] input text count:210, extracted word count: 1054
[pid:23764] get_word_list finished. total: 210, elapsed time: 0:00:03.362429
[pid:25068] current: 200, total: 210, progress: 95.24%
[pid:21792] input text count:210, extracted word count: 1077
[pid:21792] get_word_list finished. total: 210, elapsed time: 0:00:03.651294
[pid:25068] input text count:210, extracted word count: 1163
[pid:25068] get_word_list finished. total: 210, elapsed time: 0:00:03.616955
[2021-10-24 12:34:32.287245] Finish Get Word from File Text.
[2021-10-24 12:34:32.287245] Start Get Word Frequency...
[2021-10-24 12:34:32.313363] Finish Get Word Frequency.
[2021-10-24 12:34:32.313363] Start Make Word Cloud...

start make_word_cloud...
make_word_cloud elapsed time: 0:00:10.572230
[2021-10-24 12:34:42.886547] Finish Make Word Cloud.
[2021-10-24 12:34:42.886547] Start Save the Extract result to Excel File...
start writing excel file...
[2021-10-24 12:34:48.636633] Finish Save the Extract result to Excel File...
------------------------------------------------------------
[2021-10-24 12:34:48.636633] Finished.
overall elapsed time: 0:00:25.266424
------------------------------------------------------------

3.2.5. Method 3: Extract Words from Both Document Files and DB Table and Column Comments

This method runs methods 1 and 2 in a single pass.

Run the tool with the following command (specifying --db_comment_file, --in_path, and --out_path).

(wordextr) d:\Project\WordExtractor>python word_extractor.py --db_comment_file "table,column comments.xlsx" --in_path .\in --out_path .\out

An example of the output:

(wordextr) d:\Project\WordExtractor>python word_extractor.py --db_comment_file "table,column comments.xlsx" --in_path .\in --out_path .\out
------------------------------------------------------------
Word Extractor v0.40 start --- 2021-10-24 12:43:31.847674
##### arguments #####
multi_process_count: 8
db_comment_file: table,column comments.xlsx
in_path: .\in
out_path: .\out
------------------------------------------------------------
[2021-10-24 12:43:31.848673] Start Get File List...
[2021-10-24 12:43:31.849672] Finish Get File List.
--- File List ---
d:\Project\WordExtractor\in\OOOOOO_데이터현황.txt
d:\Project\WordExtractor\in\OOOOOO_업무 매뉴얼.pptx
d:\Project\WordExtractor\in\OOOOOO_주간업무보고서(7주차).docx
[2021-10-24 12:43:31.849672] Start Get File Text...

get_txt_text: d:\Project\WordExtractor\in\OOOOOO_데이터현황.txt

get_ppt_text: d:\Project\WordExtractor\in\OOOOOO_업무 매뉴얼.pptx

get_doc_text: d:\Project\WordExtractor\in\OOOOOO_주간업무보고서(7주차).docx

get_db_comment_text: table,column comments.xlsx
text count: 25
line count: 34
[pid:11692] get_txt_text elapsed time: 0:00:00.135359
table_comment_range : A2:D1001 (1000 rows)
column_comment_range : A2:E1001 (1000 rows)
[pid:21044] get_db_comment_text elapsed time: 0:00:01.580088
text count: 1680
text count: 124
page count: 5
[pid:23812] get_ppt_text elapsed time: 0:00:04.757793
text count: 59
page count: 3
[pid:23724] get_doc_text elapsed time: 0:00:06.661778
[2021-10-24 12:43:40.690639] Finish Get File Text.
[2021-10-24 12:43:40.690639] Start Get Word from File Text...
[pid:18392] current: 100, total: 236, progress: 42.37%
[pid:8036] current: 100, total: 236, progress: 42.37%
[pid:26864] current: 100, total: 236, progress: 42.37%
[pid:23288] current: 100, total: 236, progress: 42.37%
[pid:15596] current: 100, total: 236, progress: 42.37%
[pid:8036] current: 200, total: 236, progress: 84.75%
[pid:18208] current: 100, total: 236, progress: 42.37%
[pid:17976] current: 100, total: 236, progress: 42.37%
[pid:4324] current: 100, total: 236, progress: 42.37%
[pid:18392] current: 200, total: 236, progress: 84.75%
[pid:26864] current: 200, total: 236, progress: 84.75%
[pid:8036] input text count:236, extracted word count: 739
[pid:8036] get_word_list finished. total: 236, elapsed time: 0:00:02.651907
[pid:18392] input text count:236, extracted word count: 780
[pid:18392] get_word_list finished. total: 236, elapsed time: 0:00:02.879298
[pid:15596] current: 200, total: 236, progress: 84.75%
[pid:26864] input text count:236, extracted word count: 887
[pid:26864] get_word_list finished. total: 236, elapsed time: 0:00:03.161543
[pid:15596] input text count:236, extracted word count: 979
[pid:15596] get_word_list finished. total: 236, elapsed time: 0:00:03.443786
[pid:18208] current: 200, total: 236, progress: 84.75%
[pid:23288] current: 200, total: 236, progress: 84.75%
[pid:17976] current: 200, total: 236, progress: 84.75%
[pid:18208] input text count:236, extracted word count: 1181
[pid:18208] get_word_list finished. total: 236, elapsed time: 0:00:03.831052
[pid:4324] current: 200, total: 236, progress: 84.75%
[pid:23288] input text count:236, extracted word count: 1242
[pid:23288] get_word_list finished. total: 236, elapsed time: 0:00:04.139228
[pid:17976] input text count:236, extracted word count: 1294
[pid:17976] get_word_list finished. total: 236, elapsed time: 0:00:04.113296
[pid:4324] input text count:236, extracted word count: 1082
[pid:4324] get_word_list finished. total: 236, elapsed time: 0:00:04.334706
[2021-10-24 12:43:47.324098] Finish Get Word from File Text.
[2021-10-24 12:43:47.325098] Start Get Word Frequency...
[2021-10-24 12:43:47.353058] Finish Get Word Frequency.
[2021-10-24 12:43:47.353058] Start Make Word Cloud...

start make_word_cloud...
make_word_cloud elapsed time: 0:00:10.604237
[2021-10-24 12:43:57.958289] Finish Make Word Cloud.
[2021-10-24 12:43:57.958289] Start Save the Extract result to Excel File...
start writing excel file...
[2021-10-24 12:44:04.752046] Finish Save the Extract result to Excel File...
------------------------------------------------------------
[2021-10-24 12:44:04.752046] Finished.
overall elapsed time: 0:00:32.903374
------------------------------------------------------------

Part of the run above was also captured as a screenshot.
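The per-process totals in the three logs are consistent with the extracted texts being divided evenly among the 8 worker processes: 208 texts (25 + 124 + 59) give 26 each, 1680 give 210 each, and 1888 give 236 each. The tool's actual splitting code is not shown in this article, so the following is only a sketch of one way such an even split could be done; split_evenly is a hypothetical helper.

```python
def split_evenly(items, n_chunks):
    """Split items into n_chunks consecutive slices whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Text counts taken from the three runs above (8 processes each).
for total in (208, 1680, 1888):
    sizes = [len(chunk) for chunk in split_evenly(list(range(total)), 8)]
    print(total, sizes)
```

Each line of output lists eight equal chunk sizes (26, 210, and 236 respectively), matching the "total:" values printed by the worker processes.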

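3.2.6. How to Check the Results

Two files (xlsx and png) are created in the folder given as the output path (\out). Each file name automatically includes a timestamp (YYYYMMDDHHMISS: year, month, day, hour, minute, second), so you can tell when the file was created.

For example, the result files in the out folder of the distribution archive are as follows.

extract_result_20210829111836.xlsx: Excel file with the word extraction results
wordcloud_20210829111836.png: word cloud image generated from the "word frequency" sheet of the results Excel file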
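The timestamped naming can be reproduced with datetime.strftime. This is a minimal sketch of the naming scheme only; make_output_paths is a hypothetical helper, not a function from the tool.

```python
from datetime import datetime
from pathlib import Path


def make_output_paths(out_path, now=None):
    """Return the (xlsx, png) output paths stamped with YYYYMMDDHHMISS."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d%H%M%S")
    out_dir = Path(out_path)
    return (
        out_dir / f"extract_result_{stamp}.xlsx",
        out_dir / f"wordcloud_{stamp}.png",
    )


# Reproduce the sample file names shipped in the archive's out folder.
xlsx_path, png_path = make_output_paths("out", datetime(2021, 8, 29, 11, 18, 36))
print(xlsx_path.name)  # extract_result_20210829111836.xlsx
print(png_path.name)   # wordcloud_20210829111836.png
```

Because both files share one timestamp, the Excel file and the word cloud image from the same run are easy to pair up.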

For the format and contents of the result files, see the following article section:

1.3.3. Word extraction tool output data

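3.2.7. Notes on Running the Tool

- Starting MS Word and PowerPoint before running the tool slightly improves performance.
- During the run, files are opened in MS Word and PowerPoint and closed when processing finishes. Leave both applications alone while the word extraction program is running.
- The more files, the more pages per file, and the more data rows in the comment Excel file, the longer the run takes.
- Before a full run, save a subset of the input files and a subset of the comment Excel data separately and test that the tool works on them first.
- A full run can take a long time, so it is best started during a meal or a break.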
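For the trial run suggested above, a small sample folder can be prepared with a few lines of Python. This is a sketch under assumptions: copy_sample and the folder name "in_sample" are hypothetical, and picking the first few files in sorted order is just one simple way to choose a subset.

```python
import shutil
from pathlib import Path


def copy_sample(in_path, sample_path, limit=5):
    """Copy the first `limit` files under in_path into sample_path, keeping subfolders."""
    src_root, dst_root = Path(in_path), Path(sample_path)
    files = sorted(p for p in src_root.rglob("*") if p.is_file())[:limit]
    copied = []
    for src in files:
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves file timestamps
        copied.append(dst)
    return copied
```

After something like copy_sample("in", "in_sample"), the tool can be pointed at the sample folder with "--in_path .\in_sample" to confirm that everything works before processing the full set.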

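This article explained how to use the word extraction tool. If you have questions about using it or ideas for features to add, please leave a comment.

The next article will look at the source code.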

Tags: morphological analyzer, python, MeCab, word extraction, nlp

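김기영 says:

February 16, 2023, 2:12 pm

Hello! Thank you for the great code.

I have a question about running the code with pptx files as input.

I tested extracting words from .pptx files.
When the code runs, an empty PowerPoint window stays open instead of closing.

This does not happen with Word, txt, or Excel files.

I'd like to ask whether this is a temporary issue related to the Office version.

My environment is Windows 11 with Office 365.

Thank you.

- Zerom says:

  February 16, 2023, 3:03 pm

  Hello, thank you for visiting and for your comment.

  I'd like to understand your situation in more detail.
  Which of the following is it?
  1) "During" the run, PowerPoint stops responding and the tool does not move on to the next file
  2) "After" the run, PowerPoint stays open instead of closing

  In case 1):
  - If you force-quit PowerPoint from Task Manager, the tool can move on to the next file.
  - The current file may not have been processed correctly.
  - I recommend starting over from the beginning.

  In case 2):
  - If you started PowerPoint before running the word extraction tool, it is normal for PowerPoint to still be running.
  - If you had not started PowerPoint, this is abnormal behavior, but if all files were processed there is nothing to worry about.

  For reference, when the word extraction tool is run while PowerPoint, Excel, and similar applications are not running, the OLE automation process is expected to start those applications and close them once processing is complete.

  If it is neither of these, please leave another comment.

  - 김기영 says:

    February 16, 2023, 3:27 pm

    Hello, thank you for the quick answer!

    In my case it is 2): PowerPoint stays open after the code has run.

    I had not started PowerPoint beforehand, so from what you describe it looks like abnormal behavior.

    All files were processed without any problems.

    Thank you!

    - Zerom says:

      February 16, 2023, 6:01 pm

      Thank you for letting me know the situation and the result.
      I hope the tool serves you well. ^^
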
서희경 says:

January 18, 2024, 5:40 pm

Hello. I'm currently trying to use this tool, but it does not work. Could it be a miniforge problem?

------------------------------------------------------------
(wordextr) C:\Users\User\Downloads\word_extractor>python word_extractor.py --in_path .\in --out_path .\out
C:\Users\User\Downloads\word_extractor\word_extractor.py:382: SyntaxWarning: invalid escape sequence '\o'
  usage_description = """--- Description ---
C:\Users\User\Downloads\word_extractor\word_extractor.py:406: SyntaxWarning: invalid escape sequence '\i'
  parser.add_argument('--in_path', required=False, help='Path of the input files (ppt, doc, txt) (e.g. .\in)')
C:\Users\User\Downloads\word_extractor\word_extractor.py:407: SyntaxWarning: invalid escape sequence '\o'
  parser.add_argument('--out_path', required=True, help='Path of the output files (xlsx, png) (e.g. .\out)')
------------------------------------------------------------
Word Extractor v0.41 start --- 2024-01-18 17:36:57.283006
##### arguments #####
multi_process_count: 16
db_comment_file: None
in_path: .\in
out_path: .\out
------------------------------------------------------------
[2024-01-18 17:36:57.288351] Start Get File List...
[2024-01-18 17:36:57.289390] Finish Get File List.
--- File List ---

[2024-01-18 17:36:57.290381] Start Get File Text...
Traceback (most recent call last):
  File "C:\Users\User\Downloads\word_extractor\word_extractor.py", line 559, in <module>
    main()
  File "C:\Users\User\Downloads\word_extractor\word_extractor.py", line 461, in main
    df_text = pd.concat(mp_text_result, ignore_index=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniforge3\envs\wordextr\Lib\site-packages\pandas\core\reshape\concat.py", line 380, in concat
    op = _Concatenator(
         ^^^^^^^^^^^^^^
  File "C:\ProgramData\miniforge3\envs\wordextr\Lib\site-packages\pandas\core\reshape\concat.py", line 443, in __init__
    objs, keys = self._clean_keys_and_objs(objs, keys)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniforge3\envs\wordextr\Lib\site-packages\pandas\core\reshape\concat.py", line 505, in _clean_keys_and_objs
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

- 서희경 says:

  January 18, 2024, 6:07 pm

  The version information is as follows.

  Python 3.12.1

  Package Version
  --------------- -------
  contourpy 1.2.0
  cycler 0.12.1
  eunjeon 0.4.0
  fonttools 4.47.2
  Jinja2 3.1.3
  kiwisolver 1.4.5
  MarkupSafe 2.1.3
  matplotlib 3.8.2
  numpy 1.26.3
  packaging 23.2
  pandas 2.1.4
  pillow 10.2.0
  pip 23.3.2
  pyparsing 3.1.1
  python-dateutil 2.8.2
  pytz 2023.3.post1
  pywin32 306
  setuptools 69.0.3
  six 1.16.0
  tzdata 2023.4
  wheel 0.42.0
  wordcloud 1.9.3
  XlsxWriter 3.1.9

  - Zerom says:

    January 19, 2024, 9:24 am

    This looks like the error that occurs when the input directory contains no files with any of the following extensions.
    - .ppt, .pptx
    - .doc, .docx
    - .txt

    I tested it as follows.
    ------------------------------------------------------------

    (.venv) D:\Temp\python_venv\wordextr_test>python --version
    Python 3.11.1

    (.venv) D:\Temp\python_venv\wordextr_test>python word_extractor.py --in_path .\in --out_path .\out
    ------------------------------------------------------------
    Word Extractor v0.41 start --- 2024-01-19 08:55:44.842578
    ##### arguments #####
    multi_process_count: 8
    db_comment_file: None
    in_path: .\in
    out_path: .\out
    ------------------------------------------------------------
    [2024-01-19 08:55:44.845602] Start Get File List...
    [2024-01-19 08:55:44.845602] Finish Get File List.
    --- File List ---

    [2024-01-19 08:55:44.845602] Start Get File Text...
    Traceback (most recent call last):
      File "D:\Temp\python_venv\wordextr_test\word_extractor.py", line 164, in <module>
        main()
      File "D:\Temp\python_venv\wordextr_test\word_extractor.py", line 152, in main
        df_text = pd.concat(mp_text_result, ignore_index=True)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "D:\Temp\python_venv\wordextr_test\.venv\Lib\site-packages\pandas\core\reshape\concat.py", line 380, in concat
        op = _Concatenator(
             ^^^^^^^^^^^^^^
      File "D:\Temp\python_venv\wordextr_test\.venv\Lib\site-packages\pandas\core\reshape\concat.py", line 443, in __init__
        objs, keys = self._clean_keys_and_objs(objs, keys)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "D:\Temp\python_venv\wordextr_test\.venv\Lib\site-packages\pandas\core\reshape\concat.py", line 505, in _clean_keys_and_objs
        raise ValueError("No objects to concatenate")
    ValueError: No objects to concatenate

    (.venv) D:\Temp\python_venv\wordextr_test>dir .\in
     Volume in drive D is Data
     Volume Serial Number is D6EC-7CFE

     Directory of D:\Temp\python_venv\wordextr_test\in

    2024-01-18  09:41 PM    <DIR>          .
    2024-01-18  09:41 PM    <DIR>          ..
                   0 File(s)              0 bytes
                   2 Dir(s)  34,305,060,864 bytes free

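As the reply above explains, the "No objects to concatenate" error appears when the input folder yields no files to process. One way to surface this earlier with a clearer message would be a small guard before text extraction starts. This is only a sketch and is not part of the tool: ensure_inputs is a hypothetical helper, and the message wording is an assumption.

```python
import sys


def ensure_inputs(file_list, in_path):
    """Stop with a readable message when no supported input file was found."""
    if not file_list:
        sys.exit(
            f"No .ppt, .pptx, .doc, .docx, or .txt files found under {in_path!r}; "
            "nothing to extract."
        )
    return file_list


# With at least one file, the list is passed through unchanged.
print(ensure_inputs([r".\in\report.docx"], r".\in"))
```

With an empty list the script exits immediately and prints the message instead of reaching pd.concat with nothing to concatenate.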
Hyelim Cho says:

April 29, 2024, 2:34 pm

Hello. I'm currently using this tool, but I keep getting an error. From what I've found, it looks like a pandas version issue (I believe the .append function is no longer provided as of pandas 2). Or is it a different error?

(wordextr) C:\Users\hyelm\Documents\word_extractor>python word_extractor.py --db_comment_file "table_column_comments1.xlsx" --out_path 'C:\Users\hyelm\Documents\word_extractor\out'
C:\Users\hyelm\Documents\word_extractor\word_extractor.py:382: SyntaxWarning: invalid escape sequence '\o'
  usage_description = """--- Description ---
C:\Users\hyelm\Documents\word_extractor\word_extractor.py:406: SyntaxWarning: invalid escape sequence '\i'
  parser.add_argument('--in_path', required=False, help='Path of the input files (ppt, doc, txt) (e.g. .\in)')
C:\Users\hyelm\Documents\word_extractor\word_extractor.py:407: SyntaxWarning: invalid escape sequence '\o'
  parser.add_argument('--out_path', required=True, help='Path of the output files (xlsx, png) (e.g. .\out)')
------------------------------------------------------------
Word Extractor v0.41 start --- 2024-04-29 14:28:41.474667
##### arguments #####
multi_process_count: 4
db_comment_file: table_column_comments1.xlsx
in_path: None
out_path: 'C:\Users\hyelm\Documents\word_extractor\out'
------------------------------------------------------------
[2024-04-29 14:28:41.479798] Start Get File Text...
C:\Users\hyelm\Documents\word_extractor\word_extractor.py:382: SyntaxWarning: invalid escape sequence '\o'
  usage_description = """--- Description ---
C:\Users\hyelm\Documents\word_extractor\word_extractor.py:406: SyntaxWarning: invalid escape sequence '\i'
  parser.add_argument('--in_path', required=False, help='Path of the input files (ppt, doc, txt) (e.g. .\in)')
C:\Users\hyelm\Documents\word_extractor\word_extractor.py:407: SyntaxWarning: invalid escape sequence '\o'
  parser.add_argument('--out_path', required=True, help='Path of the output files (xlsx, png) (e.g. .\out)')
C:\Users\hyelm\Documents\word_extractor\word_extractor.py:382: SyntaxWarning: invalid escape sequence '\o'
  usage_description = """--- Description ---
C:\Users\hyelm\Documents\word_extractor\word_extractor.py:406: SyntaxWarning: invalid escape sequence '\i'
  parser.add_argument('--in_path', required=False, help='Path of the input files (ppt, doc, txt) (e.g. .\in)')
C:\Users\hyelm\Documents\word_extractor\word_extractor.py:407: SyntaxWarning: invalid escape sequence '\o'
  parser.add_argument('--out_path', required=True, help='Path of the output files (xlsx, png) (e.g. .\out)')
C:\Users\hyelm\Documents\word_extractor\word_extractor.py:382: SyntaxWarning: invalid escape sequence '\o'
  usage_description = """--- Description ---
C:\Users\hyelm\Documents\word_extractor\word_extractor.py:406: SyntaxWarning: invalid escape sequence '\i'
  parser.add_argument('--in_path', required=False, help='Path of the input files (ppt, doc, txt) (e.g. .\in)')
C:\Users\hyelm\Documents\word_extractor\word_extractor.py:407: SyntaxWarning: invalid escape sequence '\o'
  parser.add_argument('--out_path', required=True, help='Path of the output files (xlsx, png) (e.g. .\out)')
C:\Users\hyelm\Documents\word_extractor\word_extractor.py:382: SyntaxWarning: invalid escape sequence '\o'
  usage_description = """--- Description ---
C:\Users\hyelm\Documents\word_extractor\word_extractor.py:406: SyntaxWarning: invalid escape sequence '\i'
  parser.add_argument('--in_path', required=False, help='Path of the input files (ppt, doc, txt) (e.g. .\in)')
C:\Users\hyelm\Documents\word_extractor\word_extractor.py:407: SyntaxWarning: invalid escape sequence '\o'
  parser.add_argument('--out_path', required=True, help='Path of the output files (xlsx, png) (e.g. .\out)')

get_db_comment_text: table_column_comments1.xlsx
table_comment_range : A2:D7112 (7111 rows)
column_comment_range : A2:E181935 (181934 rows)
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\hyelm\miniconda3\envs\wordextr\Lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hyelm\miniconda3\envs\wordextr\Lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
           ^^^^^^^^^^^^^^^^
  File "C:\Users\hyelm\Documents\word_extractor\word_extractor.py", line 369, in get_file_text
    df_text = get_db_comment_text(file_name)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hyelm\Documents\word_extractor\word_extractor.py", line 343, in get_db_comment_text
    df_text = df_column.append(df_table, ignore_index=True)
              ^^^^^^^^^^^^^^^^
  File "C:\Users\hyelm\miniconda3\envs\wordextr\Lib\site-packages\pandas\core\generic.py", line 6296, in __getattr__
    return object.__getattribute__(self, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'DataFrame' object has no attribute 'append'. Did you mean: '_append'?
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\hyelm\Documents\word_extractor\word_extractor.py", line 559, in <module>
    main()
  File "C:\Users\hyelm\Documents\word_extractor\word_extractor.py", line 460, in main
    mp_text_result = pool.map(get_file_text, file_list)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hyelm\miniconda3\envs\wordextr\Lib\multiprocessing\pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hyelm\miniconda3\envs\wordextr\Lib\multiprocessing\pool.py", line 774, in get
    raise self._value
AttributeError: 'DataFrame' object has no attribute 'append'

조혜림 says:

April 29, 2024, 2:42 pm

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'DataFrame' object has no attribute 'append'. Did you mean: '_append'?
"""

The code in my comment above was pasted oddly, so I'm posting it again.
Is this error caused by the pandas version?

- Zerom says:

  April 30, 2024, 11:30 am

  append is no longer supported as of Pandas v2.0, so it needs to be changed to concat.

  I referred to the following two pages.
  https://yunwoong.tistory.com/253
  https://stackoverflow.com/questions/75956209/error-dataframe-object-has-no-attribute-append

  I have had trouble finding time lately, so I don't know when I will be able to update the source code.
  Could you try installing an earlier version of Pandas by specifying the version at install time?

  Thank you.

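The change from append to concat mentioned in the reply above would look roughly like the following. This is a minimal sketch, not the tool's updated source: the variable names df_column and df_table come from the traceback, while the "Text" column and the sample rows are made up for illustration.

```python
import pandas as pd

# Stand-ins for the two frames named in the traceback; the real column layout may differ.
df_column = pd.DataFrame({"Text": ["customer number", "order date"]})
df_table = pd.DataFrame({"Text": ["customer", "order"]})

# pandas < 2.0:  df_text = df_column.append(df_table, ignore_index=True)
# pandas >= 2.0: DataFrame.append was removed; pd.concat is the replacement.
df_text = pd.concat([df_column, df_table], ignore_index=True)
print(df_text)
```

pd.concat takes the frames as a list and, with ignore_index=True, renumbers the rows from 0 just as the old append call did, so the rest of the processing should see the same combined frame.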
