A script that lets another script process all files one after another?

2024-06-28

shell-script

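I found a great script that uses OCR to convert PDF files to plain text.

However, it converts only one PDF file at a time, and I need to convert a large number of them.

I know nothing about writing scripts. The script is shown below.

How can I make it convert files in bulk?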

#!/bin/bash

## script to:
##   *  split a PDF up by pages
##   *  convert them to an image format
##   *  read the text from each page
##   *  concatenate the pages


## pass name of PDF file to script
INFILE=$1

## split PDF file into pages, resulting files will be
## numbered: pg_0001.pdf  pg_0002.pdf  pg_0003.pdf
pdftk "$INFILE" burst

for i in pg*.pdf ; do

    ## convert it to a PNG image file
    convert -density 200 -quality 100 $i ${i%.pdf}.png

    ## read text from each page
    tesseract ${i%.pdf}.png ${i%.pdf}.txt

done

## concatenate the pages into a single text file
cat pg*.txt > "${INFILE%.pdf}.txt"

exit

Note: I have read similar questions, but I could not understand them.

Best Answer 1

You can modify the script so that it loops over its arguments:

# instead of INFILE=$1
for INFILE
do
#...

    for i in pg*.pdf ; do
        #...    
    done

    ## concatenate the pages into a single text file
    cat pg*.txt > "${INFILE%.pdf}.txt"
done

Then call the script like this:

some-script.sh 1.pdf 2.pdf #...
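With the elided parts filled in, the whole script might look roughly like the sketch below. A few things here are my own assumptions, not part of the original script: the helper name `ocr_one`, the use of a scratch directory from `mktemp -d` (so that `pg_*` pages left over from a longer PDF cannot leak into the next one), and passing tesseract an output base without `.txt`, since tesseract appends that extension itself. The variables are also quoted so that file names containing spaces work.

```shell
#!/bin/bash
# Sketch: OCR every PDF named on the command line.
# ocr_one is a hypothetical helper name, not part of the original script.

ocr_one() {
    local infile workdir abs
    infile=$1
    # Absolute path, because pdftk is run from inside the scratch directory.
    abs=$(cd "$(dirname "$infile")" && pwd)/$(basename "$infile")
    workdir=$(mktemp -d) || return 1
    (
        cd "$workdir" || exit 1
        # Split into pg_0001.pdf, pg_0002.pdf, ...
        pdftk "$abs" burst
        for i in pg*.pdf; do
            convert -density 200 -quality 100 "$i" "${i%.pdf}.png"
            # tesseract writes its text to <output base>.txt
            tesseract "${i%.pdf}.png" "${i%.pdf}"
        done
        # Concatenate the pages next to the input PDF.
        cat pg*.txt > "${abs%.pdf}.txt"
    )
    rm -rf "$workdir"
}

# No "in ..." list: the loop walks the command-line arguments.
for INFILE
do
    ocr_one "$INFILE"
done
```

Since the shell expands globs before the script starts, `some-script.sh *.pdf` should process every PDF in the current directory.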

When a bash for loop is given no list to iterate over, it iterates over all the command-line arguments (the positional parameters). So

for INFILE

is the same as:

for INFILE in "$@"
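If you want to convince yourself of this, here is a small self-contained check. The function names `implicit_loop` and `explicit_loop` are made up for the demonstration; inside a function, the positional parameters are the function's own arguments, so the same rule applies there.

```shell
#!/bin/bash
# Loop with the implicit argument list.
implicit_loop() {
    for INFILE
    do
        printf '<%s>\n' "$INFILE"
    done
}

# Loop with the argument list spelled out.
explicit_loop() {
    for INFILE in "$@"
    do
        printf '<%s>\n' "$INFILE"
    done
}

implicit_loop 1.pdf "my scan.pdf"
explicit_loop 1.pdf "my scan.pdf"
```

Both calls print the same two lines, and the file name with a space in it stays in one piece in both forms.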
