Improving Tesseract OCR accuracy with image processing

Answer 1

Fix the DPI if needed; 300 DPI is the minimum.
Fix the text size: for example, 12 pt should be fine for Tesseract 3.x (the legacy engine). With Tesseract >= 4.x (the LSTM engine), the best accuracy is obtained when capital letters are 30-33 pixels high.
Fix the text lines (deskew and dewarp the text).
Fix the illumination of the image (e.g. no dark areas in the image).
Binarize and de-noise the image.
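As a rough illustration of the first two points, here is a minimal Python sketch that computes how much an image would need to be resized. The function names and the idea of aiming for the middle of the 30-33 pixel range are my own assumptions, not part of Tesseract; you would still have to measure the current DPI or capital-letter height yourself and do the actual resize with an image tool.

```python
MIN_DPI = 300                 # minimum resolution mentioned above
TARGET_CAP_HEIGHT = (30, 33)  # capital-letter height in pixels for Tesseract >= 4.x


def scale_for_dpi(current_dpi, min_dpi=MIN_DPI):
    """Return the upscaling factor needed to reach min_dpi (1.0 if already enough)."""
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    return max(1.0, min_dpi / current_dpi)


def scale_for_cap_height(measured_px, target=TARGET_CAP_HEIGHT):
    """Return the resize factor that brings a measured capital-letter height
    into the target range (hypothetical helper; aims for the middle of the range)."""
    if measured_px <= 0:
        raise ValueError("measured_px must be positive")
    low, high = target
    if low <= measured_px <= high:
        return 1.0  # already in the suggested range, leave the image alone
    return ((low + high) / 2) / measured_px


if __name__ == "__main__":
    print(scale_for_dpi(150))        # a 150 DPI scan needs 2x upscaling
    print(scale_for_cap_height(63))  # 63 px capitals should be halved
```

The two factors are alternative heuristics: the DPI rule is useful when you know how the page was scanned, the capital-height rule when you only have the pixels.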

There is no universal command line that fits every case (sometimes you need to blur and sharpen the image), but you can try TEXTCLEANER from Fred's ImageMagick Scripts.
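To see why one fixed setting rarely works, consider the binarization step. The sketch below uses Otsu's method, which is my choice for illustration and not something the tools above are confirmed to use, to pick a threshold from the image's own histogram. It is plain Python over a flat list of 8-bit grayscale values, so it is only meant to show the idea, not to be fast.

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold for 8-bit grayscale values (0-255).

    Pixels <= threshold form one class, pixels > threshold the other.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of their gray values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break     # every pixel is already in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance (up to a constant factor)
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 0 (dark) or 255 (light) around the threshold."""
    return [255 if p > threshold else 0 for p in pixels]


if __name__ == "__main__":
    well_lit = [10] * 50 + [200] * 50   # dark ink on a bright page
    dim_scan = [40] * 50 + [120] * 50   # same page, poorly lit
    for image in (well_lit, dim_scan):
        t = otsu_threshold(image)
        print(t, binarize(image, t)[:3], binarize(image, t)[-3:])
```

The two toy images get different thresholds, which is the point: the right value depends on the input, so any cleanup pipeline needs tuning per batch of scans.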

If you are not a fan of the command line, you can try the open-source scantailor.sourceforge.net or the commercial bookrestorer.
