How to get OCR software to work?

fixit · #1 Post by **fixit** » Fri 17 Oct 2014, 23:59

I would like scans from a book converted to text.

I installed PeasyScan and Tesseract.

Pic2Txt says Tesseract is not installed. It is.

PeasyScan says no scanner is installed, yet scangearmp scans just fine.

I don't know what to check next.

rcrsn51 · #2 Post by **rcrsn51** » Sat 18 Oct 2014, 12:28

fixit wrote:PeasyScan says no scanner is installed, yet scangearmp scans just fine.

Your Canon printer is not a SANE-compatible model so it only works with the Canon scangearmp program.

Pic2Txt says Tesseract is not installed. It is.

Go to a command line and type: tesseract
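A related check is to ask the shell whether it can find the program at all. This is only a minimal sketch; the function name `is_installed` is made up for illustration and is not part of Pic2Txt or Tesseract:

```shell
# Succeeds if the named program can be found on the current PATH.
is_installed() {
    command -v "$1" >/dev/null 2>&1
}

if is_installed tesseract; then
    echo "tesseract found at: $(command -v tesseract)"
else
    echo "tesseract is not on the PATH"
fi
```

If the program is found here but a front end still reports it as missing, the front end may be looking for a different name or location.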

Where did you get the tesseract package?

I installed PeasyScan

What Puppy version are you using? Most recent ones already have PeasyScan.

fixit · #3 Post by **fixit** » Sun 19 Oct 2014, 21:43

rcrsn51 wrote:
fixit wrote:PeasyScan says no scanner is installed, yet scangearmp scans just fine.
Your Canon printer is not a SANE-compatible model so it only works with the Canon scangearmp program.

Pic2Txt says Tesseract is not installed. It is.
Go to a command line and type: tesseract

Where did you get the tesseract package?

I installed PeasyScan
What Puppy version are you using? Most recent ones already have PeasyScan.

My primary printer is a Brother HL-2240.

I used the tesseract from here.

http://murga-linux.com/puppy/viewtopic.php?t=51507

Puppy 5.6.0

rcrsn51 · #4 Post by **rcrsn51** » Mon 20 Oct 2014, 00:01

I gave you one crucial test to perform but you failed to report the result. So there is nothing else I can do for you.

fixit · #5 Post by **fixit** » Mon 20 Oct 2014, 01:31

# tesseract
tesseract:Error:Usage:tesseract imagename outputbase [-l lang] [configfile [[+|-]varfile]...]

#

Galbi · #6 Post by **Galbi** » Mon 20 Oct 2014, 03:48

fixit wrote:# tesseract
tesseract:Error:Usage:tesseract imagename outputbase [-l lang] [configfile [[+|-]varfile]...]

#

It seems that tesseract is a command-line app.
I think you have to give tesseract the name of the file(s) that you have previously scanned.

Quoted from tesseract ReadMe

Running Tesseract

Tesseract is a command-line program, so first open a terminal or command prompt. The command is used like this:

tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]

So basic usage to do OCR on an image called 'myscan.png' and save the result to 'out.txt' would be:

tesseract myscan.png out
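The usage line quoted above could be wrapped in a small helper like the one below. It is a sketch only: `ocr_command` is a hypothetical name, and it just prints the command so it can be inspected before being run. The optional third argument is the `-l lang` code from the usage line (for example `eng`).

```shell
# Print the tesseract command for one image, following the usage line above.
# $1 = image file, $2 = output base (tesseract appends .txt),
# $3 = optional language code passed with -l
ocr_command() {
    if [ -n "${3:-}" ]; then
        echo "tesseract $1 $2 -l $3"
    else
        echo "tesseract $1 $2"
    fi
}

ocr_command myscan.png out        # prints: tesseract myscan.png out
ocr_command myscan.png out eng    # prints: tesseract myscan.png out -l eng
```

The helper does not quote file names, so it assumes names without spaces.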

C'mon Andy... put half a battery.

fixit · #7 Post by **fixit** » Mon 20 Oct 2014, 05:36

Guess a lot of folks are still mad at me.

rcrsn51 asked me for the answer to that command line.

I am trying to get a scanned image converted to printable text.

My goal is to use only Linux as an OS.

So far, Puppy has done a great job.

tesseract -man yielded this.

http://code.google.com/p/tesseract-ocr/

Linux

Tesseract is available directly from many Linux distributions. The package is generally called 'tesseract' or 'tesseract-ocr' - search your distribution's repositories to find it. Packages are also generally available for language training data (search the repositories,) but if not you will need to download the appropriate training data, unpack it, and copy the .traineddata file into the 'tessdata' directory, probably /usr/share/tesseract-ocr/tessdata or /usr/share/tessdata.

If Tesseract isn't available for your distribution, or you want to use a newer version than they offer, you can compile your own. Note that older versions of Tesseract only supported processing .tiff files.
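To see whether a language file is actually in one of the directories the text above mentions, something like the following might help. It is a sketch under assumptions: `find_traineddata` is a made-up helper name, and the two directories are only the "probably" locations from the quoted text, so a given Puppy package may use a different one.

```shell
# Print the first directory from the list that contains <lang>.traineddata.
# $1 = language code (e.g. eng); remaining arguments = directories to check.
find_traineddata() {
    td_lang=$1
    shift
    for td_dir in "$@"; do
        if [ -f "$td_dir/$td_lang.traineddata" ]; then
            echo "$td_dir"
            return 0
        fi
    done
    return 1
}

find_traineddata eng /usr/share/tesseract-ocr/tessdata /usr/share/tessdata \
    || echo "eng.traineddata not found in the usual places"
```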

It doesn't seem to be user friendly.

rcrsn51 · #8 Post by **rcrsn51** » Mon 20 Oct 2014, 11:18

I installed Tesseract v3.00 from here and ran it with pic2txt. It worked fine.

fixit · #9 Post by **fixit** » Tue 21 Oct 2014, 01:09

Thanks.

It's working, but the conversion is pretty unusable.

I know the difficulties involved in converting pictures to text.

rcrsn51 · #10 Post by **rcrsn51** » Tue 21 Oct 2014, 01:12

In my tests, you need at least 300 DPI resolution in the scanned image for OCR to work properly.
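If it is unclear what resolution an existing scan has, it can be estimated from the image width in pixels and the physical width of the page. A rough sketch, with `scan_dpi` as a hypothetical helper name and an 8.5-inch-wide page assumed for the examples:

```shell
# Effective resolution of a scan: pixels across divided by inches across.
# $1 = image width in pixels, $2 = width of the scanned page in inches
scan_dpi() {
    awk -v px="$1" -v inches="$2" 'BEGIN { printf "%.0f\n", px / inches }'
}

scan_dpi 2550 8.5    # prints 300
scan_dpi 1700 8.5    # prints 200
```

By the rule of thumb above, the first scan would be at the minimum for OCR and the second would be too coarse.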

Dingo · #11 Post by **Dingo** » Tue 21 Oct 2014, 01:21

Personally, I have found that the free OCR features provided by Tracker's PDF-XChange Viewer

http://www.tracker-software.com/product/downloads
http://www.tracker-software.com/pdf-xchange-viewer-ocr (ocr modules)

are very valuable, especially for me, since I usually build a PDF from scans and then run that PDF through OCR software to get a searchable PDF with a hidden text layer.

It works very well with Wine.

You need to install the PDF viewer first, then the OCR modules for your languages. Once they are installed, you can look in your /root/.wine folder and move the tracker folder to another location if you don't want to use up too much space in your Puppy savefile.

fixit · #12 Post by **fixit** » Tue 21 Oct 2014, 05:17

rcrsn51 wrote:In my tests, you need at least 300 DPI resolution in the scanned image for OCR to work properly.

Thanks.

When I scanned at 400 dpi, the conversion was pretty accurate.

MochiMoppel · #13 Post by **MochiMoppel** » Tue 21 Oct 2014, 07:23

rcrsn51 wrote:I installed Tesseract v3.00 from here and ran it with pic2txt. It worked fine.

It does, but it doesn't work with the linked language files. The files at http://code.google.com/p/tesseract-ocr/downloads/list are all for version 3.02; could this be the reason? Even the new eng.traineddata (20.8M) doesn't work. The final "Text saved" message appears, but no text file is created.

Any chance to get v3.00 language files anywhere?
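Since the "Text saved" message can appear without a file being written, it may be worth checking the result directly after a run. A small sketch, assuming tesseract was given `out` as its output base; `ocr_output_ok` is a made-up name:

```shell
# Succeeds only if <outputbase>.txt exists and is not empty.
# $1 = the output base that was passed to tesseract
ocr_output_ok() {
    [ -s "$1.txt" ]
}

if ocr_output_ok out; then
    echo "out.txt was written"
else
    echo "out.txt is missing or empty"
fi
```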

rcrsn51 · #14 Post by **rcrsn51** » Tue 21 Oct 2014, 11:40

MochiMoppel wrote:Any chance to get v3.00 language files anywhere?

From here:

You may need to view the second page to find a 3.00 version for your language.

I have also built Tesseract v3.02 but the PET is larger and does NOT contain a language file. However, I once posted v3.01 and the download link eventually went dead from lack of use.

If anyone wants to test v3.02, they can send me a PM.

MochiMoppel · #15 Post by **MochiMoppel** » Wed 22 Oct 2014, 02:28

Oops...thanks!
