Tesseract OCR

Geoffrey
Posts: 2355
Joined: Sun 30 May 2010, 08:42
Location: Queensland

#1 Post by Geoffrey »

Tesseract OCR, compiled in Slacko 6.3.0: a command-line OCR tool that reads text from images.

tesseract-3.04.01.pet

Dependency

leptonica-1.73-i686.pet

I know very little about OCR. I downloaded the required eng.traineddata, which is included in this pet, and it seems to work OK with a sample text image.
It's just something to play with; some might find it useful. It really needs a GUI frontend to make things easier.

Code:

# tesseract
Usage:
  tesseract --help | --help-psm | --version
  tesseract --list-langs [--tessdata-dir PATH]
  tesseract --print-parameters [options...] [configfile...]
  tesseract imagename|stdin outputbase|stdout [options...] [configfile...]

OCR options:
  --tessdata-dir PATH   Specify the location of tessdata path.
  --user-words PATH     Specify the location of user words file.
  --user-patterns PATH  Specify the location of user patterns file.
  -l LANG[+LANG]        Specify language(s) used for OCR.
  -c VAR=VALUE          Set value for config variables.
                        Multiple -c arguments are allowed.
  -psm NUM              Specify page segmentation mode.
NOTE: These options must occur before any configfile.

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.

Single options:
  -h, --help            Show this help message.
  --help-psm            Show page segmentation modes.
  -v, --version         Show version information.
  --list-langs          List available languages for tesseract engine.
  --print-parameters    Print tesseract parameters to stdout.
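
For anyone who wants to script this rather than type the command each time, here is a minimal Python sketch that assembles the command line from the usage text above. The helper names `build_tesseract_command` and `run_ocr` and the file names are made up for illustration; the flags are the ones this 3.04 build prints (note the single-dash `-psm`; later Tesseract releases spell it `--psm`). It assumes `tesseract` is on the PATH.

```python
import subprocess


def build_tesseract_command(image, outputbase, lang="eng", psm=None, tessdata_dir=None):
    """Return the argv list for one tesseract run (hypothetical helper).

    Follows the usage line printed by this build:
    tesseract imagename outputbase [options...]
    """
    cmd = ["tesseract", image, outputbase]
    if tessdata_dir is not None:
        # Only needed when the language data is not in the default location.
        cmd += ["--tessdata-dir", tessdata_dir]
    cmd += ["-l", lang]
    if psm is not None:
        # This build lists page segmentation modes 0 to 10.
        if not 0 <= psm <= 10:
            raise ValueError("psm must be between 0 and 10")
        cmd += ["-psm", str(psm)]
    return cmd


def run_ocr(image, outputbase, **options):
    """Run tesseract and return the name of the text file it writes."""
    subprocess.run(build_tesseract_command(image, outputbase, **options), check=True)
    # Tesseract appends .txt to the output base by default.
    return outputbase + ".txt"
```

Calling `run_ocr("sample.png", "out", psm=6)` should leave the recognised text in `out.txt`; mode 6 treats the image as a single uniform block of text.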
Attachment: Screenshot.png (86.22 KiB)
[b]Carolina:[/b] [url=http://smokey01.com/carolina/pages/recent-repo.html]Recent Repository Additions[/url]
[img]https://dl.dropboxusercontent.com/s/ahfade8q4def1lq/signbot.gif[/img]

rcrsn51
Posts: 13096
Joined: Tue 05 Sep 2006, 13:50
Location: Stratford, Ontario

#2 Post by rcrsn51 »

Geoffrey wrote: really needs a GUI frontend to make things easier.
There already is one. It's called pic2txt and it's been around for years. If you look at some of the other OCR threads in this section, you will find references to it.

Pelo

Try PuppyOCR and compare

#3 Post by Pelo »

Try PuppyOCR and compare. Don't forget that the Puppy team often adapts applications such as Tesseract to make things easier for Puppy users. I have used PuppyOCR and keep it not only in my toolbox but also in the cloud, available for download when needed.
A GUI has been added (in English).

#4 Post by Geoffrey »

rcrsn51 wrote:
Geoffrey wrote: really needs a GUI frontend to make things easier.
There already is one. It's called pic2txt and it's been around for years. If you look at some of the other OCR threads in this section, you will find references to it.
Yeah, I spotted that soon after posting this. I'm in the process of packaging a Qt GUI frontend for it. I know that may seem a little on the heavy side, but it's nice to try something different.

musher0
Posts: 14629
Joined: Mon 05 Jan 2009, 00:54
Location: Gatineau (Qc), Canada

#5 Post by musher0 »

@Geoffrey: Is this (Tesseract) your own work?
musher0
~~~~~~~~~~
"You want it darker? We kill the flame." (L. Cohen)

Pelo

Tesseract is a multi-OS application

#6 Post by Pelo »

Tesseract is a multi-OS application that has been adapted for Puppy as PuppyOcr. There are posts about it on the Ubuntu forum.
The software does not do everything, and the results depend heavily on the document being OCRed. It is painstaking work, worth doing for documents that have value.
Ubuntu documentation
Ubuntu review available in English.

#7 Post by Geoffrey »

musher0 wrote: @Geoffrey: Is this (Tesseract) your own work?
If you consider compiling it from source code my own work, then yes: I compiled both tesseract-3.04.01 and leptonica-1.73-i686.
I stripped out the DEV files, as they came to over 60 MB. The only thing I added was eng.traineddata,
which I downloaded as eng.traineddata.gz from somewhere or other. :wink:
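
If you fetch a language file the same way, it has to be unpacked into the tessdata directory before Tesseract can use it. Here is a rough Python sketch of that step, assuming the download is a plain gzip file. The function name and the paths are hypothetical, so point it at wherever tessdata lives on your system (or pass that directory with `--tessdata-dir`).

```python
import gzip
import shutil
from pathlib import Path


def unpack_traineddata(gz_path, tessdata_dir):
    """Decompress e.g. eng.traineddata.gz into tessdata_dir (hypothetical helper)."""
    gz_path = Path(gz_path)
    tessdata_dir = Path(tessdata_dir)
    if gz_path.suffix != ".gz":
        raise ValueError("expected a .gz file, got: " + gz_path.name)
    tessdata_dir.mkdir(parents=True, exist_ok=True)
    # Dropping the .gz suffix turns eng.traineddata.gz into eng.traineddata.
    target = tessdata_dir / gz_path.stem
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Afterwards, `tesseract --list-langs --tessdata-dir PATH` (from the usage text in the first post) should show the new language.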

hamoudoudou

Confirmed that PuppyOCR does the job

#8 Post by hamoudoudou »

Confirmed that PuppyOCR does the job without installing Tesseract or anything else. The devs did a good job with this little application.
Only a French dictionary needs to be added to enable checking of misrecognized words.
Feedback artfulpup
Attachment: PuppyOcr.jpg (56.69 KiB)
