Geoffrey

Joined: 30 May 2010 Posts: 2379 Location: Queensland
Posted: Tue 31 May 2016, 00:29
Post subject: Tesseract OCR
Subject description: Command line OCR
Tesseract OCR, compiled in Slacko 6.3.0: a command-line OCR tool that reads text from images.
tesseract-3.04.01.pet
Dependency:
leptonica-1.73-i686.pet
I know very little about OCR. I downloaded the required eng.traineddata, which is included in this pet, and it seems to work OK with a sample text image.
It's just something to play with and some might find it useful, but it really needs a GUI frontend to make things easier.
Code:
```
# tesseract
Usage:
  tesseract --help | --help-psm | --version
  tesseract --list-langs [--tessdata-dir PATH]
  tesseract --print-parameters [options...] [configfile...]
  tesseract imagename|stdin outputbase|stdout [options...] [configfile...]

OCR options:
  --tessdata-dir PATH   Specify the location of tessdata path.
  --user-words PATH     Specify the location of user words file.
  --user-patterns PATH  Specify the location of user patterns file.
  -l LANG[+LANG]        Specify language(s) used for OCR.
  -c VAR=VALUE          Set value for config variables.
                        Multiple -c arguments are allowed.
  -psm NUM              Specify page segmentation mode.
NOTE: These options must occur before any configfile.

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.

Single options:
  -h, --help            Show this help message.
  --help-psm            Show page segmentation modes.
  -v, --version         Show version information.
  --list-langs          List available languages for tesseract engine.
  --print-parameters    Print tesseract parameters to stdout.
```
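The usage lines above fix the argument order: image and output base first, then OCR options, then any config files (the NOTE says the options must come before any configfile). Below is a minimal Python sketch that assembles such a command line and runs it with `subprocess`. The helper names `build_tesseract_cmd` and `run_ocr` are hypothetical, the sketch assumes `tesseract` is on the PATH, and it uses the single-dash `-psm` spelling from this 3.04 help text (newer Tesseract releases spell it `--psm`).

```python
import subprocess
from typing import Dict, List, Optional, Sequence

# Page segmentation modes listed in the 3.04 help text (0-10).
VALID_PSM = range(0, 11)


def build_tesseract_cmd(
    image: str,
    outputbase: str = "stdout",
    langs: Sequence[str] = ("eng",),
    psm: Optional[int] = None,
    config_vars: Optional[Dict[str, str]] = None,
    configfiles: Sequence[str] = (),
) -> List[str]:
    """Hypothetical helper: build argv for the tesseract 3.04 CLI.

    Order follows the usage line: image, outputbase, options, configfiles.
    """
    cmd = ["tesseract", image, outputbase]
    if langs:
        cmd += ["-l", "+".join(langs)]  # e.g. ("eng", "fra") -> "eng+fra"
    if psm is not None:
        if psm not in VALID_PSM:
            raise ValueError(f"page segmentation mode must be 0-10, got {psm}")
        cmd += ["-psm", str(psm)]  # 3.04 spelling; newer releases use --psm
    for var, value in (config_vars or {}).items():
        cmd += ["-c", f"{var}={value}"]  # one -c per config variable
    cmd += list(configfiles)  # config files must come after all options
    return cmd


def run_ocr(image: str, **options) -> str:
    """Hypothetical helper: OCR one image and return the text tesseract
    writes to stdout. Assumes `tesseract` is on the PATH."""
    cmd = build_tesseract_cmd(image, "stdout", **options)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `run_ocr("scan.png", psm=6)` would ask for a single uniform block of text (mode 6) and return whatever tesseract prints to stdout; `langs=("eng", "fra")` becomes `-l eng+fra`, which only works if both .traineddata files are installed. Building the argv list separately keeps the ordering rule in one place and avoids shell quoting problems with file names.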
_________________ Carolina: Recent Repository Additions

rcrsn51

Joined: 05 Sep 2006 Posts: 13129 Location: Stratford, Ontario
Posted: Tue 31 May 2016, 08:01
Post subject: Re: Tesseract OCR
Subject description: Command line OCR

Geoffrey wrote: "really needs a GUI frontend to make things easier."
There already is one. It's called pic2txt and it's been around for years. If you look at some of the other OCR threads in this section, you will find references to it.
Pelo
Joined: 10 Sep 2011 Posts: 12591 Location: Mer méditerrannée (1 kms°)
Posted: Tue 31 May 2016, 08:30
Post subject: Try PuppyOCR and compare.
Try PuppyOCR and compare. Keep in mind that the Puppy team often adapts applications such as Tesseract to make the job easier for Puppy's passengers. I used PuppyOCR and kept it not only in my toolbox but also in the cloud, available for download when needed.
A GUI has been added (in English).
_________________ Passenger Pelo ! don't ask him to repair the aircraft. Don't use him as a demining dog .... pleeease.
Geoffrey

Joined: 30 May 2010 Posts: 2379 Location: Queensland
Posted: Tue 31 May 2016, 09:29
Post subject: Re: Tesseract OCR
Subject description: Command line OCR

rcrsn51 wrote:
  Geoffrey wrote: "really needs a GUI frontend to make things easier."
  There already is one. It's called pic2txt and it's been around for years. If you look at some of the other OCR threads in this section, you will find references to it.
Yeah, I spotted that soon after posting this. I'm in the process of packaging a Qt GUI frontend for it. I know that may seem a little on the heavy side, but it's nice to try something different.
musher0
Joined: 04 Jan 2009 Posts: 15041 Location: Gatineau (Qc), Canada
Posted: Tue 31 May 2016, 11:08
@Geoffrey: Is this (Tesseract) your own work?
_________________ musher0
~~~~~~~~~~
"You want it darker? We kill the flame." (L. Cohen)
Pelo
Joined: 10 Sep 2011 Posts: 12591 Location: Mer méditerrannée (1 kms°)
Posted: Tue 31 May 2016, 12:26
Post subject: Tesseract is a multi-OS application
Tesseract is a multi-OS application that has been "puppified" by PuppyOCR. There are posts about it on the Ubuntu forum.
The software doesn't do everything, and the results depend heavily on the document being OCR'd. It is painstaking work, worth it for documents that have value.
Ubuntu documentation
An Ubuntu review is available in English.
Geoffrey

Joined: 30 May 2010 Posts: 2379 Location: Queensland
Posted: Tue 31 May 2016, 18:35

musher0 wrote: "@Geoffrey: Is this (Tesseract) your own work?"
If you consider compiling it from source code my own work, then yes: I compiled both tesseract-3.04.01 and leptonica-1.73-i686.
I stripped out the DEV files, as they came to over 60 MB. The only thing I added was eng.traineddata, which I downloaded as eng.traineddata.gz from somewhere or other.
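Since eng.traineddata usually arrives gzipped, here is a minimal Python sketch of the install step: decompress the .gz into a tessdata directory, then check which language files are present. `install_traineddata` and `installed_langs` are hypothetical helper names, and the target directory is an assumption: where this pet's tesseract looks for its data depends on how it was built (a source build installed under /usr/local often uses /usr/local/share/tessdata, but check your own system).

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def install_traineddata(gz_path: str, tessdata_dir: str) -> Path:
    """Hypothetical helper: decompress e.g. eng.traineddata.gz into
    tessdata_dir and return the path of the decompressed file."""
    src = Path(gz_path)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src.name}")
    dest_dir = Path(tessdata_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.stem  # "eng.traineddata.gz" -> "eng.traineddata"
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        # Stream the data so a large model is not read into memory at once.
        shutil.copyfileobj(fin, fout)
    return dest


def installed_langs(tessdata_dir: str) -> List[str]:
    """List language codes for which a .traineddata file is present."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))
```

After copying the file into the directory tesseract actually reads, `tesseract --list-langs` (from the help output in the first post) should include `eng`.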
hamoudoudou
Joined: 24 Jul 2014 Posts: 1467 Location: rabat
Posted: Sun 31 Dec 2017, 04:41
Post subject: Confirmed that PuppyOCR does the job
Confirmed: PuppyOCR does the job without installing Tesseract or anything else. The devs did a good job with this little application.
The only thing left to add is a French dictionary, so that misrecognized words can be checked.
Feedback artfulpup