tesseract-ocr optical character recognition

ndujoe1
Posts: 851
Joined: Mon 05 Dec 2005, 01:06

tesseract image requirements

#16 Post by ndujoe1 »

Since tesseract will only operate on uncompressed TIFF files, you need just a few extra steps to make xsane compatible with it.

Go to: click Preferences --> Setup --> Filetype

For the TIFF options:

Set compression rate to 1.

In the next three TIFF dialog boxes, select no compression.

Click OK.

Click Preferences again and select SAVE settings.

When scanning a file for OCR, in the XSANE menu I select type: TIFF

color: gray
enter 300 for scan resolution

And save the file with the extension .tif, not .tiff.
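
For an image that was not scanned with these settings, something like the following might get it into the same shape. This is only a sketch: it assumes ImageMagick's `convert` is installed, and `to_plain_tif` is a made-up helper name. It converts to grayscale and writes the file without compression; it does not change the scan resolution.

```shell
# Hypothetical helper (the name is illustrative only).
# $1 = input image, $2 = output file, which should end in .tif
to_plain_tif() {
    # -colorspace Gray : convert to grayscale
    # -compress None   : write the TIFF without compression
    convert "$1" -colorspace Gray -compress None "$2"
}
```

Usage would be along the lines of `to_plain_tif page.png page.tif`.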

Then, when finished, you invoke tesseract from the command line with:

tesseract filename.tif outputname
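
The same step can be wrapped in a small shell function. This is a minimal sketch, assuming `tesseract` is reachable from the command line; `ocr_tif` is an invented name, not part of tesseract. It relies on tesseract adding `.txt` to the output name by itself.

```shell
# Hypothetical helper (the name is illustrative only).
# Refuses anything that does not end in .tif, then runs tesseract
# with the output name set to the input name minus its extension.
ocr_tif() {
    case "$1" in
        *.tif) ;;
        *) echo "ocr_tif: expected a .tif file, got: $1" >&2; return 1 ;;
    esac
    # tesseract appends .txt itself, so scan.tif ends up as scan.txt
    tesseract "$1" "${1%.tif}"
}
```

For example, `ocr_tif scan.tif` should leave the recognised text in `scan.txt`.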

disciple
Posts: 6984
Joined: Sun 21 May 2006, 01:46
Location: Auckland, New Zealand

#17 Post by disciple »

Come on people, why did no one report before now that the package was broken? :oops:
Or did it work in older versions of Puppy? Maybe petget was different...
Do you know a good gtkdialog program? Please post a link here

Classic Puppy quotes

ROOT FOREVER
GTK2 FOREVER

ndujoe1
Posts: 851
Joined: Mon 05 Dec 2005, 01:06

tesseract

#18 Post by ndujoe1 »

It is not broken. I forgot to post that you need to move tesseract from /local/tesseract to /usr/local/tesseract. Then you will be able to reference it from the command line. It works well on my machine.
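
One way to confirm that the move worked is to ask the shell whether it can find the command. A rough sketch follows; `command -v` is a standard shell builtin, while `on_path` is just an illustrative name.

```shell
# Report whether a command can be found by the shell.
on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH" >&2
        return 1
    fi
}
```

Running `on_path tesseract` after the move should print the location rather than the "not found" message.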

disciple
Posts: 6984
Joined: Sun 21 May 2006, 01:46
Location: Auckland, New Zealand

#19 Post by disciple »

Yes, I know the build isn't broken, and neither are your instructions... but my package is.
I obviously packaged it wrong... unless my package somehow got replaced by a different, broken one.

zygo
Posts: 243
Joined: Sat 08 Apr 2006, 20:15
Location: UK

#20 Post by zygo »

I'm using Puppy 431. I read only the first post in this thread and got it working, after a fashion: the command simply returned the dots per pixel and size of the image file. A 1-byte file was made containing a newline character. There was no error on the command line, and nothing in /var/log/messages either. "Check for dependencies" from the menu lists none.
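
A result like this is easy to miss because nothing reports an error. A small check along these lines would flag it. The function name `ocr_output_empty` is invented for illustration, and the file name in the usage line is only an example. Given what post #16 says about uncompressed TIFF, a compressed input file is one possible cause worth ruling out.

```shell
# Hypothetical helper (the name is illustrative only).
# Succeeds when the OCR output holds nothing beyond a single byte,
# such as a lone newline.
ocr_output_empty() {
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 2; }
    [ "$(wc -c < "$1")" -le 1 ]
}
```

Usage might look like `ocr_output_empty outputname.txt && echo "tesseract produced no text"`.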

Now I see ndujoe1 says it needs xsane. Which xsane pet from the official Puppy 4 repo should I use and does that need sane?

disciple
Posts: 6984
Joined: Sun 21 May 2006, 01:46
Location: Auckland, New Zealand

#21 Post by disciple »

There is a py/gtk gui for tesseract at http://groups.google.com/group/ocropus/files/ that is worth looking at. Just find guitesseract.py on that page.
There are a couple of other guis I'm still looking at.

abushcrafter
Posts: 1418
Joined: Fri 30 Oct 2009, 16:57
Location: England

#22 Post by abushcrafter »

That GUI looks promising. Thanks.
[url=http://www.adobe.com/flashplatform/]adobe flash is rubbish![/url]
My Quote:"Humans are stupid, though some are clever but stupid." http://www.dependent.de/media/audio/mp3/System_Syn_Heres_to_You.zip http://www.systemsyn.com/

disciple
Posts: 6984
Joined: Sun 21 May 2006, 01:46
Location: Auckland, New Zealand

OCRfeeder - gui for OCR

#23 Post by disciple »

There's another py/gtk gui at http://ftp.gnome.org/pub/GNOME/sources/ocrfeeder/0.6/
This one is a bit more capable (e.g. page layout analysis) and looks more like it will be maintained.
You need to comment out one line of code which requires Gnome support, just to display the about page :roll: !

It also uses unpaper, which I posted above, and requires libgoocanvas and pygoocanvas and the python imaging library.
It exports to ODF or html, but unfortunately this isn't working for me; I think my python imaging library may be faulty. If it does work for anyone, please let us know which PIL and which python you're using.
Last edited by disciple on Sun 11 Jul 2010, 02:21, edited 1 time in total.

disciple
Posts: 6984
Joined: Sun 21 May 2006, 01:46
Location: Auckland, New Zealand

#24 Post by disciple »

I couldn't find a goocanvas that worked for me, so here's the one I built, and a repackaged py-goocanvas stolen, I think, from Debian.
Attachments
python-pygoocanvas_0.10.0-1_i386.pet
(40.21 KiB) Downloaded 926 times
goocanvas-0.15-i486.pet
(90.9 KiB) Downloaded 928 times

disciple
Posts: 6984
Joined: Sun 21 May 2006, 01:46
Location: Auckland, New Zealand

#25 Post by disciple »

The other gui for tesseract is at http://sourceforge.net/projects/ocrgui/
It is in C/GTK (yay - no python :)) but I suspect is not as capable.
My current Puppy doesn't have a new enough GTK to try it, although I think the latest Puppies do. You'll also need to install hunspell (or hack it to use enchant instead ;)) and, it says, ImageMagick's convert.

abushcrafter
Posts: 1418
Joined: Fri 30 Oct 2009, 16:57
Location: England

#26 Post by abushcrafter »

There is a new version of tesseract out.

Tesseract-GUI
Juan Ramon Castan has improved on Filip Domenic's "guitesseract.py". I did not manage to OCR an image with it because the language drop-down box had no options.

While on Source Forge I also found another tesseract GUI: http://sourceforge.net/projects/gimagereader/

disciple
Posts: 6984
Joined: Sun 21 May 2006, 01:46
Location: Auckland, New Zealand

#27 Post by disciple »

abushcrafter wrote:While on Source Forge I also found another tesseract GUI: http://sourceforge.net/projects/gimagereader/
Another one! Thanks.
Is it really Python/Gnome, or just PyGtk?

If you haven't been following the ocropus thread, you might like to check out cuneiform, which I mentioned there... along with a variety of guis.

abushcrafter
Posts: 1418
Joined: Fri 30 Oct 2009, 16:57
Location: England

#28 Post by abushcrafter »

disciple wrote:
abushcrafter wrote:While on Source Forge I also found another tesseract GUI: http://sourceforge.net/projects/gimagereader/
Another one! Thanks.
Is it really Python/Gnome, or just PyGtk?
I have not tried it yet because I could not face getting and compiling any more Python bindings, and I am short of time. Its dependencies are:
  • python
  • pygtk
  • pycairo
  • gnome-python2-gtkspell
  • python-enchant
  • python-imaging
  • pypoppler
  • tesseract (along with its dictionaries)
  • python-imaging-sane (optional)
So I guess it's PyGtk, not Gnome.
disciple wrote:If you haven't been following the ocropus thread, you might like to check out cuneiform, which I mentioned there... along with a variety of guis.
No I haven't. Thanks for the pointer.

disciple
Posts: 6984
Joined: Sun 21 May 2006, 01:46
Location: Auckland, New Zealand

#29 Post by disciple »

abushcrafter wrote:I have not tried it yet because I could not face getting and compiling any more Python bindings, and I am short of time.
I know the feeling ;)
Thanks for the list of dependencies - I couldn't find it for some reason.

boxR
Posts: 338
Joined: Sat 13 Aug 2011, 21:58
Location: France

#30 Post by boxR »

And now, what is your favorite OCR + GUI? What do you use?

Happy New Year

Dromeno
Posts: 534
Joined: Fri 12 Sep 2008, 07:01

Online OCR option

#31 Post by Dromeno »

I have not tested it yet, but it looks convenient. You are only allowed to do 15 pages per hour.
http://www.onlineocr.net/

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#32 Post by greengeek »

jrb wrote:I have built ch-tesseract-2.01-OCR-en.sfs, an English version of tesseract. Tesseract_OCR is placed on the right-click menu. If you right-click on a .tif file, it will produce a text file with the same name in a few seconds. However, it is very fussy about these .tif files. You may have to open them in mtpaint or another graphics program and resave them. Even the training files required this. After that, however, it seems to work very well.

I have also placed a menu item on the Documents menu which opens a text file with these same instructions.

Packages for other major languages are available and can be easily built.

Let me know how it works for you. J
Hi jrb - do you perchance still have a copy of this sfs? I would like to get basic OCR functional in Slacko 5.6
cheers!
