tesseract-ocr optical character recognition

disciple
Posts: 6984
Joined: Sun 21 May 2006, 01:46
Location: Auckland, New Zealand

#21 Post by disciple »

There is a py/gtk gui for tesseract at http://groups.google.com/group/ocropus/files/ that is worth looking at. Just find guitesseract.py on that page.
There are a couple of other guis I'm still looking at.
Do you know a good gtkdialog program? Please post a link here

Classic Puppy quotes

ROOT FOREVER
GTK2 FOREVER

abushcrafter
Posts: 1418
Joined: Fri 30 Oct 2009, 16:57
Location: England

#22 Post by abushcrafter »

That GUI looks promising. Thanks.
[url=http://www.adobe.com/flashplatform/]adobe flash is rubbish![/url]
My Quote:"Humans are stupid, though some are clever but stupid." http://www.dependent.de/media/audio/mp3/System_Syn_Heres_to_You.zip http://www.systemsyn.com/


OCRfeeder - gui for OCR

#23 Post by disciple »

There's another py/gtk gui at http://ftp.gnome.org/pub/GNOME/sources/ocrfeeder/0.6/
This one is a bit more capable (e.g. page layout analysis) and looks more like it will be maintained.
You need to comment out one line of code which requires Gnome support, just to display the about page :roll: !

It also uses unpaper, which I posted above, and requires libgoocanvas, pygoocanvas and the Python Imaging Library (PIL).
It exports to ODF or HTML, but unfortunately this isn't working for me; I think my PIL may be faulty. If it does work for anyone, please let us know which PIL and which Python you're using.
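For anyone reporting back, something like the following should print both versions. It is only a sketch: the helper name imaging_versions is made up for this post, and it assumes the imaging library is importable as PIL (Pillow exposes __version__, while the classic PIL kept its version in Image.VERSION).

```python
import importlib
import platform


def imaging_versions():
    """Return the Python version and, if importable, the imaging library version."""
    info = {"python": platform.python_version()}
    try:
        pil = importlib.import_module("PIL")
    except ImportError:
        info["PIL"] = None  # imaging library not importable as "PIL"
        return info
    # Assumption: Pillow sets PIL.__version__; classic PIL used Image.VERSION.
    version = getattr(pil, "__version__", None)
    if version is None:
        try:
            image = importlib.import_module("PIL.Image")
            version = getattr(image, "VERSION", "unknown")
        except ImportError:
            version = "unknown"
    info["PIL"] = version
    return info


if __name__ == "__main__":
    for name, version in imaging_versions().items():
        print(name, version)
```

Run it with the same interpreter that OCRfeeder uses, otherwise the answer may not describe the library OCRfeeder actually loads.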
Last edited by disciple on Sun 11 Jul 2010, 02:21, edited 1 time in total.

#24 Post by disciple »

I couldn't find a goocanvas that worked for me, so here's the one I built, and a repackaged py-goocanvas taken, I think, from Debian.
Attachments
python-pygoocanvas_0.10.0-1_i386.pet
(40.21 KiB) Downloaded 925 times
goocanvas-0.15-i486.pet
(90.9 KiB) Downloaded 927 times

#25 Post by disciple »

The other gui for tesseract is at http://sourceforge.net/projects/ocrgui/
It is in C/GTK (yay - no python :)) but I suspect is not as capable.
My current Puppy doesn't have a new enough GTK to try it, although I think the latest Puppies do. You'll also need to install hunspell (or hack it to use enchant instead ;)) and, according to its documentation, ImageMagick's convert.

#26 Post by abushcrafter »

There is a new version of tesseract out.

Tesseract-GUI
Juan Ramon Castan has improved on Filip Domenic's "guitesseract.py". I did not manage to OCR an image with it because the language drop-down box had no options.

While on SourceForge I also found another tesseract GUI: http://sourceforge.net/projects/gimagereader/

#27 Post by disciple »

abushcrafter wrote:While on Source Forge I also found another tesseract GUI: http://sourceforge.net/projects/gimagereader/
Another one! Thanks.
Is it really Python/Gnome, or just PyGtk?

If you haven't been following the ocropus thread, you might like to check out cuneiform, which I mentioned there... along with a variety of guis.

#28 Post by abushcrafter »

disciple wrote:
abushcrafter wrote:While on Source Forge I also found another tesseract GUI: http://sourceforge.net/projects/gimagereader/
Another one! Thanks.
Is it really Python/Gnome, or just PyGtk?
I have not tried it yet because I could not face getting and compiling any more Python bindings, and I am short of time. Its dependencies are:
  • python
  • pygtk
  • pycairo
  • gnome-python2-gtkspell
  • python-enchant
  • python-imaging
  • pypoppler
  • tesseract (along with its dictionaries)
  • python-imaging-sane (optional)
So I guess it's PyGtk, not Gnome.
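A quick way to see which of the Python bindings in that list are still missing is to try importing each one. This is a rough sketch: the helper missing_packages is hypothetical, and the module names in ASSUMED_MODULES are my guesses at what each package installs, so adjust them if your packages differ. Note that it really imports the modules, so it should be run with the interpreter the bindings were built for.

```python
import importlib

# Assumed mapping from the package names listed above to importable module
# names; these are guesses, not taken from the gimagereader documentation.
ASSUMED_MODULES = {
    "pygtk": "gtk",
    "pycairo": "cairo",
    "gnome-python2-gtkspell": "gtkspell",
    "python-enchant": "enchant",
    "python-imaging": "PIL",
    "pypoppler": "poppler",
}


def missing_packages(modules=None):
    """Return the package names whose module cannot be imported."""
    if modules is None:
        modules = ASSUMED_MODULES
    missing = []
    for package, module in modules.items():
        try:
            importlib.import_module(module)
        except ImportError:
            missing.append(package)
    return missing


if __name__ == "__main__":
    for package in missing_packages():
        print("missing:", package)
```

It says nothing about tesseract itself or its dictionaries, which are not Python modules and have to be checked separately.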
disciple wrote:If you haven't been following the ocropus thread, you might like to check out cuneiform, which I mentioned there... along with a variety of guis.
No I haven't. Thanks for the pointer.

#29 Post by disciple »

abushcrafter wrote: I have not tried it yet because I could not face getting and compiling any more Python bindings, and I am short of time.
I know the feeling ;)
Thanks for the list of dependencies - I couldn't find it for some reason.
boxR
Posts: 338
Joined: Sat 13 Aug 2011, 21:58
Location: France

#30 Post by boxR »

And now, what is your favorite OCR + GUI combination? What do you use?

Happy New Year

Dromeno
Posts: 534
Joined: Fri 12 Sep 2008, 07:01

Online OCR option

#31 Post by Dromeno »

I have not tested it yet, but it looks convenient. You are only allowed to do 15 pages per hour.
http://www.onlineocr.net/

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#32 Post by greengeek »

jrb wrote: I have built ch-tesseract-2.01-OCR-en.sfs, an English version of tesseract. Tesseract_OCR is placed on the right-click menu. If you right-click on a .tif file it will produce a text file with the same name in a few seconds. However, it is very fussy about these .tif files. You may have to open them in mtpaint or another graphics program and resave them. Even the training files required this. After that, however, it seems to work very well.

I have also placed a menu item on the Documents menu which opens a text file with these same instructions.

Packages for other major languages are available and can be easily built.

Let me know how it works for you. J
Hi jrb - do you perchance still have a copy of this sfs? I would like to get basic OCR functional in Slacko 5.6.
cheers!
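
In the meantime, the steps jrb describes (resave the .tif, then run tesseract to get a text file with the same name) can be roughly sketched as below. This is not jrb's script: the helper names ocr_commands and run_ocr are made up, it assumes ImageMagick's convert and the tesseract command-line tool are installed, and resaving as an uncompressed TIFF is only my guess at what makes the old tesseract less fussy.

```python
import os
import subprocess


def ocr_commands(image_path, lang="eng"):
    """Build the two commands: resave the image, then OCR the resaved copy."""
    base, _ext = os.path.splitext(image_path)
    clean_tif = base + "-clean.tif"
    # Assumption: an uncompressed TIFF is what the old tesseract wants.
    resave = ["convert", image_path, "-compress", "None", clean_tif]
    # tesseract <image> <outputbase> -l <lang> writes <outputbase>.txt
    ocr = ["tesseract", clean_tif, base, "-l", lang]
    return [resave, ocr]


def run_ocr(image_path, lang="eng"):
    """Run both commands and return the path of the resulting text file."""
    for command in ocr_commands(image_path, lang):
        subprocess.check_call(command)
    return os.path.splitext(image_path)[0] + ".txt"


if __name__ == "__main__":
    import sys
    print(run_ocr(sys.argv[1]))
```

Saved as a script and given one image path as its argument, it could be hooked to a right-click action in the file manager in much the same way as the sfs does.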
