
tesseract image requirements

Posted: Wed 14 Oct 2009, 01:40
by ndujoe1
Since tesseract will only operate on uncompressed TIFF files, you need just a few extra steps to achieve compatibility with xsane.

goto : click Preferences --> Setup --> Filetype

for the TIFF options

Set compression rate to 1

in the next three TIFF dialog boxes select no compression.

click OK

click Preferences again and select SAVE settings.

When scanning a file for OCR in the XSANE menu I select type: TIFF

color: gray
enter 300 for scan resolution

And save the file with the extension .tif, not .tiff.

Then when finished you invoke tesseract from the command line with

tesseract filename.tif outputname
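
A minimal Python sketch of the uncompressed-TIFF check described above, assuming a classic (non-BigTIFF) file whose first image directory describes the scan. The function names are made up for illustration. In the TIFF format the Compression tag is number 259, and a value of 1 means no compression.

```python
import struct

COMPRESSION_TAG = 259  # TIFF "Compression" tag
NO_COMPRESSION = 1     # tag value meaning "uncompressed"


def tiff_compression(data):
    """Return the Compression value from the first IFD of a TIFF byte string."""
    if data[:2] == b"II":
        endian = "<"   # little-endian file
    elif data[:2] == b"MM":
        endian = ">"   # big-endian file
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, _type, _n = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag == COMPRESSION_TAG:
            # Compression is a SHORT held in the first two value bytes.
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return value
    return NO_COMPRESSION  # the TIFF default when the tag is absent


def is_uncompressed(path):
    """True if the TIFF file at `path` reports no compression."""
    with open(path, "rb") as handle:
        return tiff_compression(handle.read()) == NO_COMPRESSION


def tesseract_command(tif_path, output_base):
    """Build the same command line as above, as an argument list."""
    return ["tesseract", tif_path, output_base]
```

If is_uncompressed("filename.tif") returns False, resave the scan without compression before passing tesseract_command("filename.tif", "outputname") to subprocess.run.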

Posted: Tue 12 Jan 2010, 12:42
by disciple
Come on people, why did no one report before now that the package was broken? :oops:
Or did it work in older versions of Puppy? Maybe petget was different...

tesseract

Posted: Tue 12 Jan 2010, 15:04
by ndujoe1
It is not broken. I forgot to post that you need to move the tesseract location from /local/tesseract to /usr/local/tesseract. Then you will be able to reference it from the command line. It works well on my machine.
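
A rough Python sketch for checking whether the tesseract command can be found, assuming nothing about how the package lays out its files. The function name and the extra_dirs parameter are hypothetical; shutil.which is the standard-library lookup.

```python
import os
import shutil


def locate_command(cmd="tesseract", extra_dirs=()):
    """Return the full path of `cmd`, or None if it cannot be found."""
    found = shutil.which(cmd)  # search the normal PATH first
    if found:
        return found
    if extra_dirs:
        # Fall back to directories that are not on PATH.
        return shutil.which(cmd, path=os.pathsep.join(extra_dirs))
    return None
```

If locate_command() returns None but locate_command(extra_dirs=[...]) finds the binary in one of the directories you pass, the program is installed somewhere the shell does not look.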

Posted: Wed 13 Jan 2010, 08:05
by disciple
Yes, I know the build isn't broken, and neither are your instructions... but my package is.
I obviously packaged it wrong... unless my package somehow got replaced by a different, broken one.

Posted: Wed 13 Jan 2010, 15:55
by zygo
I'm using Puppy 431. I read only the first post in this thread and got it working -- after a fashion -- the command simply returned the dots per pixel and size of the image file. A 1 byte file was made containing a newline character. No error on the command line. Not even in /var/log/messages. Check for dependencies from the menu lists none.
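
A small Python sketch for spotting this symptom automatically, assuming the output file is plain text. The function name is made up for illustration.

```python
from pathlib import Path


def ocr_output_is_blank(txt_path):
    """True if the OCR output holds nothing but whitespace (e.g. one newline)."""
    text = Path(txt_path).read_text(encoding="utf-8", errors="replace")
    return text.strip() == ""
```

A True result with no error message suggests tesseract ran but recognised no text, so the input image or the language data is the first thing to check.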

Now I see ndujoe1 says it needs xsane. Which xsane pet from the official Puppy 4 repo should I use and does that need sane?

Posted: Fri 09 Jul 2010, 13:59
by disciple
There is a py/gtk gui for tesseract at http://groups.google.com/group/ocropus/files/ that is worth looking at. Just find guitesseract.py on that page.
There are a couple of other guis I'm still looking at.

Posted: Fri 09 Jul 2010, 18:38
by abushcrafter
That GUI looks promising. Thanks.

OCRfeeder - gui for OCR

Posted: Sun 11 Jul 2010, 02:02
by disciple
There's another py/gtk gui at http://ftp.gnome.org/pub/GNOME/sources/ocrfeeder/0.6/
This one is a bit more capable (e.g. page layout analysis) and looks more like it will be maintained.
You need to comment out one line of code which requires Gnome support, just to display the about page :roll: !

It also uses unpaper, which I posted above, and requires libgoocanvas and pygoocanvas and the python imaging library.
It exports to ODF or html, but unfortunately this isn't working for me; I think my python imaging library may be faulty. If it does work for anyone, please let us know which PIL and which python you're using.

Posted: Sun 11 Jul 2010, 02:20
by disciple
I couldn't find a goocanvas that worked for me, so here's the one I built, and a repackaged py-goocanvas stolen, I think, from Debian.

Posted: Sun 11 Jul 2010, 02:36
by disciple
The other gui for tesseract is at http://sourceforge.net/projects/ocrgui/
It is in C/GTK (yay - no python :)) but I suspect is not as capable.
My current puppy doesn't have a new enough GTK to try it, although I think the latest puppies do. You'll also need to install hunspell (or hack it to use enchant instead ;)) and it says imagemagick convert.

Posted: Wed 16 Feb 2011, 19:02
by abushcrafter
There is a new version of tesseract out.

Tesseract-GUI
Juan Ramon Castan has improved on Filip Domenic's "guitesseract.py". I did not manage to OCR an image with it because the language drop-down box had no options.

While on Source Forge I also found another tesseract GUI: http://sourceforge.net/projects/gimagereader/

Posted: Thu 17 Feb 2011, 09:24
by disciple
abushcrafter wrote:While on Source Forge I also found another tesseract GUI: http://sourceforge.net/projects/gimagereader/
Another one! Thanks.
Is it really Python/Gnome, or just PyGtk?

If you haven't been following the ocropus thread, you might like to check out cuneiform, which I mentioned there... along with a variety of guis.

Posted: Fri 18 Feb 2011, 13:22
by abushcrafter
disciple wrote:
abushcrafter wrote:While on Source Forge I also found another tesseract GUI: http://sourceforge.net/projects/gimagereader/
Another one! Thanks.
Is it really Python/Gnome, or just PyGtk?
I have not tried it yet because I could not face getting and compiling any more Python bindings, and I am short of time. Its dependencies are:
  • python
  • pygtk
  • pycairo
  • gnome-python2-gtkspell
  • python-enchant
  • python-imaging
  • pypoppler
  • tesseract (along with its dictionaries)
  • python-imaging-sane (optional)
So I guess it's PyGtk, not Gnome.
disciple wrote:If you haven't been following the ocropus thread, you might like to check out cuneiform, which I mentioned there... along with a variety of guis.
No I haven't. Thanks for the pointer.
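
A hedged Python sketch for checking which of the Python bindings in the list above are importable. The import names on the right are assumptions based on how these bindings are usually imported, not something the gimagereader page states, and python-imaging-sane is left out because it is optional.

```python
# Package name -> import name. The import names are assumptions.
GIMAGEREADER_DEPS = {
    "pygtk": "gtk",
    "pycairo": "cairo",
    "gnome-python2-gtkspell": "gtkspell",
    "python-enchant": "enchant",
    "python-imaging": "PIL",
    "pypoppler": "poppler",
}


def missing_packages(deps):
    """Return the package names whose module cannot be imported."""
    missing = []
    for package, module in deps.items():
        try:
            __import__(module)
        except ImportError:
            missing.append(package)
    return missing
```

Run missing_packages(GIMAGEREADER_DEPS) with the same Python interpreter that will run the GUI. The tesseract binary itself is not a Python module and has to be checked separately.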

Posted: Fri 18 Feb 2011, 23:38
by disciple
abushcrafter wrote:I have not tried it yet because I could not face getting and compiling any more Python bindings, and I am short of time.
I know the feeling ;)
Thanks for the list of dependencies - I couldn't find it for some reason.

Posted: Tue 01 Jan 2013, 23:51
by boxR
And now what is your favorite OCR +GUI? What do you use?

Happy New Year

Online OCR option

Posted: Thu 31 Jan 2013, 11:05
by Dromeno
I have not tested it yet but it looks convenient. You are only allowed to do 15 pages per hour
http://www.onlineocr.net/

Posted: Thu 19 Nov 2015, 08:18
by greengeek
jrb wrote:I have built ch-tesseract-2.01-OCR-en.sfs, an English version of tesseract. Tesseract_OCR is placed on the right click menu. If you right click on a .tif file it will produce a text file with the same name in a few seconds. However, it is very fussy about these .tif files. You may have to open them in mtpaint or another graphics program and resave them. Even the training files required this. After that, however, it seems to work very well.

I have also placed a menu item on the Documents menu which opens a text file with these same instructions.

Packages for other major languages are available and can be easily built.

Let me know how it works for you. J
Hi jrb - do you perchance still have a copy of this sfs? I would like to get basic OCR functional in Slacko 5.6
cheers!
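
Until the sfs turns up, the "text file with the same name" behaviour jrb describes can be approximated with a short Python sketch. This is not jrb's script, and the function name is made up. It relies only on tesseract appending .txt to the output base it is given.

```python
from pathlib import Path


def same_name_command(tif_path):
    """Return (argument list, expected text file) for one .tif file."""
    tif = Path(tif_path)
    base = tif.with_suffix("")                    # page1.tif -> page1
    expected = base.with_name(base.name + ".txt") # tesseract adds .txt
    return ["tesseract", str(tif), str(base)], expected
```

Pass the first element to subprocess.run and look for the second once it finishes. Hooking this into a right click menu is left to the file manager's own configuration.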