Puppy Linux Discussion Forum
tesseract-ocr optical character recognition
ndujoe1

Joined: 04 Dec 2005
Posts: 700

Posted: Tue 13 Oct 2009, 21:40    Subject: tesseract image requirements

Since tesseract will only operate on uncompressed TIFF files, you need a few extra steps to make XSane's output compatible.

In XSane, click Preferences --> Setup --> Filetype.

For the TIFF options:

Set the compression rate to 1.

In the next three TIFF dialog boxes, select no compression.

Click OK.

Click Preferences again and select Save settings.

When scanning a file for OCR, in the XSane menu I select type: TIFF

color: gray
and enter 300 for the scan resolution.

Save the file with the extension .tif, not .tiff.

Then, when the scan is finished, invoke tesseract from the command line with:

tesseract filename.tif outputname
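
As a rough sketch of that last step (not part of the original post), the invocation could be wrapped in a small shell function that refuses anything not named .tif before calling tesseract. The names ocr_scan and output_base are made up for illustration, and the sketch assumes tesseract is on your PATH.

```shell
# Sketch only: ocr_scan and output_base are illustrative names, not part of tesseract.

# Print the output base name for a scan: the path with its .tif suffix removed.
output_base() {
    printf '%s\n' "${1%.tif}"
}

# Run OCR on one scan. Rejects files that are not named *.tif or do not exist.
ocr_scan() {
    image=$1
    case $image in
        *.tif) ;;
        *) echo "expected a .tif file, got: $image" >&2; return 1 ;;
    esac
    if [ ! -f "$image" ]; then
        echo "no such file: $image" >&2
        return 1
    fi
    # Same form as the command above: tesseract <image> <output base name>
    tesseract "$image" "$(output_base "$image")"
}
```

Calling ocr_scan page1.tif should then leave the recognised text in page1.txt, since tesseract adds the .txt suffix to the output name itself.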
disciple

Joined: 20 May 2006
Posts: 6449
Location: Auckland, New Zealand

Posted: Tue 12 Jan 2010, 08:42

Come on people, why did no one report before now that the package was broken? (embarrassed)
Or did it work in older versions of Puppy? Maybe petget was different...

_________________
DEATH TO SPREADSHEETS
- - -
Classic Puppy quotes
- - -
Beware the demented serfers!
ndujoe1

Joined: 04 Dec 2005
Posts: 700

Posted: Tue 12 Jan 2010, 11:04    Subject: tesseract

It is not broken; I forgot to post that you need to move the tesseract directory from /local/tesseract to /usr/local/tesseract. Then you will be able to reference it from the command line. It works well on my machine.
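
For anyone who wants to script that move, here is a cautious sketch. The function name relocate_tesseract is hypothetical, the default paths are simply the two named in this post, and it deliberately does nothing if the destination already exists.

```shell
# Sketch only: relocate_tesseract is an illustrative name.
# Defaults are the two paths mentioned in the post; pass others as arguments.
relocate_tesseract() {
    src=${1:-/local/tesseract}
    dest=${2:-/usr/local/tesseract}
    if [ ! -d "$src" ]; then
        echo "nothing to move: $src not found" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "$dest already exists, leaving it alone" >&2
        return 1
    fi
    # Move the whole directory to its new location.
    mv "$src" "$dest"
}
```

Run it with no arguments to use the default paths, or pass a source and a destination to try it on a scratch directory first.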
disciple

Joined: 20 May 2006
Posts: 6449
Location: Auckland, New Zealand

Posted: Wed 13 Jan 2010, 04:05

Yes, I know the build isn't broken, and neither are your instructions... but my package is.
I obviously packaged it wrong... unless my package somehow got replaced by a different, broken one.

zygo

Joined: 08 Apr 2006
Posts: 211
Location: UK

Posted: Wed 13 Jan 2010, 11:55

I'm using Puppy 431. I read only the first post in this thread and got it working, after a fashion: the command simply returned the dots per pixel and the size of the image file. A 1-byte file was created containing a single newline character. There was no error on the command line, nor in /var/log/messages. "Check for dependencies" from the menu lists none.

Now I see ndujoe1 says it needs xsane. Which xsane pet from the official Puppy 4 repo should I use and does that need sane?
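
Since the failure described here is silent (no error, just an output file holding a lone newline), a small check like the one below could flag it. This is only a sketch: ocr_output_has_text is an invented name, and it tests nothing beyond whether the file contains any non-whitespace character.

```shell
# Sketch only: ocr_output_has_text is an illustrative name.
# Succeeds if the file exists and contains at least one non-whitespace character.
ocr_output_has_text() {
    [ -f "$1" ] && grep -q '[^[:space:]]' "$1"
}

# Example use after running: tesseract filename.tif outputname
# if ! ocr_output_has_text outputname.txt; then
#     echo "tesseract produced no text; is the TIFF really uncompressed?" >&2
# fi
```

Going by the earlier post in this thread, a compressed TIFF is one likely cause of an empty result. If libtiff's tiffinfo tool happens to be installed, it reports the compression scheme of the input file.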
disciple

Joined: 20 May 2006
Posts: 6449
Location: Auckland, New Zealand

Posted: Fri 09 Jul 2010, 09:59

There is a py/gtk gui for tesseract at http://groups.google.com/group/ocropus/files/ that is worth looking at. Just find guitesseract.py on that page.
There are a couple of other guis I'm still looking at.

abushcrafter


Joined: 30 Oct 2009
Posts: 1447
Location: England

Posted: Fri 09 Jul 2010, 14:38

That GUI looks promising. Thanks.
_________________
adobe flash is rubbish!
My Quote:"Humans are stupid, though some are clever but stupid." http://www.dependent.de/media/audio/mp3/System_Syn_Heres_to_You.zip http://www.systemsyn.com/
disciple

Joined: 20 May 2006
Posts: 6449
Location: Auckland, New Zealand

Posted: Sat 10 Jul 2010, 22:02    Subject: OCRfeeder - gui for OCR

There's another py/gtk gui at http://ftp.gnome.org/pub/GNOME/sources/ocrfeeder/0.6/
This one is a bit more capable (e.g. page layout analysis) and looks more like it will be maintained.
You need to comment out one line of code which requires Gnome support, just to display the about page (rolls eyes)!

It also uses unpaper, which I posted above, and requires libgoocanvas, pygoocanvas and the Python Imaging Library (PIL).
It exports to ODF or HTML, but unfortunately this isn't working for me; I think my Python Imaging Library may be faulty. If it does work for anyone, please let us know which PIL and which Python you're using.

disciple

Joined: 20 May 2006
Posts: 6449
Location: Auckland, New Zealand

Posted: Sat 10 Jul 2010, 22:20

I couldn't find a goocanvas that worked for me, so here's the one I built, and a repackaged py-goocanvas stolen, I think, from Debian.
Attachments:
python-pygoocanvas_0.10.0-1_i386.pet (40.21 KB)
goocanvas_DEV-0.15-i486.pet (515.15 KB)
goocanvas-0.15-i486.pet (90.9 KB)

disciple

Joined: 20 May 2006
Posts: 6449
Location: Auckland, New Zealand

Posted: Sat 10 Jul 2010, 22:36

The other gui for tesseract is at http://sourceforge.net/projects/ocrgui/
It is in C/GTK (yay, no Python!) but I suspect it is not as capable.
My current Puppy doesn't have a new enough GTK to try it, although I think the latest Puppies do. You'll also need to install hunspell (or hack it to use enchant instead ;-)) and, it says, ImageMagick's convert.

abushcrafter


Joined: 30 Oct 2009
Posts: 1447
Location: England

Posted: Wed 16 Feb 2011, 15:02

There is a new version of tesseract out.

Tesseract-GUI
Juan Ramon Castan has improved on Filip Domenic's guitesseract.py. I did not manage to OCR an image with it because the language drop-down box had no options.

While on Source Forge I also found another tesseract GUI: http://sourceforge.net/projects/gimagereader/

disciple

Joined: 20 May 2006
Posts: 6449
Location: Auckland, New Zealand

Posted: Thu 17 Feb 2011, 05:24

abushcrafter wrote:
While on Source Forge I also found another tesseract GUI: http://sourceforge.net/projects/gimagereader/

Another one! Thanks.
Is it really Python/Gnome, or just PyGtk?

If you haven't been following the ocropus thread, you might like to check out cuneiform, which I mentioned there... along with a variety of guis.

abushcrafter


Joined: 30 Oct 2009
Posts: 1447
Location: England

Posted: Fri 18 Feb 2011, 09:22

disciple wrote:
abushcrafter wrote:
While on Source Forge I also found another tesseract GUI: http://sourceforge.net/projects/gimagereader/

Another one! Thanks.
Is it really Python/Gnome, or just PyGtk?
I have not tried it yet because I could not face getting and compiling any more Python bindings, and I am short of time. Its dependencies are:
  • python
  • pygtk
  • pycairo
  • gnome-python2-gtkspell
  • python-enchant
  • python-imaging
  • pypoppler
  • tesseract (along with its dictionaries)
  • python-imaging-sane (optional)

So I guess it's PyGTK, not Gnome.
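
If you would rather check what is already installed before hunting for packages, something like this sketch might help. The function name missing_python_modules is made up, and the import names in the example call (gtk, cairo, gtkspell, enchant, poppler) are my assumptions about how the packages listed above expose themselves to Python 2; adjust them for your system.

```shell
# Sketch only: missing_python_modules is an illustrative name.
# Prints each module that the given interpreter fails to import.
# Usage: missing_python_modules INTERPRETER MODULE...
missing_python_modules() {
    interpreter=$1
    shift
    for module in "$@"; do
        "$interpreter" -c "import $module" >/dev/null 2>&1 || printf '%s\n' "$module"
    done
}

# Example call (import names are assumptions, see the note above):
# missing_python_modules python gtk cairo gtkspell enchant poppler
```

An empty result means every listed module imported cleanly. Anything printed is a candidate for a missing binding, although a failed import can also mean the module is installed but broken.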

disciple wrote:
If you haven't been following the ocropus thread, you might like to check out cuneiform, which I mentioned there... along with a variety of guis.
No I haven't. Thanks for the pointer.
disciple

Joined: 20 May 2006
Posts: 6449
Location: Auckland, New Zealand

Posted: Fri 18 Feb 2011, 19:38

abushcrafter wrote:
I have not tried it yet because I could not face getting and compiling any more Python bindings, and I am short of time.

I know the feeling ;-)
Thanks for the list of dependencies - I couldn't find it for some reason.

boxR


Joined: 13 Aug 2011
Posts: 247
Location: France

Posted: Tue 01 Jan 2013, 19:51

And now, what is your favorite OCR + GUI combination? What do you use?

Happy New Year