Puppy Linux Discussion Forum
All times are UTC - 4
Tesseract OCR
Geoffrey


Joined: 30 May 2010
Posts: 2166
Location: Queensland

Posted: Tue 31 May 2016, 00:29    Post subject: Tesseract OCR
Subject description: Command line OCR
 

Tesseract OCR, compiled in Slacko 6.3.0: a command-line OCR tool for reading text from images.

tesseract-3.04.01.pet

Dependency

leptonica-1.73-i686.pet

I know very little about OCR. I downloaded the required eng.traineddata, which is included in this pet, and it seems to work OK with a sample text image.
It's just something to play with; some might find it useful, but it really needs a GUI frontend to make things easier.

Code:
# tesseract
Usage:
  tesseract --help | --help-psm | --version
  tesseract --list-langs [--tessdata-dir PATH]
  tesseract --print-parameters [options...] [configfile...]
  tesseract imagename|stdin outputbase|stdout [options...] [configfile...]

OCR options:
  --tessdata-dir PATH   Specify the location of tessdata path.
  --user-words PATH     Specify the location of user words file.
  --user-patterns PATH  Specify the location of user patterns file.
  -l LANG[+LANG]        Specify language(s) used for OCR.
  -c VAR=VALUE          Set value for config variables.
                        Multiple -c arguments are allowed.
  -psm NUM              Specify page segmentation mode.
NOTE: These options must occur before any configfile.

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.

Single options:
  -h, --help            Show this help message.
  --help-psm            Show page segmentation modes.
  -v, --version         Show version information.
  --list-langs          List available languages for tesseract engine.
  --print-parameters    Print tesseract parameters to stdout.
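A minimal sketch of running it over a batch of images, based only on the usage text above (tesseract 3.04 syntax, where page segmentation is set with the single-dash -psm option and tesseract appends .txt to the output base). The function names ocr_image and ocr_all and the TESSERACT override variable are hypothetical, and eng.traineddata is assumed to be installed as it is in this pet.

```shell
#!/bin/sh
# Minimal batch-OCR sketch using tesseract 3.04 command-line syntax.
# Assumes eng.traineddata is installed. TESSERACT (hypothetical) can be set
# to use a different binary; it defaults to "tesseract" on the PATH.

# ocr_image IMAGE [PSM]: OCR one image; writes IMAGE-without-extension.txt
ocr_image() {
    img=$1
    psm=${2:-3}          # page segmentation mode, 3 = fully automatic (the default)
    base=${img%.*}       # scan.png -> scan; tesseract appends .txt itself
    "${TESSERACT:-tesseract}" "$img" "$base" -l eng -psm "$psm"
}

# ocr_all IMAGE...: OCR every image given, reporting failures on stderr
ocr_all() {
    for f in "$@"; do
        ocr_image "$f" || echo "OCR failed: $f" >&2
    done
}
```

Something like `ocr_all *.png` would then write scan.txt next to scan.png, and `ocr_image line.png 7` picks mode 7 (treat the image as a single text line) from the table above. Any -c VAR=VALUE options would go after -psm and before a configfile, per the NOTE in the usage text.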

Screenshot.png


_________________
Carolina: Recent Repository Additions
Create strikethrough text HERE
My Hovercraft Is Full of Eels
rcrsn51


Joined: 05 Sep 2006
Posts: 10984
Location: Stratford, Ontario

Posted: Tue 31 May 2016, 08:01    Post subject: Re: Tesseract OCR
 

Geoffrey wrote:
really needs a GUI frontend to make things easier.

There already is one. It's called pic2txt and it's been around for years. If you look at some of the other OCR threads in this section, you will find references to it.
Pelo


Joined: 10 Sep 2011
Posts: 10093
Location: Mediterranean Sea (1 km)

Posted: Tue 31 May 2016, 08:30    Post subject: Try PuppyOCR and compare

Try PuppyOCR and compare. Don't forget that the Puppy team often adapts applications such as Tesseract to make the job easier for Puppy users. I used PuppyOCR and kept it not only in my toolbox but also in the cloud, available for download when needed.
A GUI has been added (in English).

_________________
November: protect your Puppies from frost (to freeze, froze, frozen)
Geoffrey


Joined: 30 May 2010
Posts: 2166
Location: Queensland

Posted: Tue 31 May 2016, 09:29    Post subject: Re: Tesseract OCR
 

rcrsn51 wrote:
Geoffrey wrote:
really needs a GUI frontend to make things easier.

There already is one. It's called pic2txt and it's been around for years. If you look at some of the other OCR threads in this section, you will find references to it.

Yeah, I spotted that soon after posting this. I'm in the process of packaging a Qt GUI frontend for it. I know that may seem a little on the heavy side, but it's nice to try something different.

musher0


Joined: 04 Jan 2009
Posts: 8905
Location: Gatineau (Qc), Canada

Posted: Tue 31 May 2016, 11:08

@Geoffrey: Is this (Tesseract) your own work?
_________________
musher0
~~~~~~~~~~
"The greatest of minds are the ones that never close."
(starhawk, Resident Philosopher)
Pelo


Joined: 10 Sep 2011
Posts: 10093
Location: Mediterranean Sea (1 km)

Posted: Tue 31 May 2016, 12:26    Post subject: Tesseract is a multi-OS application

Tesseract is a cross-platform application that has been "puppified" as PuppyOCR. There is plenty of discussion about it on the Ubuntu forum.
The software doesn't do everything, and the results depend heavily on the document being OCR'd. It's painstaking work, worthwhile for documents that are valuable.
Ubuntu documentation
Ubuntu review available in English.

Geoffrey


Joined: 30 May 2010
Posts: 2166
Location: Queensland

Posted: Tue 31 May 2016, 18:35

musher0 wrote:
@Geoffrey: Is this (Tesseract) your own work?

If you consider compiling it from source code my own work, then ya: I compiled both tesseract-3.04.01 and leptonica-1.73-i686.
I stripped out the DEV files, as they came to over 60 MB. The only thing I added was eng.traineddata,
which I downloaded as eng.traineddata.gz from somewhere or other.
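For anyone adding another language file the same way, a rough sketch of unpacking a downloaded .traineddata.gz into a tessdata directory. The install_traineddata function, the TESSDATA_DIR variable and its /usr/share/tessdata default are assumptions, not something this pet is confirmed to use; afterwards `tesseract --list-langs` (from the usage text in the first post) should report the new language, and `--tessdata-dir PATH` can point tesseract at a non-default location.

```shell
#!/bin/sh
# Sketch: unpack a downloaded *.traineddata.gz into a tessdata directory.
# TESSDATA_DIR and its /usr/share/tessdata default are assumptions;
# adjust them to wherever your tesseract build looks for language data.
install_traineddata() {
    gz=$1                                         # e.g. eng.traineddata.gz
    dest=${TESSDATA_DIR:-/usr/share/tessdata}
    out="$dest/$(basename "$gz" .gz)"             # eng.traineddata.gz -> eng.traineddata
    mkdir -p "$dest" || return 1
    # gunzip -c writes to stdout, so the original .gz file is left in place;
    # on failure, remove the partial output so no empty model file is left behind
    gunzip -c "$gz" > "$out" || { rm -f "$out"; return 1; }
}
```

Running `install_traineddata eng.traineddata.gz` as root would then leave eng.traineddata in the tessdata directory while keeping the original download.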
