Character recognizing software

snayak
Posts: 422
Joined: Wed 14 Sep 2011, 05:49


#1 Post by snayak »

Hi All,

I am looking for software that can extract text from an image. Basically, the input is a scanned image and the output is the text in it. For example, I will scan a page from a book, give the jpg/png image to the software, and it will give me the text written on it. Is there such software on Puppy to try?

Sincerely,
Srinivas Nayak
[Precise 571 on AMD Athlon XP 2000+ with 512MB RAM]
[Fatdog 720 on Intel Pentium B960 with 4GB RAM]

http://srinivas-nayak.blogspot.com/

Burn_IT
Posts: 3650
Joined: Sat 12 Aug 2006, 19:25
Location: Tamworth UK

#2 Post by Burn_IT »

Most OCR software will do this and has done for many years.
"Just think of it as leaving early to avoid the rush" - T Pratchett

musher0
Posts: 14629
Joined: Mon 05 Jan 2009, 00:54
Location: Gatineau (Qc), Canada

#3 Post by musher0 »

Hi snayak.

I think no software can recognize my character. :lol:
(Too tempting, couldn't help it!) ;)

BFN.
musher0
~~~~~~~~~~
"You want it darker? We kill the flame." (L. Cohen)

Burn_IT
Posts: 3650
Joined: Sat 12 Aug 2006, 19:25
Location: Tamworth UK

#4 Post by Burn_IT »

You don't need software if they are mugshots!!!

:roll:
"Just think of it as leaving early to avoid the rush" - T Pratchett

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande


#5 Post by greengeek »

snayak wrote: I will scan a page from a book, I will give the jpg/png image to the software and it will give me the texts written on it. Is there such a software on puppy to try?
Hi snayak, the software you require is called "OCR" software - "Optical Character Recognition".

I have tried several OCR programmes in Puppy but the one I had most success with is called Tesseract.
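For anyone who wants to try Tesseract directly, here is a minimal Python sketch, assuming the `tesseract` command-line program is installed and on the PATH. The helper names `build_tesseract_command` and `ocr_image` are made up for illustration, and this is not a description of how any Puppy front end calls Tesseract internally.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Return the argument list for a basic Tesseract run.

    Passing "stdout" as the output base makes Tesseract print the
    recognised text instead of writing it to a .txt file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run Tesseract on one image and return the recognised text."""
    # Fail early with a clear message if Tesseract is not installed.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on the PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With a scanned page saved as, say, `page.png` (a placeholder name), `print(ocr_image("page.png"))` should print whatever text Tesseract manages to recognise. The `lang` value must match a language data file that is actually installed.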

Forum member rcrsn51 also released a utility called pic2txt, which works with Tesseract.

Pic2txt (which is placed in the Graphics menu) was a component of peasypdf (which is in the Document menu). (EDIT: it was probably a component of "peasyscan", not peasypdf. I also use peasypdf to extract text from some PDFs, so maybe that can also do some of what you require.)

It is critical to ensure that the scan uses the best resolution, so trial and error is needed to find the settings that work best on your equipment. Also, the image sometimes needs scaling in order for pic2txt to best analyse the characters.
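To make the scaling step concrete, here is a rough Python sketch of the arithmetic. It assumes a target of 300 DPI, which is a commonly suggested starting point for Tesseract and not a figure from this thread. The function names `scale_factor` and `scaled_size` are hypothetical.

```python
def scale_factor(current_dpi, target_dpi=300):
    """Return the factor to resize by so the image reaches target_dpi."""
    if current_dpi <= 0 or target_dpi <= 0:
        raise ValueError("DPI values must be positive")
    return target_dpi / current_dpi


def scaled_size(width, height, current_dpi, target_dpi=300):
    """Return the (width, height) in pixels after rescaling."""
    factor = scale_factor(current_dpi, target_dpi)
    return round(width * factor), round(height * factor)
```

For example, a 1240 x 1754 pixel page scanned at 150 DPI would be resized to 2480 x 3508 pixels. Enlarging an image does not add detail, so rescanning at a higher resolution is usually the better option when it is available.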

Successful OCR is a combination of Art and Science. Patience is required to find the most reliable setup parameters.

I will post back if I can find the appropriate peasypdf, pic2txt and Tesseract threads.

EDIT : start with this post and the ones following it:
http://murga-linux.com/puppy/viewtopic. ... 756#462756
