tesseract 3.0

Dingo
Posts: 1437
Joined: Tue 11 Dec 2007, 17:48
Location: somewhere at the end of rainbow...

#1 Post by Dingo »

tesseract 3.0
http://www.dokupuppylinux.info/programs ... sseract-30

for Puppy 3.01 direct download
for Puppy 4.3.1 direct download
for Puppy 5.2.5 direct download

dependencies:

liblept
leptonica-1.68-i486.pet


- additional language data
total 213 MB once unpacked
bul.traineddata cat.traineddata ces.traineddata chi_sim.traineddata chi_tra.traineddata dan-frak.traineddata dan.traineddata deu-frak.traineddata deu.traineddata ell.traineddata eng.traineddata fin.traineddata fra.traineddata heb.traineddata hrv.traineddata hun.traineddata ind.traineddata ita.traineddata jpn.traineddata kor.traineddata lav.traineddata lit.traineddata nld.traineddata nor.traineddata pol.traineddata por.traineddata ron.traineddata rus.traineddata slk-frak.traineddata slv.traineddata spa.traineddata srp.traineddata swe-frak.traineddata swe.traineddata tgl.traineddata tur.traineddata ukr.traineddata vie.traineddata
- as sfs v. 4 (for puppy 4.3.x-5.2.x series) - tesseract3-langdata_431.sfs
- as a tar archive compressed with xz - tesseract3-langdata.tar.xz; to unpack it, type:

Code:

xz -d tesseract3-langdata.tar.xz
tar -xf tesseract3-langdata.tar
(then move the language files you need into /usr/share/tessdata)

Puppy 5.2.5 users can easily extract the contents of this archive with xarchiver.
Puppy 4.3.1 users need xz utils.
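
The "move the language files you need" step can also be scripted. This is only a sketch: the function name install_langs and its arguments are made up for illustration, and it assumes the .traineddata files have already been unpacked into a directory.

```shell
# Hypothetical helper (not part of tesseract or Puppy): copy only the
# language files you need from an unpacked langdata directory into tessdata.
# Usage: install_langs SRC_DIR DEST_DIR LANG...
install_langs() {
    src=$1
    dest=$2
    shift 2
    mkdir -p "$dest" || return 1
    for lang in "$@"; do
        file="$src/$lang.traineddata"
        if [ -f "$file" ]; then
            cp "$file" "$dest/" && echo "installed $lang"
        else
            echo "missing $lang" >&2
        fi
    done
}
```

For example, "install_langs /path/to/unpacked/files /usr/share/tessdata eng deu-frak" would copy just the English and German Fraktur data; the source path here is a placeholder.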

changelog
2010-09-21 - V3.00
* Preparations for thread safety:
  * Changed TessBaseAPI methods to be non-static
  * Created a class hierarchy for the directories to hold instance data,
    and began moving code into the classes.
  * Moved thresholding code to a separate class.
* Added major new page layout analysis module.
* Added HOCR output (issues 221, 263: thanks to amkryukov).
* Added Leptonica as main image I/O and handling. Currently optional,
  but in future releases linking with Leptonica will be mandatory.
* Ambiguity table rewritten to allow definite replacements in place
  of fix_quotes.
* Added TessdataManager to combine data files into a single file.
* Some dead code deleted.
* VC++6 no longer supported. It can't cope with the use of templates.
* Many more languages added.
* Doxygenation of most of the function header comments.
* Added man pages.
* Added bash completion script (issue 247: thanks to neskiem)
* Fix integer overflow in thresholding (issue 366: thanks to Cyanide.Drake)
* Add Danish Fraktur support (issues 300, 360: thanks to
  dsl602230@vip.cybercity.dk)
* Fix file pointer leak (issue 359, thanks to yukihiro.nakadaira)
* Fix an error using user-words (Issue 345: thanks to max.markin)
* Fix a memory leak in tablefind.cpp (Issue 342, thanks to zdravco)
* Fix a segfault due to double fclose (Issue 320, thanks to souther)
* Fix an automake error (Issue 318, thanks to ichanjz)
* Fix a Win32 crash on fileFormatIsTiff() (Issues 304, 316, 317, 330, 347,
  349, 352: thanks to nguyenq87, max.markin, zdenop)
* Fixed a number of errors in newer (stricter) versions of VC++ (Issues
  301, among others)
Last edited by Dingo on Sat 17 Nov 2012, 21:39, edited 7 times in total.
replace .co.cc with .info to get access to stuff I posted in forum
dropbox 2GB free
OpenOffice for Puppy Linux

seaside
Posts: 934
Joined: Thu 12 Apr 2007, 00:19

#2 Post by seaside »

Dingo,

Thanks for this. I'm missing "liblept.so.2" in pup431 and in pup425.

Regards,
s


#3 Post by Dingo »

Sorry, I forgot to add the dependencies.

You can download them here:

liblept
leptonica-1.68-i486.pet

I compiled tesseract against leptonica libs in order to add support for all available image formats
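
A quick way to spot a missing dependency such as liblept.so.2 is to filter the output of ldd. This is only a sketch: the function name missing_libs is made up for illustration, and it assumes ldd reports unresolved libraries in the usual "name => not found" form.

```shell
# Hypothetical helper: read ldd output on stdin and print the names of
# shared libraries that the dynamic linker could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example (needs tesseract on the PATH):
#   ldd "$(command -v tesseract)" | missing_libs
```

If the command prints nothing, every library tesseract was linked against has been found.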


#4 Post by seaside »

Dingo,

Thanks. It's working now.

s.


#5 Post by Dingo »

Added tesseract builds for Puppy 3.01 and Lucid 5.2.5.

Laie
Posts: 318
Joined: Sun 20 Jan 2008, 18:42
Location: Germany

thanks

#6 Post by Laie »

Wow, that's what I've been looking for for a long time! Thanks :-)

bones01
Posts: 371
Joined: Mon 11 Aug 2008, 07:47
Location: Melbourne, Aus

#7 Post by bones01 »

I'm having some trouble getting this to work, and I'm not sure where I've gone wrong.

I've downloaded the Lucid version, the library dependency, and the language pack. I've extracted the English language file, so I think I've done everything, but I'm still lost.

I don't have a menu entry for Tesseract anywhere either.

I'm using Lucid 528.004 (frugal) with fluxbox.

Any suggestions would be appreciated.

Bones.
Dell Latitude D630 running Puppy 5.2.8 frugal, Macpup 525 frugal (if I can get it working again. Sadly, I couldn't get it fixed :? )
Precise Puppy 5.4 live DVD
Precise 5.7.3 on USB

rcrsn51
Posts: 13096
Joined: Tue 05 Sep 2006, 13:50
Location: Stratford, Ontario

#8 Post by rcrsn51 »

Read here about using Peasyscan with Tesseract.

If you open a terminal and type "tesseract", what happens?


#9 Post by bones01 »

rcrsn51 wrote: Read here about using Peasyscan with Tesseract.

If you open a terminal and type "tesseract", what happens?

Using LXTerminal, I get this result:
sh-4.1# tesseract
Usage:tesseract imagename outputbase [-l lang] [configfile [[+|-]varfile]...]
sh-4.1#


I'll have a read of Peasyscan later, but I should point out that I don't have a scanner attached to my Puppy computer. The files I have are from another scanner.

Bones


#10 Post by rcrsn51 »

bones01 wrote: The files I have are from another scanner.

In that case, you must run tesseract from the command line. If the source file is named "scan.tif", use the command:

Code:

tesseract scan.tif scan

This will produce the file "scan.txt".

Since you have tesseract 3.0, I believe that the input must be in TIFF format. If your files are something else, you will need to run them through a converter like mtpaint.
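
The single-file command extends naturally to a whole folder of scans. This is only a sketch: the function name ocr_dir and the TESSERACT override variable are made up for illustration, and it relies only on the "imagename outputbase [-l lang]" usage that tesseract prints.

```shell
# Hypothetical helper: OCR every .tif file in a directory.
# Usage: ocr_dir DIR [LANG]
# TESSERACT may be set to override the program that gets run.
ocr_dir() {
    dir=$1
    lang=${2:-eng}
    for img in "$dir"/*.tif; do
        [ -e "$img" ] || continue   # no .tif files: the glob stays literal
        # "scan.tif" gives output base "scan", so tesseract writes "scan.txt"
        "${TESSERACT:-tesseract}" "$img" "${img%.tif}" -l "$lang" \
            || echo "failed: $img" >&2
    done
}
```

For example, "ocr_dir /path/to/scans eng" would write one .txt file next to each .tif file; the directory is a placeholder.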


#11 Post by Dingo »

rcrsn51 wrote: Since you have tesseract 3.0, I believe that the input must be in TIFF format. If your files are something else, you will need to run them through a converter like mtpaint.

As far as I remember (I compiled tesseract some time ago), the leptonica library gives tesseract the ability to load most common image formats.
