tesseract 3.0

Posted: Fri 19 Aug 2011, 15:46
by Dingo
tesseract 3.0
http://www.dokupuppylinux.info/programs ... sseract-30

for Puppy 3.01 direct download
for Puppy 4.3.1 direct download
for Puppy 5.2.5 direct download

dependencies:

liblept
leptonica-1.68-i486.pet


- additional language data
total 213 MB once unpacked
bul.traineddata cat.traineddata ces.traineddata chi_sim.traineddata chi_tra.traineddata dan-frak.traineddata dan.traineddata deu-frak.traineddata deu.traineddata ell.traineddata eng.traineddata fin.traineddata fra.traineddata heb.traineddata hrv.traineddata hun.traineddata ind.traineddata ita.traineddata jpn.traineddata kor.traineddata lav.traineddata lit.traineddata nld.traineddata nor.traineddata pol.traineddata por.traineddata ron.traineddata rus.traineddata slk-frak.traineddata slv.traineddata spa.traineddata srp.traineddata swe-frak.traineddata swe.traineddata tgl.traineddata tur.traineddata ukr.traineddata vie.traineddata
- as sfs v. 4 (for puppy 4.3.x-5.2.x series) - tesseract3-langdata_431.sfs
- as a tar archive compressed with xz (.tar.xz) - tesseract3-langdata.tar.xz; to unpack it, type:

Code:

xz -d tesseract3-langdata.tar.xz
tar -xf tesseract3-langdata.tar
(then move the language files you need into /usr/share/tessdata)
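
The unpack-and-copy step can also be sketched as a small shell function. This is only an illustration: `install_langs` is a made-up helper name, and it assumes the .traineddata files may sit at any depth inside the archive and that mktemp, find and tar are available on your Puppy.

```shell
# install_langs ARCHIVE TESSDATA_DIR LANG...
# Hypothetical helper: unpack ARCHIVE into a temporary directory and copy
# only the requested <lang>.traineddata files into TESSDATA_DIR.
install_langs() {
    archive=$1
    dest=$2
    shift 2
    work=$(mktemp -d) || return 1
    # Decompress to stdout and let tar unpack the stream.
    xz -dc "$archive" | tar -xf - -C "$work" || { rm -rf "$work"; return 1; }
    mkdir -p "$dest"
    for lang in "$@"; do
        # Assumption: the data files may be at any depth inside the archive.
        file=$(find "$work" -name "$lang.traineddata" | head -n 1)
        if [ -n "$file" ]; then
            cp "$file" "$dest/" && echo "installed $lang"
        else
            echo "no data for $lang in $archive" >&2
        fi
    done
    rm -rf "$work"
}
```

For example, `install_langs tesseract3-langdata.tar.xz /usr/share/tessdata eng ita` would copy only the English and Italian files, which avoids keeping all 213 MB around.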

Puppy 5.2.5 users can easily extract the contents of this archive with xarchiver
Puppy 4.3.1 users need to install xz utils first

changelog
2010-09-21 - V3.00
* Preparations for thread safety:
  * Changed TessBaseAPI methods to be non-static
  * Created a class hierarchy for the directories to hold instance data,
    and began moving code into the classes.
  * Moved thresholding code to a separate class.
* Added major new page layout analysis module.
* Added HOCR output (issues 221, 263: thanks to amkryukov).
* Added Leptonica as main image I/O and handling. Currently optional,
but in future releases linking with Leptonica will be mandatory.
* Ambiguity table rewritten to allow definite replacements in place
of fix_quotes.
* Added TessdataManager to combine data files into a single file.
* Some dead code deleted.
* VC++6 no longer supported. It can't cope with the use of templates.
* Many more languages added.
* Doxygenation of most of the function header comments.
* Added man pages.
* Added bash completion script (issue 247: thanks to neskiem)
* Fix integer overflow in thresholding (issue 366: thanks to Cyanide.Drake)
* Add Danish Fraktur support (issues 300, 360: thanks to
dsl602230@vip.cybercity.dk)
* Fix file pointer leak (issue 359, thanks to yukihiro.nakadaira)
* Fix an error using user-words (Issue 345: thanks to max.markin)
* Fix a memory leak in tablefind.cpp (Issue 342, thanks to zdravco)
* Fix a segfault due to double fclose (Issue 320, thanks to souther)
* Fix an automake error (Issue 318, thanks to ichanjz)
* Fix a Win32 crash on fileFormatIsTiff() (Issues 304, 316, 317, 330, 347,
349, 352: thanks to nguyenq87, max.markin, zdenop)
* Fixed a number of errors in newer (stricter) versions of VC++ (Issues
301, among others)

Posted: Fri 19 Aug 2011, 20:35
by seaside
Dingo,

Thanks for this. I'm missing "liblept.so.2" in pup431 and in pup425.

Regards,
s

Posted: Fri 19 Aug 2011, 20:58
by Dingo
Sorry, I forgot to add the dependencies.

You can download them here:

liblept
leptonica-1.68-i486.pet

I compiled tesseract against the leptonica libs in order to add support for all available image formats.
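
If you are not sure whether a library such as liblept is missing, a quick check along these lines may help. It is just a sketch: `filter_missing` and `missing_libs` are made-up helper names, and it assumes your Puppy has ldd and awk and that ldd marks unresolved libraries with "not found".

```shell
# Keep only the names of libraries that ldd reports as "not found".
filter_missing() {
    awk '/not found/ { print $1 }'
}

# missing_libs BINARY: print the shared libraries BINARY needs but the
# dynamic linker cannot locate. Prints nothing if all are resolved.
missing_libs() {
    ldd "$1" 2>/dev/null | filter_missing
}
```

Run it as `missing_libs "$(command -v tesseract)"`. An empty result means every library was found.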

Posted: Fri 19 Aug 2011, 21:31
by seaside
Dingo,

Thanks. It's working now.

s.

Posted: Sun 21 Aug 2011, 14:11
by Dingo
added tesseract builds for Puppy 3.01 and Lucid 5.2.5

thanks

Posted: Tue 30 Aug 2011, 20:49
by Laie
Wow, that's what I've been looking for for a long time! Thanks :-)

Posted: Wed 09 May 2012, 04:02
by bones01
I'm having some trouble getting this to work, and I'm not sure where I've gone wrong.

I've downloaded the Lucid version, the lib dependency, and the language pack. I've extracted the English language data, so I think I've done everything, but I'm still lost.

I don't have a menu entry for Tesseract anywhere either.

I'm using Lucid 528.004 (frugal) with fluxbox.

Any suggestions would be appreciated.

Bones.

Posted: Wed 09 May 2012, 04:20
by rcrsn51
Read here about using Peasyscan with Tesseract.

If you open a terminal and type "tesseract", what happens?

Posted: Thu 10 May 2012, 01:00
by bones01
rcrsn51 wrote: Read here about using Peasyscan with Tesseract.

If you open a terminal and type "tesseract", what happens?
Using LXTerminal, I get this result:
sh-4.1# tesseract
Usage:tesseract imagename outputbase [-l lang] [configfile [[+|-]varfile]...]
sh-4.1#


I'll have a read of Peasyscan later, but I should point out that I don't have a scanner attached to my Puppy computer. The files I have are from another scanner.

Bones

Posted: Thu 10 May 2012, 01:44
by rcrsn51
bones01 wrote: The files I have are from another scanner.
In that case, you must run tesseract from the command line. If the source file is named "scan.tif", you will use the command

Code:

tesseract scan.tif scan
This will produce the file "scan.txt".
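
If there are many scans, the same command can be wrapped in a loop. The following is only a sketch: `ocr_dir` is a made-up helper name, it assumes the files end in .tif, and it passes the language with the `-l lang` option shown in the usage message earlier in the thread.

```shell
# ocr_dir DIR [LANG]: run tesseract on every .tif file in DIR.
# LANG defaults to eng here (a choice made for this sketch).
ocr_dir() {
    dir=$1
    lang=${2:-eng}
    for img in "$dir"/*.tif; do
        # If nothing matches, the pattern stays literal, so skip it.
        [ -e "$img" ] || continue
        # Output base without extension; tesseract adds .txt itself.
        base=${img%.tif}
        tesseract "$img" "$base" -l "$lang" || echo "failed: $img" >&2
    done
}
```

For example, `ocr_dir /root/scans` would write one .txt file next to each .tif, and `ocr_dir /root/scans ita` would do the same with the Italian data (the directory name is just an example).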

Since you have tesseract 3.0, I believe that the input must be in TIFF format. If your files are something else, you will need to run them through a converter like mtpaint.

Posted: Fri 11 May 2012, 13:30
by Dingo
rcrsn51 wrote: Since you have tesseract 3.0, I believe that the input must be in TIFF format. If your files are something else, you will need to run them through a converter like mtpaint.
As far as I remember (I compiled tesseract some time ago), the leptonica lib gives tesseract the ability to load most common image formats.